|
Home > Archive > SQL Anywhere database > August 2005 > Hash of set of rows?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Hash of set of rows?
|
|
| Edgard L. Riba 2005-08-23, 1:23 pm |
| Hi,
I need to compare the row contents between two tables located at remote
sites.
The tables are identical and should contain the same data. Part of the
requirement is to make sure that the data in both tables are indeed the
same. I was wondering if there was a way to generate via a stored
procedure on the remote site some sort of hash value of the contents of a
group of records.
The record structures is as follows:
CREATE TABLE iTransRow (
idLoc TINYINT NOT NULL,
idSeq INTEGER NOT NULL,
idLine INTEGER NOT NULL,
Item INTEGER NULL,
ItemExt VARCHAR(14) NULL,
PieceCount DECIMAL(7,2) NULL,
Pack DECIMAL(8,3) NULL,
InvoiceQty DECIMAL(11,5) NULL,
Qty DECIMAL(11,5) NULL,
Cost DECIMAL(8,5) NULL,
SuggestedPrice DECIMAL(8,5) NULL,
Price DECIMAL(8,5) NULL,
ListPrice DECIMAL(8,5) NULL,
Tax DECIMAL(9,4) NULL,
TaxPercent DECIMAL(9,4) NULL,
TaxGroup TINYINT NULL,
Discounts DECIMAL(9,4) NULL,
Expenses DECIMAL(9,4) NULL,
OtherTaxes DECIMAL(9,4) NULL,
Freight DECIMAL(9,4) NULL,
TransactionDate DATETIME NULL,
Notes TINYINT NULL,
Status TINYINT NULL,
PRIMARY KEY (idLoc,idSeq,idLine)
);
SELECT * FROM iTransRow WHERE idLoc=1 and idSeq=10;
Returns a set of rows. Is there a way to do something like:
SELECT MD5(SELECT * FROM iTransRow WHERE idLoc=1 and idSeq=10) INTO xMD5;
It doesn't specifically have to be MD5, it can be any hash, even CRC32
should be ok.
The reason for doing it this way is that it should be much faster
calculating this with a stored procedure than bringing all the rows to the
local site and verifying row by row.
Best regards,
Edgard
| |
| Mark Culp 2005-08-23, 1:23 pm |
| Assuming you are using 9.0.x, what about using:
set delimC = '\t';
set delimR = '\n';
set res = hash(
select list( colA || delimC || colB || delimC || ... || colX,
delimR order by PK )
from T
where ... );
--
Mark Culp
SQLAnywhere Research and Development
-------------------------------------------------------------------------
** Whitepapers, TechDocs, bug fixes are all available through the **
** iAnywhere Developer Community at http://www.ianywhere.com/developer **
-------------------------------------------------------------------------
"Edgard L. Riba" wrote:
>
> Hi,
>
> I need to compare the row contents between two tables located at remote
> sites.
>
> The tables are identical and should contain the same data. Part of the
> requirement is to make sure that the data in both tables are indeed the
> same. I was wondering if there was a way to generate via a stored
> procedure on the remote site some sort of hash value of the contents of a
> group of records.
>
> The record structures is as follows:
> CREATE TABLE iTransRow (
> idLoc TINYINT NOT NULL,
> idSeq INTEGER NOT NULL,
> idLine INTEGER NOT NULL,
> Item INTEGER NULL,
> ItemExt VARCHAR(14) NULL,
> PieceCount DECIMAL(7,2) NULL,
> Pack DECIMAL(8,3) NULL,
> InvoiceQty DECIMAL(11,5) NULL,
> Qty DECIMAL(11,5) NULL,
> Cost DECIMAL(8,5) NULL,
> SuggestedPrice DECIMAL(8,5) NULL,
> Price DECIMAL(8,5) NULL,
> ListPrice DECIMAL(8,5) NULL,
> Tax DECIMAL(9,4) NULL,
> TaxPercent DECIMAL(9,4) NULL,
> TaxGroup TINYINT NULL,
> Discounts DECIMAL(9,4) NULL,
> Expenses DECIMAL(9,4) NULL,
> OtherTaxes DECIMAL(9,4) NULL,
> Freight DECIMAL(9,4) NULL,
> TransactionDate DATETIME NULL,
> Notes TINYINT NULL,
> Status TINYINT NULL,
> PRIMARY KEY (idLoc,idSeq,idLine)
> );
>
> SELECT * FROM iTransRow WHERE idLoc=1 and idSeq=10;
>
> Returns a set of rows. Is there a way to do something like:
>
> SELECT MD5(SELECT * FROM iTransRow WHERE idLoc=1 and idSeq=10) INTO xMD5;
>
> It doesn't specifically have to be MD5, it can be any hash, even CRC32
> should be ok.
>
> The reason for doing it this way is that it should be much faster
> calculating this with a stored procedure than bringing all the rows to the
> local site and verifying row by row.
>
> Best regards,
> Edgard
| |
| Edgard L. Riba 2005-08-23, 8:23 pm |
| Hi Mark,
This sounds exactly what I need!
A couple of questions.
1) Why do you add the delimiters?
> select list( colA || delimC || colB || delimC || ... || colX,
> delimR order by PK )
wouldn't the following work the same?
select list( colA || colB || ... || colX, ORDER by PK )
I tried it this way and it does return a hash. In fact I tried it without
even the ORDER BY clause, and tested it on both servers and it works;
however, I imagine that it could fail because there is no order guaranteed.
SELECT HASH(
(SELECT LIST(idLoc||idSeq||i
dLine||Item||ItemExt
||
InvoiceQty||Qty||Lis
tCost||Cost||
SuggestedPrice||Pric
e||ListPrice||Tax||
TaxPercent||TaxGroup
||
Discounts||Expenses|
|OtherTaxes||
Freight||Transaction
Date||Status||Pack||
Notes||Pieces)
FROM iTransRow
WHERE idLoc=1 AND idSeq=853028)
,'SHA1') AS SHA1
Returns the hash:
5679c0e444b2dd4346d8
019c49d85576fcd77965
On a transaction that has some 8500 records in about 20 seconds.
2) The 8500 row transaction takes me to the other question. I did the
follwing on the same transaction:
BEGIN
declare delimC CHAR(1);
declare delimR CHAR(1);
set delimC = '\t';
set delimR = '\n';
select list( idLoc || delimC || idSeq || delimC || idLine,
delimR order by idLine )
FROM iTransRow
WHERE idLoc=1 AND idSeq=853028;
END
I tried this in WInSQL and it only returns the first 400 or so rows. Is
there a limit to the length of the string that LIST() returns for the hash
calculation, or is this result just an artifact of using WinSQL? BTW, in
Interactive SQL the same query only results in 1 row returned...
Thanks in advance,
Edgard
| |
| Mark Culp 2005-08-24, 1:23 pm |
| You are trying to determine if the rows in one table are the
same as the rows in another... so you need to ensure that you
are comparing the same thing.
Now consider the case that you have two integer columns beside
each other. Example:
Table1 col1 col2
Row1 1 234
Table col1 col2
Row1 123 4
Without the delimiters between columns, you would generate
a hash of ("1234") for both tables, yet the tables do not
contain the same thing !
Putting delimiters between the columns changes the hash
to be "1\t234" vs "123\t4".
A similar argument can be made for adding delimiters between
the rows.
Finally, an order by clause is needed to ensure that the order
of the rows within the list() are the same - if not explicitly
ordered, you could end up with two identical tables hashing
to different values because the rows end up in a different order.
- Mark
"Edgard L. Riba" wrote:
>
> Hi Mark,
>
> This sounds exactly what I need!
>
> A couple of questions.
>
> 1) Why do you add the delimiters?
>
>
> wouldn't the following work the same?
>
> select list( colA || colB || ... || colX, ORDER by PK )
>
> I tried it this way and it does return a hash. In fact I tried it without
> even the ORDER BY clause, and tested it on both servers and it works;
> however, I imagine that it could fail because there is no order guaranteed.
>
> SELECT HASH(
> (SELECT LIST(idLoc||idSeq||i
dLine||Item||ItemExt
||
> InvoiceQty||Qty||Lis
tCost||Cost||
> SuggestedPrice||Pric
e||ListPrice||Tax||
> TaxPercent||TaxGroup
||
> Discounts||Expenses|
|OtherTaxes||
> Freight||Transaction
Date||Status||Pack||
Notes||Pieces)
> FROM iTransRow
> WHERE idLoc=1 AND idSeq=853028)
> ,'SHA1') AS SHA1
>
> Returns the hash:
> 5679c0e444b2dd4346d8
019c49d85576fcd77965
>
> On a transaction that has some 8500 records in about 20 seconds.
>
> 2) The 8500 row transaction takes me to the other question. I did the
> follwing on the same transaction:
>
> BEGIN
> declare delimC CHAR(1);
> declare delimR CHAR(1);
> set delimC = '\t';
> set delimR = '\n';
> select list( idLoc || delimC || idSeq || delimC || idLine,
> delimR order by idLine )
> FROM iTransRow
> WHERE idLoc=1 AND idSeq=853028;
> END
>
> I tried this in WInSQL and it only returns the first 400 or so rows. Is
> there a limit to the length of the string that LIST() returns for the hash
> calculation, or is this result just an artifact of using WinSQL? BTW, in
> Interactive SQL the same query only results in 1 row returned...
>
> Thanks in advance,
> Edgard
| |
| Edgard L. Riba 2005-08-24, 1:23 pm |
| Crystal clear!
Thanks a lot Mark.
Edgard
"Mark Culp" < reply_to_newsgroups_
only_please_nospam_m
ark.culp@iAnywhere.com>
escribió en el mensaje news:430CA1E2.28A4CCD3@iAnywhere.com...[color=darkred]
> You are trying to determine if the rows in one table are the
> same as the rows in another... so you need to ensure that you
> are comparing the same thing.
>
> Now consider the case that you have two integer columns beside
> each other. Example:
>
> Table1 col1 col2
> Row1 1 234
>
> Table col1 col2
> Row1 123 4
>
> Without the delimiters between columns, you would generate
> a hash of ("1234") for both tables, yet the tables do not
> contain the same thing !
> Putting delimiters between the columns changes the hash
> to be "1\t234" vs "123\t4".
>
> A similar argument can be made for adding delimiters between
> the rows.
>
> Finally, an order by clause is needed to ensure that the order
> of the rows within the list() are the same - if not explicitly
> ordered, you could end up with two identical tables hashing
> to different values because the rows end up in a different order.
>
> - Mark
>
> "Edgard L. Riba" wrote:
|
|
|
|
|