|
Home > Archive > Slony1 PostgreSQL Replication > February 2006 > duplicate key error, thread safety problem?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
duplicate key error, thread safety problem?
|
|
| Michael Crozier 2006-02-01, 3:25 am |
|
Hi,
I encountered some duplicate key errors in my slony cluster today. Clearly,
an event/log was replicated more than once.
I believe that this may be due to "the Solaris threading issue", but I can't
find enough clear information about this problem to determine whether I
failed to avoid it in the build of Postgresql and Slony.
Detais:
Solaris 9 sparc, 7.3.13, compiled with --thread-safety
Solaris 10 opteron, 8.0.6, compiled with --thread-safety
All the slon's were running from the 8.0.6 instance/build.
I was able to manually remove the offending rows and get the slon's processing
events again, but I'm worried about a few things:
1. How is my data? Do I need to re-sync?
2. How can I prove that this problem is related to threading issue?
3. What IS the threading issue? I can't find a good description of the
problem and the solution.
4. If the problem still exists on the 8.0.X build, how do I correct it?
Any input and assistance would be appreciated.
-Michael
| |
| Hannu Krosing 2006-02-01, 3:25 am |
| w5xoZWwga2VuYWwgcMOk
ZXZhbCwgVCwgMjAwNi0w
MS0zMSBrZWxsIDIwOjE0
LCBraXJqdXRhcyBN
aWNoYWVsIENyb3ppZXI6
Cj4gSGksCj4gCj4gSSBl
bmNvdW50ZXJlZCBzb21l
IGR1cGxpY2F0ZSBr
ZXkgZXJyb3JzIGluIG15
IHNsb255IGNsdXN0ZXIg
dG9kYXkuICBDbGVhcmx5
LCAKPiBhbiBldmVu
dC9sb2cgd2FzIHJlcGxp
Y2F0ZWQgbW9yZSB0aGFu
IG9uY2UuCj4gCj4gSSBi
ZWxpZXZlIHRoYXQg
dGhpcyBtYXkgYmUgZHVl
IHRvICJ0aGUgU29sYXJp
cyB0aHJlYWRpbmcgaXNz
dWUiLCBidXQgSSBj
YW4ndCAKPiBmaW5kIGVu
b3VnaCBjbGVhciBpbmZv
cm1hdGlvbiBhYm91dCB0
aGlzIHByb2JsZW0g
dG8gZGV0ZXJtaW5lIHdo
ZXRoZXIgSSAKPiBmYWls
ZWQgdG8gYXZvaWQgaXQg
aW4gdGhlIGJ1aWxk
IG9mIFBvc3RncmVzcWwg
YW5kIFNsb255Lgo+IAo+
IERldGFpczoKPiAgU29s
YXJpcyA5IHNwYXJj
LCA3LjMuMTMsIGNvbXBp
bGVkIHdpdGggLS10aHJl
YWQtc2FmZXR5IAo+ICBT
b2xhcmlzIDEwIG9w
dGVyb24sIDguMC42LCBj
b21waWxlZCB3aXRoIC0t
dGhyZWFkLXNhZmV0eQo+
ICBBbGwgdGhlIHNs
b24ncyB3ZXJlIHJ1bm5p
bmcgZnJvbSB0aGUgOC4w
LjYgaW5zdGFuY2UvYnVp
bGQuCj4KPiBJIHdh
cyBhYmxlIHRvIG1hbnVh
bGx5IHJlbW92ZSB0aGUg
b2ZmZW5kaW5nIHJvd3Mg
YW5kIGdldCB0aGUg
c2xvbidzIHByb2Nlc3Np
bmcgCj4gZXZlbnRzIGFn
YWluLCBidXQgSSdtIHdv
cnJpZWQgYWJvdXQg
YSBmZXcgdGhpbmdzOgo+
IAo+IDEuIEhvdyBpcyBt
eSBkYXRhPyAgRG8gSSBu
ZWVkIHRvIHJlLXN5
bmM/ CgpQb3NzaWJsZS4gQ2hl
Y2sgeW91ciBkYXRhIDop
Cgo+IDIuIEhvdyBjYW4g
SSBwcm92ZSB0
aGF0IHRoaXMgcHJvYmxl
bSBpcyByZWxhdGVkIHRv
IHRocmVhZGluZyBpc3N1
ZT8KCkkgZG9uJ3Qg
dGhpbmsgaXQgaXMgcmVs
YXRlZCB0byB0aHJlYWRp
bmcgaXNzdWUuIAoKSWYg
eW91IGhhdmUgaGFk
IG1vcmUgdGhhbiAyRyAo
X3h4eF9jbHVzdGVyXy5z
bF9sb2dfMS5sb2dfeGlk
ID4gMkcpCnRyYW5z
YWN0aW9ucyBleGVjdXRl
ZCBkdXJpbmcgdGhlIHJl
cGxpY2F0aW9uLCB3aXRo
b3V0IHJlaW5kZXhp
bmcKc2xfbG9nXzEsIHRo
ZW4gaW5kZXhlcyBvbiB4
eGlkIHN0YXJ0cyBtaXNi
ZWhhdmluZywgcmVz
dWx0aW5nIGJvdGggaW4K
ZHVwbGljYXRlIGtleSBl
cnJvcnMgKmFuZCogc29t
ZSBldmVudHMgbm90
IGJlaW5nIHJlcGxpY2F0
ZWQgKGkuZS4gZGF0YQps
b3NzKS4gCgpJdCBzaG91
bGQgYmUgKGJ1dCBp
cyBub3QpIGRvY3VtZW50
ZWQgaW4gQklHIEZSSUVO
RExZIExFVFRFUlMgb24g
dGl0bGUKcGFnZSBv
ZiBzbG9ueSBkb2NzIC4K
Cj4gMy4gV2hhdCBJUyB0
aGUgdGhyZWFkaW5nIGlz
c3VlPyAgSSBjYW4n
dCBmaW5kIGEgZ29vZCBk
ZXNjcmlwdGlvbiBvZiB0
aGUKPiAgICAgcHJvYmxl
bSBhbmQgdGhlIHNv
bHV0aW9uLgo+IDQuIElm
IHRoZSBwcm9ibGVtIHN0
aWxsIGV4aXN0cyBvbiB0
aGUgOC4wLlggYnVp
bGQsIGhvdyBkbyBJIGNv
cnJlY3QgaXQ/ CgpJJ3ZlIGhlYXJkIHRo
ZXJlIGFyZSBzb21lIHBs
YW5z
IHRvIHN0YXJ0IGFsdGVy
bmF0aW5nIGJldHdlZW4g
c2xfbG9nXzEKYW5kIHNs
X2xvZ18yIGFuZCB0
cnVuY2F0aW5nIHRoZSB1
bnVzZWQgdGFibGVzIGlu
IHVwY29taW5nIHYyLjAg
b2YKc2xvbnkuIAoK
VW50aWwgdGhlbiB0aGUg
b25seSBhbHRlcm5hdGl2
ZSBJIGtub3cgb2YgaXMg
ZG9pbmcgcmVpbmRl
eCBvbiBhbnkKaW5kZXhl
cyB1c2luZyB4eGlkX29w
cyBhdCBsZWFzdCBhZnRl
ciBldmVyeSAxRyB0
cmFuc2FjdGlvbnMuCgpB
bmQgTkVWRVIgdXNlIGEg
c2V0dXAgd2hlcmUgZGF0
YSBmcm9tIG11bHRp
cGxlIG1hc3RlcnMgZ29l
cyB0aHJvdWdoIHRoZQpz
YW1lIG5vZGUsIGFzIHRo
aXMgZ3JlYXRseSBp
bmNyZWFzZXMgYSBwb3Rl
bnRpYWwgb2YgYSBzaXR1
YXRpb24gd2hlcmUKdGhl
cmUgYXJlIHh4aWRz
IGFwYXJ0IGJ5IG1vcmUg
dGhhbiAyRyAoZHVlIHRv
IGRpZmZlcmVudCB0cngg
cmF0ZXMpIGluCndo
aWNoIGNhc2UgYnRyZWUg
aW5kZXhlcyBicmVhay4g
dGhpcyAyRyBkaWZmZXJl
bmNlIG11c3Qgbm90
IGJlIHByZXNlbnQKYXQg
dGhlIHNhbWUgdGltZSwg
aXQganVzdCBoYXMgdG8g
YmUgc28gZHVyaW5n
IHRoZSBsaWZldGltZSBv
ZiB0aGUgaW5kZXguCgpU
aGlzIGJlaGF2aW91ciBp
cyBlc3BlY2lhbGx5
IG5hc3R5LCBiZWNhdXNl
IGl0IGlzIG5vdCBkZXRl
Y3RlZCBpbgp0ZXN0aW5n
ICh1bmxlc3MgeW91
IGFyZSBhYmxlIHRvIHJ1
biB0ZXN0cyBmb3IgbW9y
ZSB0aGFuIDJHIHRhbnNh
Y3Rpb25zLAp3aGlj
aCB0YWtlcyAyMyBkYXlz
IGF0IDEwMDAgdHJ4L3Nl
YykgYW5kIGV2ZW4gd2hl
biBpdCBhY3RpdmF0
ZXMgYWZ0ZXIgMkcKdHJ4
IGl0IHN0YXJ0cyBlYXRp
bmcgeW91ciBkYXRhIHF1
aXRlIHNsb3dseSBh
bmQgdW5kZXRlY3RhYmx5
IGF0IGZpcnN0IC0KeW91
IHdvbid0IG5vdGljZSB0
aGUgZGF0YSBsb3Nz
LCBvbmx5IHNlZSBhbiBv
Y2Nhc2lvbmFsIGR1cGxp
Y2F0ZSBrZXkKZXJyb3Iu
CgpJZiB5b3Ugd2Fu
dCB0byBrbm93IGEgbGl0
dGxlIG1vcmUgYWJvdXQg
dGhlIGlzc3VlIGxvb2sg
Zm9yIG15IHJlY2Vu
dApwb3N0cyBvbiB0aGlz
IGxpc3QuCgotLS0tLS0t
LS0tLS0tLS0tCkhhbm51
CgoKX19fX19fX19f
X19fX19fX19fX19fX19f
X19fX19fX19fX19fX19f
X19fX19fX18KU2xvbnkx
LWdlbmVyYWwgbWFp
bGluZyBsaXN0ClNsb255
MS1nZW5lcmFsQGdib3Jn
LnBvc3RncmVzcWwub3Jn
Cmh0dHA6Ly9nYm9y
Zy5wb3N0Z3Jlc3FsLm9y
Zy9tYWlsbWFuL2xpc3Rp
bmZvL3Nsb255MS1nZW5l
cmFsCg==
| |
| Brad Nicholson 2006-02-01, 9:26 am |
| Michael Crozier wrote:
>Hi,
>
>I encountered some duplicate key errors in my slony cluster today. Clearly,
>an event/log was replicated more than once.
>
>I believe that this may be due to "the Solaris threading issue", but I can't
>find enough clear information about this problem to determine whether I
>failed to avoid it in the build of Postgresql and Slony.
>
>Detais:
> Solaris 9 sparc, 7.3.13, compiled with --thread-safety
>
>
Upgrade this version ASAP. The following bug, fixed in 7.4.8, reared
it's ugly head around here corrupting replica's with a duplicate key
violation errors and making several DBA's very cranky...
http://www.postgresql.org/docs/7.4/...ease-7-4-8.html
"Repair ancient race condition that allowed a transaction to be seen as
committed for some purposes (eg SELECT FOR UPDATE) slightly sooner than
for other purposes This is an extremely serious bug since it could lead
to apparent data inconsistencies being briefly visible to applications."
> Solaris 10 opteron, 8.0.6, compiled with --thread-safety
> All the slon's were running from the 8.0.6 instance/build.
>
>I was able to manually remove the offending rows and get the slon's processing
>events again, but I'm worried about a few things:
>
>1. How is my data? Do I need to re-sync?
>
>
Where did you remove the offending rows from? If you manually removed
the row from sl_log_1, leaving the data on the master, but not the
subscriber, then your replica is shot.
>2. How can I prove that this problem is related to threading issue?
>3. What IS the threading issue? I can't find a good description of the
> problem and the solution.
>4. If the problem still exists on the 8.0.X build, how do I correct it?
>
>
>
Since moving to 7.4.8, we haven't had the replica corrupting scenario.
We have seen the duplicate key violation error on occasion, but the
problem was with a bad index on the table being replicated to.
Reindexing that table on table on the subscriber fixed the problem.
--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
| |
| Michael Crozier 2006-02-01, 1:25 pm |
|
>
> Upgrade this version ASAP. The following bug, fixed in 7.4.8, reared
> it's ugly head around here corrupting replica's with a duplicate key
> violation errors and making several DBA's very cranky...
>
> http://www.postgresql.org/docs/7.4/...ease-7-4-8.html
I believe that this issue was fixed in 7.3 around 7.3.10. 7.3.13 is the
latest in the 7.3 series. Unless I'm looking at the wrong bug/fix?
>
> Where did you remove the offending rows from? If you manually removed
> the row from sl_log_1, leaving the data on the master, but not the
> subscriber, then your replica is shot.
I removed the offending row in the slave table:
begin;
update pg_class set reltriggers=0 where relname = 'the_table';
delete from the_table where primary_key = 'the_rows_pk_val';
update pg_class set reltriggers=1 where relname = 'the_table';
commit;
After this, slon successfully processing the event group.
> Since moving to 7.4.8, we haven't had the replica corrupting scenario.
> We have seen the duplicate key violation error on occasion, but the
> problem was with a bad index on the table being replicated to.
> Reindexing that table on table on the subscriber fixed the problem.
I don't think that this is my issue, as I could see that the row was actually
present. Additionally, it appears the other changes from the same event
group had also been previously processed, but I can't be positively sure.
Thanks,
Michael
| |
| Michael Crozier 2006-02-01, 1:25 pm |
|
>
> Possible. Check your data :)
Only a few hundred million rows... I better get started :-)
>
> I don't think it is related to threading issue.
>
> If you have had more than 2G (_xxx_cluster_.sl_log_1.log_xid > 2G)
> transactions executed during the replication, without reindexing
> sl_log_1, then indexes on xxid starts misbehaving, resulting both in
> duplicate key errors *and* some events not being replicated (i.e. data
> loss).
This could be it. The problem has occurred three times, all after adding a
new table which took some time to COPY and create indexes, but there were no
pending events when the COPY started and it caught up quickly after the
addition/merge was complete.
I very much doubt we've done 2G transactions yet, as this is a new cluster
with only master->slave replication. I would estimated ~40 million
transactions.
> If you want to know a little more about the issue look for my recent
> posts on this list.
I will read this and continue to investigate.
Thanks,
Michael
| |
| Brad Nicholson 2006-02-01, 8:29 pm |
| Michael Crozier wrote:
>
>I believe that this issue was fixed in 7.3 around 7.3.10. 7.3.13 is the
>latest in the 7.3 series. Unless I'm looking at the wrong bug/fix?
>
>
Checked the release notes, looks like you're right, that fix was backported.
>
>
>
>I removed the offending row in the slave table:
> begin;
> update pg_class set reltriggers=0 where relname = 'the_table';
> delete from the_table where primary_key = 'the_rows_pk_val';
> update pg_class set reltriggers=1 where relname = 'the_table';
> commit;
>
>After this, slon successfully processing the event group.
>
>
Hmmm.... As much as I dislike rebuilding nodes, I certainly wouldn't
trust one after messing with it manually like that. I'd be rebuilding it.
>
>I don't think that this is my issue, as I could see that the row was actually
>present. Additionally, it appears the other changes from the same event
>group had also been previously processed, but I can't be positively sure.
>
>
Yup, that's a different case than what we saw. The row wasn't there,
but the index thought it was.
--
Brad Nicholson 416-673-4106 bnichols-swQf4SbcV9C7WVzo/KQ3Mw@public.gmane.org
Database Administrator, Afilias Canada Corp.
| |
| Michael Crozier 2006-02-01, 8:29 pm |
|
> Hmmm.... As much as I dislike rebuilding nodes, I certainly wouldn't
> trust one after messing with it manually like that. I'd be rebuilding it.
I know what you mean. Still, if I can explain the problem, I might be able to
reason that the effects were such that all updates were applied in order,
even if they were done twice in succession. In that case I wouldn't have to
start the week-long backup and vacuum-free rebuild....
| |
| Michael Crozier 2006-02-01, 8:29 pm |
|
I'm having another occurance of this problem. Additionally, I have SQL
logging turn on for all nodes in order to capture the cause. If anybody has
any suggestions to find the cause and report the bug/problem, I'm all
ears/eyes.
| |
| Brad Nicholson 2006-02-01, 8:29 pm |
| Michael Crozier wrote:
>I'm having another occurance of this problem. Additionally, I have SQL
>logging turn on for all nodes in order to capture the cause. If anybody has
>any suggestions to find the cause and report the bug/problem, I'm all
>ears/eyes.
>
>
>
Try reindexing sl_log_1 on the node that is providing the node that is
experiencing the duplicate key error (if it's the master, you are going
to block on the reindex)
--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
| |
| Michael Crozier 2006-02-01, 8:29 pm |
|
I apologize, I mis-controlled less and was looking at part of the logfile from
yesterday evening. I had already manually corrected that error.
| |
| Michael Crozier 2006-02-01, 8:29 pm |
|
Combing through the logs, I have the SQL that slon was performing when it
encountered the duplicate primary key error. It shows that there sl_log_1
entries are being reprocessed, specifically those with log_actionseq 7865731
through 7865736.
I've removed the SQL for privacy/brevity.
Any ideas?
====================
====================
================
------------ can see data from the master.sl_log_1 being copied
2006-01-31 20:11:30 PST 17563 32021 LOG: statement:
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888029', '21', '7865730', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888034', '21', '7865731', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888039', '24', '7865732', 'I',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888041', '24', '7865733', 'U', < sl_log_1 data.... >
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888045', '21', '7865734', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888049', '21', '7865735', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888048', '153', '7865736', 'I',
------------ data from sl_log_1 being processed
2006-01-31 20:11:30 PST 17563 32022 LOG: duration: 0.604 ms
2006-01-31 20:11:30 PST 17563 32023 LOG: statement:
update only "public".
update only "public".
insert into "public". < huge sql statements from sl_log_1... >
update only "public".
update only "public".
update only "public".
insert into "public".
-------------- interim activity (nothing has been removed )
2006-01-31 20:11:30 PST 17563 32024 LOG: duration: 23.090 ms
2006-01-31 20:11:30 PST 17563 32025 LOG: statement: update "_replication".sl_setsync set ssy_seqno = '208362', ssy_minxid = '255887638', ssy_maxxid = '255888058', ssy_xip = '''255887852'',''255
888056'',''255888055
'',''255888054'',''2
55888052'','
'255888007'',''25588
8040'',''255887638''
', ssy_action_list = '' where ssy_setid in (1,175) and ssy_seqno < '208362';
2006-01-31 20:11:30 PST 17563 32026 LOG: duration: 0.570 ms
2006-01-31 20:11:30 PST 17563 32027 LOG: statement: insert into "_replication".sl_seqlog (seql_seqid, seql_origin, seql_ev_seqno, seql_last_value) values (0, 1, '208362', '1000000000000000');
2006-01-31 20:11:30 PST 17563 32028 LOG: duration: 0.126 ms
2006-01-31 20:11:30 PST 17563 32029 LOG: statement: notify "_replication_Event"; notify " _replication_Confirm
"; insert into "_replication".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type ) values ('1', '20
8362', '2006-01-31 20:11:30.401831', '255887638', '255888058', '''255887852'',''255
888056'',''255888055
'',''255888054'',''2
55888052'',''2558880
07'',''255888040'','
'255887638''', 'SYNC'); insert into "_replication".sl_confirm (con_origin, con_receive
d, con_seqno, con_timestamp) values (1, 2, '208362', now()); commit transaction;
2006-01-31 20:11:30 PST 17563 32030 LOG: duration: 11.882 ms
2006-01-31 20:11:30 PST 17563 32031 LOG: statement: begin transaction; set transaction isolation level serializable;
2006-01-31 20:11:30 PST 17563 32032 LOG: duration: 0.194 ms
2006-01-31 20:11:30 PST 17563 32033 LOG: statement: select SSY.ssy_setid, SSY.ssy_seqno, SSY.ssy_minxid, SSY.ssy_maxxid, SSY.ssy_xip, SSY.ssy_action_list from "_replication".sl_setsync SSY where SSY.ssy_seqno < '208363' and SSY.ssy_setid in
(1,175);
2006-01-31 20:11:30 PST 17563 32034 LOG: duration: 0.473 ms
2006-01-31 20:11:30 PST 17563 32035 LOG: statement: select T.tab_id, T.tab_set, "_replication".slon_quote_input('"' || PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from "_replication".sl_table T, "pg_catalog".pg_class PGC, "
pg_catalog".pg_namespace PGN where T.tab_set = 175 and PGC.oid = T.tab_reloid and PGC.relnamespace = PGN.oid;
2006-01-31 20:11:30 PST 17563 32036 LOG: duration: 0.666 ms
2006-01-31 20:11:30 PST 17563 32037 LOG: statement: select T.tab_id, T.tab_set, "_replication".slon_quote_input('"' || PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from "_replication".sl_table T, "pg_catalog".pg_class PGC, "
pg_catalog".pg_namespace PGN where T.tab_set = 1 and PGC.oid = T.tab_reloid and PGC.relnamespace = PGN.oid;
2006-01-31 20:11:30 PST 17563 32038 LOG: duration: 4.618 ms
2006-01-31 20:11:30 PST 17562 4725 LOG: statement: start transaction; set transaction isolation level serializable;
2006-01-31 20:11:30 PST 17562 4726 LOG: duration: 0.072 ms
2006-01-31 20:11:30 PST 17562 4727 LOG: statement: select ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type, ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6, ev_data7, ev_data8 from "_replication".sl_ev
ent where ev_origin = '2' and ev_seqno > '26565' order by ev_seqno
2006-01-31 20:11:30 PST 17601 2911 LOG: statement: select ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type, ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6, ev_data7, ev_data8
from "_replication".sl_event e where (e.ev_origin = '2' and e.ev_seqno > '26565') order by e.ev_origin, e.ev_seqno
2006-01-31 20:11:30 PST 17601 2912 LOG: duration: 0.384 ms
2006-01-31 20:11:30 PST 17601 2913 LOG: statement: select con_origin, con_received, max(con_seqno) as con_seqno, max(con_timestamp) as con_timestamp from "_replication".sl_confirm where con_received <> 1 group by con_origin, con_received
2006-01-31 20:11:30 PST 17562 4728 LOG: duration: 0.975 ms
2006-01-31 20:11:30 PST 17562 4729 LOG: statement: rollback transaction;
2006-01-31 20:11:30 PST 17562 4730 LOG: duration: 0.061 ms
2006-01-31 20:11:30 PST 17601 2914 LOG: duration: 1.417 ms
2006-01-31 20:11:31 PST 17563 32039 LOG: statement: update "_replication".sl_setsync set ssy_seqno = '208363', ssy_minxid = '255887638', ssy_maxxid = '255888041', ssy_xip = '''255887852'',''255
888034'',''255888038
'',''255888030'',''2
55888007'','
'255888039'',''25588
7638''', ssy_action_list = '' where ssy_setid in (1,175) and ssy_seqno < '208363';
2006-01-31 20:11:31 PST 17563 32040 LOG: duration: 0.429 ms
2006-01-31 20:11:31 PST 17563 32041 LOG: statement: insert into "_replication".sl_seqlog (seql_seqid, seql_origin, seql_ev_seqno, seql_last_value) values (0, 1, '208363', '1000000000000000');
2006-01-31 20:11:31 PST 17563 32042 LOG: duration: 0.093 ms
2006-01-31 20:11:31 PST 17563 32043 LOG: statement: notify "_replication_Event"; notify " _replication_Confirm
"; insert into "_replication".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type ) values ('1', '20
8363', '2006-01-31 20:11:30.319947', '255887638', '255888041', '''255887852'',''255
888034'',''255888038
'',''255888030'',''2
55888007'',''2558880
39'',''255887638''',
'SYNC'); insert into "_replication".sl_confirm (con_origin, con_received, con_seqno, co
n_timestamp) values (1, 2, '208363', now()); commit transaction;
2006-01-31 20:11:31 PST 17563 32044 LOG: duration: 1.067 ms
2006-01-31 20:11:31 PST 17563 32045 LOG: statement: begin transaction; set transaction isolation level serializable; lock table "_replication".sl_config_lock; select "_replication".mergeSet_int(1, 175); notify "_replication_Event"; notify "_replication_
Confirm"; insert into "_replication".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type , ev_data1, ev_data2 ) values ('1', '208364', '2006-01-31 20:11:30.319947', '255887638', '255888041', '''255887852'',''255
8
88034'',''255888038'
',''255888030'',''25
5888007'',''25588803
9'',''255887638''', 'MERGE_SET', '1', '175'); insert into "_replication".sl_confirm (con_origin, con_received, con_seqno, con_timestamp) values (1, 2, '208364', now()); commit transact
ion;
2006-01-31 20:11:31 PST 17562 4731 LOG: statement: start transaction; set transaction isolation level serializable;
2006-01-31 20:11:31 PST 17562 4732 LOG: duration: 0.094 ms
2006-01-31 20:11:31 PST 17562 4733 LOG: statement: select ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type, ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6, ev_data7, ev_data8 from "_replication".sl_ev
ent where ev_origin = '2' and ev_seqno > '26565' order by ev_seqno
2006-01-31 20:11:31 PST 17562 4734 LOG: duration: 1.034 ms
2006-01-31 20:11:31 PST 17562 4735 LOG: statement: rollback transaction;
2006-01-31 20:11:31 PST 17562 4736 LOG: duration: 0.087 ms
2006-01-31 20:11:31 PST 17563 32046 LOG: duration: 14.967 ms
2006-01-31 20:11:31 PST 17562 4737 LOG: statement: start transaction; set transaction isolation level serializable;
2006-01-31 20:11:31 PST 17562 4738 LOG: duration: 0.064 ms
2006-01-31 20:11:31 PST 17562 4739 LOG: statement: select ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type, ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6, ev_data7, ev_data8 from "_replication".sl_ev
ent where ev_origin = '2' and ev_seqno > '26565' order by ev_seqno
2006-01-31 20:11:31 PST 17601 2915 LOG: statement: select ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type, ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6, ev_data7, ev_data8
from "_replication".sl_event e where (e.ev_origin = '2' and e.ev_seqno > '26565') order by e.ev_origin, e.ev_seqno
2006-01-31 20:11:31 PST 17601 2916 LOG: duration: 0.386 ms
2006-01-31 20:11:31 PST 17601 2917 LOG: statement: select con_origin, con_received, max(con_seqno) as con_seqno, max(con_timestamp) as con_timestamp from "_replication".sl_confirm where con_received <> 1 group by con_origin, con_received
2006-01-31 20:11:31 PST 17562 4740 LOG: duration: 1.020 ms
2006-01-31 20:11:31 PST 17562 4741 LOG: statement: rollback transaction;
2006-01-31 20:11:31 PST 17562 4742 LOG: duration: 0.083 ms
2006-01-31 20:11:31 PST 17601 2918 LOG: duration: 1.270 ms
2006-01-31 20:11:31 PST 17563 32047 LOG: statement: begin transaction; set transaction isolation level serializable;
2006-01-31 20:11:31 PST 17563 32048 LOG: duration: 0.103 ms
2006-01-31 20:11:31 PST 17563 32049 LOG: statement: select SSY.ssy_setid, SSY.ssy_seqno, SSY.ssy_minxid, SSY.ssy_maxxid, SSY.ssy_xip, SSY.ssy_action_list from "_replication".sl_setsync SSY where SSY.ssy_seqno < '208365' and SSY.ssy_setid in
(1);
2006-01-31 20:11:31 PST 17563 32050 LOG: duration: 0.361 ms
2006-01-31 20:11:31 PST 17563 32051 LOG: statement: select T.tab_id, T.tab_set, "_replication".slon_quote_input('"' || PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from "_replication".sl_table T, "pg_catalog".pg_class PGC, "
pg_catalog".pg_namespace PGN where T.tab_set = 1 and PGC.oid = T.tab_reloid and PGC.relnamespace = PGN.oid;
2006-01-31 20:11:31 PST 17563 32052 LOG: duration: 4.538 ms
2006-01-31 20:11:31 PST 17564 4847 LOG: statement: start transaction;set transaction isolation level serializable;select last_value from "_replication".sl_action_seq;
2006-01-31 20:11:31 PST 17564 4848 LOG: duration: 0.216 ms
2006-01-31 20:11:31 PST 17564 4849 LOG: statement: rollback transaction;
2006-01-31 20:11:31 PST 17564 4850 LOG: duration: 0.066 ms
----------- now the all but one sl_log_1 entries are re-fetched
2006-01-31 20:11:31 PST 17563 32053 LOG: statement:
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888034', '21', '7865731', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888039', '24', '7865732', 'I',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888041', '24', '7865733', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888045', '21', '7865734', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888049', '21', '7865735', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888048', '153', '7865736', 'I',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888052', '23', '7865737', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888054', '24', '7865738', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888052', '156', '7865739', 'U',
insert into "_replication".sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) values ('1', '255888056', '21', '7865740', 'U',
2006-01-31 20:11:31 PST 17563 32054 LOG: duration: 0.932 ms
----------- and are re-performed
2006-01-31 20:11:31 PST 17563 32055 LOG: statement:
update only "public".
insert into "public".
update only "public".
update only "public".
update only "public". < huge sql statements from sl_log_1... >
insert into "public".
update only "public".
update only "public".
update only "public".
update only "public".
----------- boom
2006-01-31 20:11:31 PST 17563 32056 ERROR: duplicate key violates unique constraint
2006-01-31 20:11:31 PST 17563 32057 LOG: statement: rollback transaction
2006-01-31 20:11:31 PST 17563 32058 LOG: duration: 0.036 ms
2006-01-31 20:11:32 PST 17564 4851 LOG: statement: start transaction;set transaction isolation level serializable;select last_value from "_replication".sl_action_seq;
2006-01-31 20:11:32 PST 17564 4852 LOG: duration: 0.227 ms
:
|
|
|
|
|