Subject: sl_subscribe incorrect after failover
Author: Melvin Davidson
Date: 2006-01-04, 1:24 pm

I have previously reported the problem of failover not working correctly
if two slaves are subscribed to the same master node. I have now
determined that this only occurs if the master node is _not_ node 1.

E.g.:
master_node        slaves
node_1 ----------> node_101
       \---------> node_201

Failover works correctly


master_node        slaves
node_101 --------> node_201
         \-------> node_251

Here failover leaves the slave nodes pointing at each other, each
thinking the other is both a master and a slave!

Here again is the complete test case and failure report.
=========================================================================
slony1-1.1.2
PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-49)

sl_subscribe is not being updated correctly after a "FAILOVER"

I have the following config:
node 1 admin conninfo='dbname=control host=main.comp.com port=5450 user=postgres';
node 101 admin conninfo='dbname=masterdb host=main.comp.com port=5480 user=postgres';
node 151 admin conninfo='dbname=masterdb host=slavea.comp.com port=5450 user=postgres';
node 201 admin conninfo='dbname=masterdb host=slaveb.comp.com port=5480 user=postgres';
node 251 admin conninfo='dbname=masterdb host=slavec.comp.com port=5480 user=postgres';

node 1 exists only as a controller and is not subscribed to any node;
node 101 is the initial master
node 151 subscribes to node 101
node 201 subscribes to node 101
node 251 subscribes to node 201

pg_ctl and slon are stopped on node 101 to simulate system down

Before failover I have

 sub_set | sub_provider | sub_receiver | sub_forward | sub_active
---------+--------------+--------------+-------------+------------
       1 |          101 |          151 | t           | t
       1 |          101 |          201 | t           | t
       1 |          201 |          251 | t           | t

However after

failover (id = 101, backup node = 201);

I have
 sub_set | sub_provider | sub_receiver | sub_forward | sub_active
---------+--------------+--------------+-------------+------------
       1 |          201 |          251 | t           | t
       1 |          151 |          201 | t           | t
       1 |          201 |          251 | t           | t

on all nodes! Which is obviously wrong.
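
The two sl_subscribe listings above can be checked mechanically. What follows is a minimal Python sketch, not part of Slony-I and not a fix for the failover problem. The function name check_subscriptions, its parameters, and the three rules it applies are assumptions made for illustration: each receiver should appear once per set, the node expected to be the set origin should not also be listed as a receiver, and following provider links should never lead back to the starting node. The rows are copied from the listings in this post as (sub_set, sub_provider, sub_receiver).

```python
from collections import Counter

# Rows copied from the post: (sub_set, sub_provider, sub_receiver)
BEFORE = [(1, 101, 151), (1, 101, 201), (1, 201, 251)]
AFTER = [(1, 201, 251), (1, 151, 201), (1, 201, 251)]


def check_subscriptions(rows, origins=None):
    """Return a list of problems found in sl_subscribe-style rows.

    rows    -- iterable of (sub_set, sub_provider, sub_receiver)
    origins -- optional dict {set id: node expected to be the origin}
               (hypothetical parameter, supplied by the caller)
    """
    rows = list(rows)
    origins = origins or {}
    problems = []

    # Assumed rule 1: a receiver appears at most once per set.
    counts = Counter((s, r) for s, _p, r in rows)
    for (s, r), n in sorted(counts.items()):
        if n > 1:
            problems.append(f"set {s}: receiver {r} appears in {n} rows")

    # Assumed rule 2: the origin of a set is not also a receiver of it.
    for s, origin in sorted(origins.items()):
        if (s, origin) in counts:
            problems.append(
                f"set {s}: origin {origin} is still listed as a receiver")

    # Assumed rule 3: walking receiver -> provider never revisits a node.
    provider_of = {(s, r): p for s, p, r in rows}
    reported = set()
    for s, r in sorted(provider_of):
        path = [r]
        node = r
        while (s, node) in provider_of:
            node = provider_of[(s, node)]
            if node in path:
                loop = frozenset(path[path.index(node):])
                if (s, loop) not in reported:
                    reported.add((s, loop))
                    problems.append(
                        f"set {s}: subscription loop among nodes {sorted(loop)}")
                break
            path.append(node)
    return problems


for problem in check_subscriptions(AFTER, origins={1: 201}):
    print(problem)
```

With the rows as posted, the check flags the duplicated row for receiver 251 and the fact that node 201, the intended new origin, is still listed as a receiver. The second finding is consistent with the "cannot be modified on a subscriber node" error reported further down. A pair of rows in which two nodes name each other as provider would be reported by the loop check.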

I have tried correcting the problem by manually deleting the incorrect
provider and then
cleaning sl_confirm, sl_event, sl_seqlog and sl_setsync on all nodes with

delete from sl_confirm;
delete from sl_event;
delete from sl_seqlog;
delete from sl_setsync;

after which slon can be restarted, but slony still thinks the new
provider node is
replicated, as evidenced by

slaveb=# insert into activation_code_prefix
slaveb-# (code_prefix, product_id)
slaveb-# values
slaveb-# ('XX99', 300);
ERROR: Slony-I: Table activation_code_prefix is replicated and cannot be modified on a subscriber node

In plain language, this is very, very bad. :(

A fix or workaround would be greatly appreciated.

TIA,
Melvin Davidson