Author outstanding sync events
lkv

2006-02-25, 9:50 am

Hi everyone,

I'm observing something odd: on the master I see a huge
st_lag_num_events (~190000), and on the same slave that's behind the
number is about 3200 and growing. I increased the sync interval to
about 60 seconds and increased the group max size to about 1000.

These numbers are not going down, even after I restarted the slon daemons.

However, these sync events are still outstanding. Is there any way I can
flush them? And can anyone give me a hint what might have caused that?
(all the outstanding events are of type SYNC)

TIA,
l
Christopher Browne

2006-02-25, 9:50 am

lkv <lkv-fJ0KnOZh6dk@public.gmane.org> writes:
>
> I'm observing something odd: on the master I see a huge
> st_lag_num_events (~190000), and on the same slave that's behind the
> number is about 3200 and growing. I increased the sync interval to
> about 60 seconds and increased the group max size to about 1000.
>
> These numbers are not going down, even after I restarted the slon daemons.
>
> However, these sync events are still outstanding. Is there any way I can
> flush them? And can anyone give me a hint what might have caused that?
> (all the outstanding events are of type SYNC)


"Flushing" outstanding SYNC events would amount to abandonment of the
node. You MUST apply ALL (that is, "each and every") SYNC event from
the origin to each of the subscribers. So unless you're planning on
abandoning replication of that node, you should put "flush" thoughts
out of your head.

The *real* question is whether or not that slave node is actually
processing SYNC events at all.

If it isn't, then you need to know why it isn't; that's some sort of
problem that is altogether preventing replication. We don't know what
that problem is... Perhaps some authentication problem is preventing
connections from going thru; that's easy to fix, if you know that's
the case.

If SYNC events *are* being processed, but not fast enough, there are a
few reasons why this can happen that might be resolvable (e.g.
pg_listener has grown big, which is solved by doing a VACUUM FULL on
it, and verifying that it is small again). It is also possible that
the node is so far behind that it would be more effective to abandon
the node and recreate it from scratch.

But you'll need to dig into the logs in order to figure any of this
out.
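
A minimal sketch of the first check described above: deciding whether
the subscriber is processing SYNC events at all, or processing them too
slowly. It assumes you take two readings a few minutes apart of the last
event the subscriber has confirmed and of its lag (for instance the
st_lag_num_events figure from the original post). The function name, the
tuple layout, and the sample numbers are illustrative assumptions, not
part of Slony-I.

```python
def classify_progress(earlier, later):
    """Compare two readings taken a few minutes apart.

    Each reading is a hypothetical (last_confirmed_event, lag_num_events)
    tuple for one subscriber; how you collect it is up to you.
    """
    confirmed_before, lag_before = earlier
    confirmed_after, lag_after = later

    # No new event confirmed between readings: nothing is being applied.
    if confirmed_after <= confirmed_before:
        return "stalled"
    # Events are applied, but the backlog is not shrinking.
    if lag_after >= lag_before:
        return "processing, but falling behind"
    return "catching up"


# Made-up readings: lag grows while the confirmed event number stays put.
print(classify_progress((5000, 3200), (5000, 3260)))
```

A "stalled" result points at the "why isn't it processing at all"
branch (connections, locks, logs); the other two results point at the
throughput branch.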
--
(format nil "~S@~S" "cbbrowne" "ca.afilias.info")
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
Vivek Khera

2006-02-25, 9:50 am


On Feb 22, 2006, at 12:28 PM, lkv wrote:

> I'm observing something odd: on the master I see a huge
> st_lag_num_events (~190000), and on the same slave that's behind the


how big is sl_log_1 on the master? If it is > O(1M) then you're gonna
have a tough time catching up unless you have a *lot* of spare I/O
bandwidth.
lkv

2006-02-25, 9:50 am

Vivek Khera wrote:
> On Feb 22, 2006, at 12:28 PM, lkv wrote:
>
>
> how big is sl_log_1 on the master? If it is > O(1M) then you're gonna
> have a tough time catching up unless you have a *lot* of spare I/O
> bandwidth.


Hi Vivek,

sl_log_1 and sl_log_2 were empty.
Actually, I fixed the problem thanks to a hint from Jan I found in the
archives:

http://article.gmane.org/gmane.comp...y1.general/1617

-- snip
Is there anything on the provider node that locks sl_event with an
access exclusive lock? Or is that provider's (node 154) sl_event table
so bloated that 5 minutes to select some events is reasonable?
-- snip

My case was the latter. 5 min was not enough.

The provider's sl_event had a huge number of events, and the select
from the node that was behind would take literally ages to read the
table, definitely more than 2x5 min. My solution here was to start a
local slon daemon on the provider node and wait for it to clear them;
it took about 2h to clear all 200k events in sl_event.
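
As a back-of-envelope illustration of the figures above (about 200k
events cleared in about 2h), the sketch below works out the average
cleanup rate and how long a backlog of a given size would take at that
rate. It assumes the rate stays roughly constant, which the thread does
not confirm; the function names are made up for this example.

```python
def drain_rate(events_cleared, hours):
    """Average number of events cleared per second."""
    return events_cleared / (hours * 3600.0)


def hours_to_clear(backlog, rate_per_second):
    """Hours needed to clear `backlog` events at a constant rate."""
    return backlog / rate_per_second / 3600.0


rate = drain_rate(200_000, 2)
print(round(rate, 1))  # → 27.8 events per second

# The ~190000 outstanding events reported at the start of the thread:
print(round(hours_to_clear(190_000, rate), 2))  # → 1.9 hours
```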

I also cleaned pg_listener.

I think the problem here was the kind-of-slow link (15-30KB), which
sometimes could get quite saturated. Now all looks good, but let's see
for how long.

thanks,
l