Home > Archive > Slony1 PostgreSQL Replication > September 2005 > Slony Falling Behind









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Slony Falling Behind
Joe Markwardt

2005-09-27, 11:24 am

Hi all,
I am having some problems trying to get slony to catch up after a
weekend of heavy updates. I've tried playing with the -s and -g settings
for slony, but that hasn't seemed to have any effect on the situation.
I'm not sure where to proceed from here, so heres where I say "uncle"
and ask for help from those wiser than I :-) I'm not sure what info you
need so I'll just have at it ( If I leave anything out let me know and
i'll get it for you):

setup: replicate 3 databases from node 1 to node 2
Each database is replicated using a separate slony config file (using
the slon-tools)

node 1: provider
PG 7.4.7
Slony 1.1.0

node 2: subsciber
PG 8.0.3
Slony 1.1.0

Problem:
node 2 is ~ 3 days behind node 1 after a series of heavy updates on the
weekend and falling further behind.

Symtops:
the FETCH statement on the provider is eating 100% of one of the CPU's
on the server, not touching the disks at all. The subscriber spends 99%
of its time Idle in Transaction (waiting for the fetch to finish?)

Details:

from the node2 slony log:

2005-09-27 10:55:28 CDT DEBUG2 remoteListenThread_1
: queue event
1,1691509 SYNC
2005-09-27 10:55:28 CDT DEBUG2 remoteHelperThread_1
_1: 120.122 seconds
delay for first row
2005-09-27 10:55:28 CDT DEBUG2 remoteHelperThread_1
_1: 120.122 seconds
until close cursor
2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1
: new sl_rowid_seq
value: 1000000000000000
2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1
: SYNC 1610537 done
in 120.170 seconds
2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1
: Received event
1,1610538 SYNC
2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1
: SYNC 1610538
processing
2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1
: syncing set 1 with
114 table(s) from provider 1
2005-09-27 10:55:30 CDT DEBUG2 syncThread: new sl_action_seq 1 - SYNC
635243

from sl_status:
select * from _pl_replication.sl_status ;
st_origin | st_received | st_last_event | st_last_event_ts |
st_last_received | st_last_received_ts |
st_last_received_eve
nt_ts | st_lag_num_events | st_lag_time
-----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-----------------------
1 | 2 | 1691576 | 2005-09-27 10:58:05.907383 |
1610538 | 2005-09-27 10:55:44.046123 | 2005-09-24 14:02:55.407712 |
81038 | 2 days 20:55:12.07423

from sl_log_1:
select count(*) from _pl_replication.sl_log_1;
count
----------
14427361

Any ideas or suggestions on how to reverse this process so it will start
catching up again?
Christopher Browne

2005-09-27, 8:24 pm

Joe Markwardt wrote:

>Hi all,
> I am having some problems trying to get slony to catch up after a
>weekend of heavy updates. I've tried playing with the -s and -g settings
>for slony, but that hasn't seemed to have any effect on the situation.
>I'm not sure where to proceed from here, so heres where I say "uncle"
>and ask for help from those wiser than I :-) I'm not sure what info you
>need so I'll just have at it ( If I leave anything out let me know and
>i'll get it for you):
>
>setup: replicate 3 databases from node 1 to node 2
> Each database is replicated using a separate slony config file (using
>the slon-tools)
>
> node 1: provider
> PG 7.4.7
> Slony 1.1.0
>
> node 2: subsciber
> PG 8.0.3
> Slony 1.1.0
>
>Problem:
> node 2 is ~ 3 days behind node 1 after a series of heavy updates on the
>weekend and falling further behind.
>
>Symtops:
> the FETCH statement on the provider is eating 100% of one of the CPU's
>on the server, not touching the disks at all. The subscriber spends 99%
>of its time Idle in Transaction (waiting for the fetch to finish?)
>
>Details:
>
>from the node2 slony log:
>
>2005-09-27 10:55:28 CDT DEBUG2 remoteListenThread_1
: queue event
>1,1691509 SYNC
>2005-09-27 10:55:28 CDT DEBUG2 remoteHelperThread_1
_1: 120.122 seconds
>delay for first row
>2005-09-27 10:55:28 CDT DEBUG2 remoteHelperThread_1
_1: 120.122 seconds
>until close cursor
>2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1
: new sl_rowid_seq
>value: 1000000000000000
>2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1
: SYNC 1610537 done
>in 120.170 seconds
>2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1
: Received event
>1,1610538 SYNC
>2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1
: SYNC 1610538
>processing
>2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1
: syncing set 1 with
>114 table(s) from provider 1
>2005-09-27 10:55:30 CDT DEBUG2 syncThread: new sl_action_seq 1 - SYNC
>635243
>
>from sl_status:
>select * from _pl_replication.sl_status ;
> st_origin | st_received | st_last_event | st_last_event_ts |
>st_last_received | st_last_received_ts |
> st_last_received_eve
nt_ts | st_lag_num_events | st_lag_time
>-----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-----------------------
> 1 | 2 | 1691576 | 2005-09-27 10:58:05.907383 |
>1610538 | 2005-09-27 10:55:44.046123 | 2005-09-24 14:02:55.407712 |
>81038 | 2 days 20:55:12.07423
>
>from sl_log_1:
> select count(*) from _pl_replication.sl_log_1;
> count
>----------
> 14427361
>
>Any ideas or suggestions on how to reverse this process so it will start
>catching up again?
>
>
>


We have just been able to successfully point at a Probably Really Good
answer to this; it just got checked into CVS HEAD and 1_1_STABLE...

Let me suggest that you create an extra index:

create index sl_log_1_idx2 on _pl_replication.sl_log_1(log_xid
_pl_replication.xxid_ops);

This needs to be created on any provider node; it won't be particularly
valuable on nodes that only subscribe.

It should MASSIVELY increase the speed of the queries; you will more
than likely see the "120.122 seconds delay for first row" message
diminish to just a few seconds.

In the short term, you might also want to consider two other things:

a) Set the "-s" option somewhat higher until things catch up; default
is 10000 ms; "-s60000" would do one SYNC per minute, which might be
useful...

b) It looks like you're processing just one SYNC at a time; make sure
the "-g" option is set to some value; we often run with "-g40" to group
40 SYNCs together at a time...
Jan Wieck

2005-09-27, 8:24 pm

On 9/27/2005 3:08 PM, Christopher Browne wrote:

> Joe Markwardt wrote:
>
>
> We have just been able to successfully point at a Probably Really Good
> answer to this; it just got checked into CVS HEAD and 1_1_STABLE...
>
> Let me suggest that you create an extra index:
>
> create index sl_log_1_idx2 on _pl_replication.sl_log_1(log_xid
> _pl_replication.xxid_ops);


Is that an optimization problem introduced by using BETWEEN instead of
the excessive OR chains?


Jan

>
> This needs to be created on any provider node; it won't be particularly
> valuable on nodes that only subscribe.
>
> It should MASSIVELY increase the speed of the queries; you will more
> than likely see the "120.122 seconds delay for first row" message
> diminish to just a few seconds.
>
> In the short term, you might also want to consider two other things:
>
> a) Set the "-s" option somewhat higher until things catch up; default
> is 10000 ms; "-s60000" would do one SYNC per minute, which might be
> useful...
>
> b) It looks like you're processing just one SYNC at a time; make sure
> the "-g" option is set to some value; we often run with "-g40" to group
> 40 SYNCs together at a time...
> ____________________
____________________
_______
> Slony1-general mailing list
> Slony1-general- AuKwsB3Fm+ugFIWk8tvy
RWD2FQJk+8+b@public.gmane.org
> http://gborg.postgresql.org/mailman.../slony1-general



--
#===================
====================
====================
===========#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#===================
====================
=========== JanWieck- bwPqjjyvM7QAvxtiuMwx
3w@public.gmane.org #
Christopher Browne

2005-09-27, 8:24 pm

Jan Wieck wrote:

> On 9/27/2005 3:08 PM, Christopher Browne wrote:
>
>
>
> Is that an optimization problem introduced by using BETWEEN instead of
> the excessive OR chains?


No, the BETWEEN patch is not in CVS yet, so nobody would be using it by
accident :-).

The extra index has proven useful with both version 1.0.5 and 1.1.0.
Joe Markwardt

2005-09-27, 8:24 pm


On Tue, 2005-09-27 at 15:38 -0400, Christopher Browne wrote:
> Jan Wieck wrote:
>
after a[color=darkred]
situation.[color=darkred]
"uncle"[color=darkred]
info[color=darkred]
and[color=darkred]
updates[color=darkre
d]
the[color=darkred]
spends[color=darkred
]
seconds[color=darkre
d]
seconds[color=darkre
d]
sl_rowid_seq[color=d
arkred]
done[color=darkred]
event[color=darkred]

with[color=darkred]
SYNC[color=darkred]
|[color=darkred]
-----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-----------------------[color=darkred]
10:58:05.907383 |[color=darkred]
|[color=darkred]
Good[color=darkred]
of[color=darkred]
>
> No, the BETWEEN patch is not in CVS yet, so nobody would be using it

by
> accident :-).
>
> The extra index has proven useful with both version 1.0.5 and 1.1.0.
>

Unfortunately it doesn't appear that it helped me out any :-/

pl=# \d _pl_replication.sl_log_1
Table "_pl_replication.sl_log_1"
Column | Type | Modifiers
---------------+----------------------+-----------
log_origin | integer |
log_xid | _pl_replication.xxid |
log_tableid | integer |
log_actionseq | bigint |
log_cmdtype | character(1) |
log_cmddata | text |
Indexes:
"sl_log_1_idx1" btree (log_origin, log_xid _pl_replication.xxid_ops,
log_actionseq)
"sl_log_1_idx2" btree (log_xid _pl_replication.xxid_ops)



2005-09-27 14:39:06 CDT DEBUG2 remoteHelperThread_1
_1: 119.242 seconds
delay for first row
2005-09-27 14:39:06 CDT DEBUG2 remoteHelperThread_1
_1: 119.243 seconds
until close cursor
2005-09-27 14:39:06 CDT DEBUG2 remoteWorkerThread_1
: new sl_rowid_seq
value: 1000000000000000
2005-09-27 14:39:06 CDT DEBUG2 remoteWorkerThread_1
: SYNC 1610644 done
in 119.289 seconds

I did also make the changes to the -s and -g parameters and it appears
to be generating syncs less often, but it doesn't really look to be
catching up any faster.

postgres 12839 0.0 0.6 288840 51708 pts/3 S 14:48
0:00 /usr/local/pgsql/bin//slon -s 60000 -d2 -g80 pl_replication
host=192.168.1.200 dbname=pl



Any other changes you can suggest?
Hannu Krosing

2005-09-27, 8:24 pm

On T, 2005-09-27 at 15:27 -0400, Jan Wieck wrote:
> On 9/27/2005 3:08 PM, Christopher Browne wrote:
>
> Is that an optimization problem introduced by using BETWEEN instead of
> the excessive OR chains?


No, the original index is worse than useless in case the node is master
for more than one set which are subscribed to same subscriber, at least
on pg 7.4, maybe others.

By "worse than useless" I mean that postgres will perform an *indexscan*
on the first field (which is *constant*) of the three field index, which
is even slower than seqscan.

In one set per master case the plan uses two first fields and is
reasonably fast. The 3rd field is never used (ok, in case there is
exactly one xid it can be used for ordering).

--
Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org>
Jan Wieck

2005-09-27, 8:24 pm

On 9/27/2005 4:16 PM, Joe Markwardt wrote:

> On Tue, 2005-09-27 at 15:38 -0400, Christopher Browne wrote:
> after a
> situation.
> "uncle"
> info
> and
> updates
> the
> spends
> seconds
> seconds
> sl_rowid_seq
> done
> event
> with
> SYNC
> |
> -----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-----------------------
> 10:58:05.907383 |
> |
> Good
> of
> by
> Unfortunately it doesn't appear that it helped me out any :-/
>
> pl=# \d _pl_replication.sl_log_1
> Table "_pl_replication.sl_log_1"
> Column | Type | Modifiers
> ---------------+----------------------+-----------
> log_origin | integer |
> log_xid | _pl_replication.xxid |
> log_tableid | integer |
> log_actionseq | bigint |
> log_cmdtype | character(1) |
> log_cmddata | text |
> Indexes:
> "sl_log_1_idx1" btree (log_origin, log_xid _pl_replication.xxid_ops,
> log_actionseq)
> "sl_log_1_idx2" btree (log_xid _pl_replication.xxid_ops)
>
>
>
> 2005-09-27 14:39:06 CDT DEBUG2 remoteHelperThread_1
_1: 119.242 seconds
> delay for first row
> 2005-09-27 14:39:06 CDT DEBUG2 remoteHelperThread_1
_1: 119.243 seconds
> until close cursor
> 2005-09-27 14:39:06 CDT DEBUG2 remoteWorkerThread_1
: new sl_rowid_seq
> value: 1000000000000000
> 2005-09-27 14:39:06 CDT DEBUG2 remoteWorkerThread_1
: SYNC 1610644 done
> in 119.289 seconds
>
> I did also make the changes to the -s and -g parameters and it appears
> to be generating syncs less often, but it doesn't really look to be
> catching up any faster.
>
> postgres 12839 0.0 0.6 288840 51708 pts/3 S 14:48
> 0:00 /usr/local/pgsql/bin//slon -s 60000 -d2 -g80 pl_replication
> host=192.168.1.200 dbname=pl
>
>
>
> Any other changes you can suggest?


Did you give it an ANALYZE after creating the index?


Jan


>
> ____________________
____________________
_______
> Slony1-general mailing list
> Slony1-general- AuKwsB3Fm+ugFIWk8tvy
RWD2FQJk+8+b@public.gmane.org
> http://gborg.postgresql.org/mailman.../slony1-general



--
#===================
====================
====================
===========#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#===================
====================
=========== JanWieck- bwPqjjyvM7QAvxtiuMwx
3w@public.gmane.org #
Joe Markwardt

2005-09-27, 8:24 pm


On Tue, 2005-09-27 at 17:30 -0400, Jan Wieck wrote:
> On 9/27/2005 4:16 PM, Joe Markwardt wrote:
>
>
> Did you give it an ANALYZE after creating the index?
>
>
> Jan
>

Yep. And I restarted the slon's as well, so they should be using the
index as long as the planner thinks it will help. Is there anyway I can
see the query that slon is generating to create the cursor? If I could
see that query then I could do some explain analyze's and see what they
say...

-Joe
>
>
>

Christopher Browne

2005-09-27, 8:24 pm

Joe Markwardt wrote:

> 2005-09-27 14:39:06 CDT DEBUG2 remoteHelperThread_1
_1: 119.242 seconds
>
>delay for first row
>2005-09-27 14:39:06 CDT DEBUG2 remoteHelperThread_1
_1: 119.243 seconds
>until close cursor
>2005-09-27 14:39:06 CDT DEBUG2 remoteWorkerThread_1
: new sl_rowid_seq
>value: 1000000000000000
>2005-09-27 14:39:06 CDT DEBUG2 remoteWorkerThread_1
: SYNC 1610644 done
>in 119.289 seconds
>
>I did also make the changes to the -s and -g parameters and it appears
>to be generating syncs less often, but it doesn't really look to be
>catching up any faster.
>
>postgres 12839 0.0 0.6 288840 51708 pts/3 S 14:48
>0:00 /usr/local/pgsql/bin//slon -s 60000 -d2 -g80 pl_replication
>host=192.168.1.200 dbname=pl
>
>
>

Hmm. Based on the parameters, it should be grouping sets of up to 80
SYNCs together, and I'd see there being a BIG difference between:

a) Processing 1 SYNC in 119.289 seconds (which would seem somewhat
crummy) and
b) Processing 80 SYNCs in 119.289 seconds.

Oh, on 1.1, you might try turning off the "adaptive SYNC grouping"
capability; that would involve adding the parameter "-o 0". For
"unusually heavy loads," that seems to have not turned out perfectly
wonderfully :-(.
Joe Markwardt

2005-09-27, 8:24 pm

On Tue, 2005-09-27 at 17:48 -0400, Christopher Browne wrote:
> Joe Markwardt wrote:
>
> Hmm. Based on the parameters, it should be grouping sets of up to 80
> SYNCs together, and I'd see there being a BIG difference between:
>
> a) Processing 1 SYNC in 119.289 seconds (which would seem somewhat
> crummy) and
> b) Processing 80 SYNCs in 119.289 seconds.
>
> Oh, on 1.1, you might try turning off the "adaptive SYNC grouping"
> capability; that would involve adding the parameter "-o 0". For
> "unusually heavy loads," that seems to have not turned out perfectly
> wonderfully :-(.
>

I'll give that a try.

As a side note, is there any way from the logs to determine how many
SYNCS its processing in that time frame? I have it set to -d4, just not
sure how to read the output :-)

-Joe
Hannu Krosing

2005-09-27, 8:24 pm

On T, 2005-09-27 at 16:37 -0500, Joe Markwardt wrote:

> Yep. And I restarted the slon's as well, so they should be using the
> index as long as the planner thinks it will help. Is there anyway I can
> see the query that slon is generating to create the cursor? If I could
> see that query then I could do some explain analyze's and see what they
> say...


I usually look at the queries in postgres log.

In your case you could set up postgres to log only long queries, say
over 50 sec, and then just tail -f the log.

When analysing take notice that slon specially sets enable_seqsnam and
enable_indexscan before running the query on sl_log_1.

--
Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org>
Joe Markwardt

2005-09-27, 8:24 pm

On Wed, 2005-09-28 at 01:20 +0300, Hannu Krosing wrote:
> On T, 2005-09-27 at 16:37 -0500, Joe Markwardt wrote:
>
>
> I usually look at the queries in postgres log.
>
> In your case you could set up postgres to log only long queries, say
> over 50 sec, and then just tail -f the log.
>
> When analysing take notice that slon specially sets enable_seqsnam and
> enable_indexscan before running the query on sl_log_1.
>

Thanks for the tip, but the only queries that are being logged are the
FETCH statements...I suppose I could set it up to log all queries (and
plans) and then restart slon, then turn off the query logging.

-Joe
Christopher Browne

2005-09-27, 8:24 pm

Joe Markwardt wrote:

>On Wed, 2005-09-28 at 01:20 +0300, Hannu Krosing wrote:
>
>
>Thanks for the tip, but the only queries that are being logged are the
>FETCH statements...I suppose I could set it up to log all queries (and
>plans) and then restart slon, then turn off the query logging.
>
>

Right you are...

The interesting query is the one that sets up the cursor, and it doesn't
take long :-(.

You can tell how many SYNCs are being processed together by looking at
two consecutive lines that say "SYNC XXXX done in Y.xyz seconds"

(They'll have lines in between...)

If it's "SYNC 1610644 done in 119 seconds" followed (2 minutes later) by
"SYNC 1610724 done in 119 seconds", then you know that it's doing groups
of 80, and even if it's not catching up quickly, that's at least the
right direction to go in.
Hannu Krosing

2005-09-28, 9:24 am

On T, 2005-09-27 at 17:35 -0500, Joe Markwardt wrote:
> On Wed, 2005-09-28 at 01:20 +0300, Hannu Krosing wrote:
> Thanks for the tip, but the only queries that are being logged are the
> FETCH statements...I suppose I could set it up to log all queries (and
> plans) and then restart slon, then turn off the query logging.


In that case you have to log all queries and find the last one before
the fetch with the same pid

fortunately logging is changed by "pg_ctl reload" so you dont need to
restart the database to change logging, and thus can turn on logging all
queries for a small time (say 3 min) only.

--
Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org>
Joe Markwardt

2005-09-28, 1:24 pm

On Tue, 2005-09-27 at 21:08 -0400, Christopher Browne wrote:
> Joe Markwardt wrote:
>
> Right you are...
>
> The interesting query is the one that sets up the cursor, and it doesn't
> take long :-(.
>
> You can tell how many SYNCs are being processed together by looking at
> two consecutive lines that say "SYNC XXXX done in Y.xyz seconds"
>
> (They'll have lines in between...)
>
> If it's "SYNC 1610644 done in 119 seconds" followed (2 minutes later) by
> "SYNC 1610724 done in 119 seconds", then you know that it's doing groups
> of 80, and even if it's not catching up quickly, that's at least the
> right direction to go in.
>


chris, going through the logs from yesterday afternoon it appears that
it is varying the number of SYNCs processed each time, isn't the -o0
supposed to force it to always do 80?

I keep getting log entries like this:

2005-09-28 11:06:29 CDT DEBUG3 calc sync size - last time: 1 last
length: 127436 ideal: 80 proposed size: 2

with varying values for proposed size...

Attached you'll find the output of the explain analyze's from the cursor
declaration I got from the logs. It appears that it is using the new
index, and the new index is indeed faster than a sequential scan on the
table. I'm not sure if there is anything else that could be done to
make this query run faster while it's catching up, but i'll play around
some.

Also, going through the logs, starting at about 3 and continuing up
through now (~137000 lines) I see this group of log lines repeated with
no "SYNC xxx done in yyy" between.

2005-09-28 03:18:06 CDT DEBUG3 remoteHelperThread_1
_1: got 100 log rows
2005-09-28 03:18:06 CDT DEBUG3 remoteHelperThread_1
_1: 10 log buffers
delivered
2005-09-28 03:18:06 CDT DEBUG4 remoteHelperThread_1
_1: allocate lines
2005-09-28 03:18:06 CDT DEBUG4 remoteHelperThread_1
_1: have 10 line
buffers

I'm guessing that it found a big transaction and this is probably the
cause of my performance issues?


-Joe


Hannu Krosing

2005-09-28, 8:25 pm

On K, 2005-09-28 at 12:37 -0500, Joe Markwardt wrote:
> On Tue, 2005-09-27 at 21:08 -0400, Christopher Browne wrote:
>
> chris, going through the logs from yesterday afternoon it appears that
> it is varying the number of SYNCs processed each time, isn't the -o0
> supposed to force it to always do 80?
>
> I keep getting log entries like this:
>
> 2005-09-28 11:06:29 CDT DEBUG3 calc sync size - last time: 1 last
> length: 127436 ideal: 80 proposed size: 2
>
> with varying values for proposed size...
>
> Attached you'll find the output of the explain analyze's from the cursor
> declaration I got from the logs. It appears that it is using the new
> index, and the new index is indeed faster than a sequential scan on the
> table. I'm not sure if there is anything else that could be done to
> make this query run faster while it's catching up, but i'll play around
> some.


So it takes 114 secs for indexscan over ~300k xids worth of rows
returning nothing. What hardware are you running on (RAM/disks/CPU) ?

Depending on how big sl_log_1 is, you may want to do a reindex on
sl_log_1.

Or maybe even
cluster sl_log_1_idx2 on sl_log_1;

Otoh, if your client side prosessing needs allow it, you could just re-
subscribe the biggest tables (preferrably drop all indexes except the PK
or UNIQUE index used by slony before doing it). It is often faster than
catching up.

--
Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org>
Joe Markwardt

2005-09-29, 9:24 am

On Thu, 2005-09-29 at 01:09 +0300, Hannu Krosing wrote:
> On K, 2005-09-28 at 12:37 -0500, Joe Markwardt wrote:
>
> So it takes 114 secs for indexscan over ~300k xids worth of rows
> returning nothing. What hardware are you running on (RAM/disks/CPU) ?
>

6 disk 15Krpm FC raid 1+0 array, dual opteron 246's, 8GB o ram

>From what I can tell from the plan its ~120 secs for an index scan over

~14 million rows, expecting to find 300K, but not finding any. Also,
according to vmstat and iostat, I'm not hitting the disk at all, so I'm
assuming that means the entire sl_log_1 table, or at least the index its
using for this query, is cached in memory...

> Depending on how big sl_log_1 is, you may want to do a reindex on
> sl_log_1.
>

hmm well as part of my testing I dropped all the indexes on sl_log_1 and
recreated them, so this in essence has already been done,if I understand
the REINDEX command correctly

> Or maybe even
> cluster sl_log_1_idx2 on sl_log_1;
>


Thats an idea, I'll give it a try and see if it helps

> Otoh, if your client side prosessing needs allow it, you could just re-
> subscribe the biggest tables (preferrably drop all indexes except the PK
> or UNIQUE index used by slony before doing it). It is often faster than
> catching up.


I was hoping to find a way to get replication to catch up (and stay
resonably caught up) because from what I can tell while this update load
is unusuall for now, it looks like it could be becoming commonplace
within the next 6 months. And while re-subscribing is way faster (takes
about 8 hours to complete for the entire set) its not a "good" thing to
do every weekend after the heavy updates run. If thats what I have to
do to deal with single query's that change large numbers of rows and
maintain a decent lag time b/w the two nodes, then I guess thats what I
have to do. It would be nice if slony could handle this edge case a
little nicer, (just replicate the 1 query instead of turning it into 5
million ) but I certainly can't see any way around it when you use
triggers, and it fits the rest of our needs well.

Anyways it appears that it got over the 5 million line update
transaction finally, and now its flying along the way it usually does.

2005-09-29 08:36:15 CDT DEBUG2 remoteWorkerThread_1
: SYNC 1691933 done
in 0.798 seconds

from the way things look it should be caught up before the end of the
day :-)

-Joe
Christopher Browne

2005-09-29, 11:24 am

Joe Markwardt wrote:

>On Thu, 2005-09-29 at 01:09 +0300, Hannu Krosing wrote:
>
>
>6 disk 15Krpm FC raid 1+0 array, dual opteron 246's, 8GB o ram
>
>~14 million rows, expecting to find 300K, but not finding any. Also,
>according to vmstat and iostat, I'm not hitting the disk at all, so I'm
>assuming that means the entire sl_log_1 table, or at least the index its
>using for this query, is cached in memory...
>
>
>

That's not too surprising.

>hmm well as part of my testing I dropped all the indexes on sl_log_1 and
>recreated them, so this in essence has already been done,if I understand
>the REINDEX command correctly
>
>
>

Correct.

>
>Thats an idea, I'll give it a try and see if it helps
>
>

clustering essentially does a reindex and VACUUM FULL in one swell foop,
which, if the table has way too much dead space, is a better deal than
VACUUM FULL.

Based on your other comments, I'd suggest that you wait a while until
the cleanup thread gets around to purging out at least the 5M row "clot"
in sl_log_1. Once you see that "vacuum verbose analyze
_my_slony_schema.sl_log_1;" is running reasonably quickly and is finding
only a fairly small number of live tuples (ideally, hundreds; more
likely some thousands, hopefully not tens of thousands...), THEN I'd do
the cluster.

1. Because the VACUUM was just run, and ran quickly, you know the data
is all in cache
2. Because there are only a limited population of live tuples, it won't
spend forever building the index and ordering the remaining tuples.

>
>
>
>I was hoping to find a way to get replication to catch up (and stay
>resonably caught up) because from what I can tell while this update load
>is unusuall for now, it looks like it could be becoming commonplace
>within the next 6 months. And while re-subscribing is way faster (takes
>about 8 hours to complete for the entire set) its not a "good" thing to
>do every weekend after the heavy updates run. If thats what I have to
>do to deal with single query's that change large numbers of rows and
>maintain a decent lag time b/w the two nodes, then I guess thats what I
>have to do. It would be nice if slony could handle this edge case a
>little nicer, (just replicate the 1 query instead of turning it into 5
>million ) but I certainly can't see any way around it when you use
>triggers, and it fits the rest of our needs well.
>
>Anyways it appears that it got over the 5 million line update
>transaction finally, and now its flying along the way it usually does.
>
>2005-09-29 08:36:15 CDT DEBUG2 remoteWorkerThread_1
: SYNC 1691933 done
>in 0.798 seconds
>
>from the way things look it should be caught up before the end of the
>day :-)
>
>

It seems to me that you got hit by kind of too much at once, which was
unfortunate.

Some things to consider...

1. Slony-I 1.2 should handle subscriptions *way* faster as it
deactivates indexing while copying data. It looks like a common
scenario will be for this to triple the speed of the COPY_SET as
compared to earlier versions.

2. There is a way to handle the "5 million updates" scenario, albeit in
a manner that requires some planning, and *possibly* an application outage.

You could submit the 5-million-row update via EXECUTE SCRIPT, and the
result would be that the update would be submitted as 1 query to each host.

There is a serious caveat to this (one which is pretty well inevitable
and not a "bug to be fixed"), namely that EXECUTE SCRIPT needs to lock
all the tables in the set for the duration of the queries.

That is, on each node, EXECUTE SCRIPT does the following:

for each r_table in replication_set_1 do
lock table r_table
remove replication triggers from table r_table
done
execute the SQL script
for each r_table in replication_set_1 do
restore replication triggers to table r_table
unlock table r_table
done

That means that every table in the set is locked, then the 5M row update
runs for (oh, say) 8 minutes, and then every table is unlocked.

Applications that try to access the origin node concurrently with this
may assortedly:
- Lead to deadlocks that prevent this from running at all
- Get locked out for 8 minutes

If you can set some quasi regular schedule for doing these sorts of mass
updates, then it'll at least not come as a surprise...

I know some people have had difficulty with deadlocks when trying to run
EXECUTE SCRIPT in that they couldn't declare an outage; your milage may
certainly vary :-).

One quasi-devious thought which comes to mind is to try to use LOCK SET
to assert an outage...

Thus, you might loop thus...

#!/bin/bash
signal=1
while (signal == 1) do
slonik <<EOF
include 'preamble';
try {
lock set (id = 1);
exit 0;
} on error {
exit 1; # probably means we hit a deadlock
}
EOF
signal=$?
} # We don't get out of the loop until we successfully lock the set...
# Now, the set is locked...
slonik <<EOF
include 'preamble';
unlock set (id = 1); # unlock it, then run the script...
execute script ...
EOF
Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com