Home > Archive > Slony1 PostgreSQL Replication > September 2005 > really inefficient queries with slony 1.1 and









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author really inefficient queries with slony 1.1 and
Hannu Krosing

2005-08-29, 1:24 pm

I have currently some performance problems, where slony generates a
query that postgres 7.4.5 does not optimise well

the query generated by slony when it replicates 4 sets from node 1 loocs
like this (really it is in "declare cursor")

select log_origin, log_xid, log_tableid, log_actionseq,
log_cmdtype, log_cmddata
from "_bbb_cluster".sl_log_1
where log_origin = 1
and (
( log_tableid in (3,9008,9007,9005,90
04,2002,2001)
and (log_xid < '1312955843'
and "_bbb_cluster". xxid_lt_snapshot(log
_xid,
'1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''1312950014
'',''1312952044'''))


and (log_xid >= '1312942023'
and "_bbb_cluster". xxid_ge_snapshot(log
_xid,
'1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''1312946242
'''))
)
or
( log_tableid in
(1002,1003,1013,1041
,1037,1028,1026,1023
,1031,1012,1048,1050
,1046,1021,1019,1024
,1027,1029,1025,1035
,1011,1009,1010,1016
,1032,1018,1030,1138
)

and (log_xid < '1312955843'
and "_bbb_cluster". xxid_lt_snapshot(log
_xid,
'1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''1312950014
'',''1312952044'''))


and (log_xid >= '1312942023'
and "_bbb_cluster". xxid_ge_snapshot(log
_xid,
'1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''1312946242
'''))
)
or
( log_tableid in
(7001,7008,7007,7004
,7039,7002,7030,7018
,7038,7003,7005,7006
,7009,7011,7012,7013
,7016,7021,
7022,7025,7026,7027,
7028,7029,7031,7033,
7034,7035,7036,7037,
1075,9009,9011,9012,
9013,9014,
9015,9016,9017,1051,
1052,1053,1054,1055,
1056,1057,1058,1059,
1060,1061,1062,1063,
1064,1065,
1066,1067,1068,1070,
1071,1072,1073,1074,
1076,1077,1078)
and (log_xid < '1312955843'
and "_bbb_cluster". xxid_lt_snapshot(log
_xid,
'1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''1312950014
'',''1312952044'''))

and (log_xid >= '1312942023'
and "_bbb_cluster". xxid_ge_snapshot(log
_xid,
'1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''1312946242
'''))
)
or
( log_tableid in
(7051,7050,7052,7053
,7054,7055,7056,7057
,7058,7059,7060,7061
,7062,7063,7064,7065
,7066,7067,7068,7069
,7070,7071,7072,7073
,7074)
and (log_xid < '1312955843'
and "_bbb_cluster". xxid_lt_snapshot(log
_xid,
'1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''1312950014
'',''1312952044'''))

and (log_xid >= '1312942023'
and "_bbb_cluster". xxid_ge_snapshot(log
_xid,
'1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''1312946242
''')) )
)
order by log_actionseq;

And this is done by postgres a seqscan, fro which the first FETCH runs
about 60 sec :(

as everything other than "log_tableid in (n,n,n) is the same in all OR'd
parts, the query could actually be done as

select log_origin, log_xid, log_tableid, log_actionseq,
log_cmdtype, log_cmddata
from "_bbb_cluster".sl_log_1
where log_origin = 1
and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot
(log_xid,
'1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''1312946242
'''))
and (log_xid < '1312955843' and "_bbb_cluster".xxid_lt_snapshot
(log_xid,
'1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''1312950014
'',''1312952044'''))

and log_tableid in
(3,9008,9007,9005,90
04,2002,2001,1002,10
03,1013,1041,1037,10
28,1026,1023,1031,10
12,1048,1050,1046,
1021,1019,1024,1027,
1029,1025,1035,1011,
1009,1010,1016,1032,
1018,1030,1138,7001,
7008,7007,7004,7039,

7002,7030,7018,7038,
7003,7005,7006,7009,
7011,7012,7013,7016,
7021,7022,7025,7026,
7027,7028,7029,7031,

7033,7034,7035,7036,
7037,1075,9009,9011,
9012,9013,9014,9015,
9016,9017,1051,1052,
1053,1054,1055,1056,

1057,1058,1059,1060,
1061,1062,1063,1064,
1065,1066,1067,1068,
1070,1071,1072,1073,
1074,1076,1077,1078,

7051,7050,7052,7053,
7054,7055,7056,7057,
7058,7059,7060,7061,
7062,7063,7064,7065,
7066,7067,7068,7069,

7070,7071,7072,7073,
7074)
order by log_actionseq;

which is done as index scan and runs in 0.4 sec (as measured by explain
analyse)

So I'd propose that if there are several table sets subscribed from the
same master, their id lists should be merged before generating the query
instead of generating a similar query for each set and then OR'ing
these.

--
Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org>
Christopher Browne

2005-08-29, 8:26 pm

Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org> writes:
> I have currently some performance problems, where slony generates a
> query that postgres 7.4.5 does not optimise well
>
> the query generated by slony when it replicates 4 sets from node 1 loocs
> like this (really it is in "declare cursor")
>
> select log_origin, log_xid, log_tableid, log_actionseq,
> log_cmdtype, log_cmddata
> from "_bbb_cluster".sl_log_1
> where log_origin = 1
> and (

.... much elided ...
> )
> order by log_actionseq;
>
> And this is done by postgres a seqscan, fro which the first FETCH runs
> about 60 sec :(


That is indeed unfortunate. And this is the sort of case that is most
likely to occur when processing the first few SYNCs after subscribing
a big new set.

> as everything other than "log_tableid in (n,n,n) is the same in all
> OR'd parts, the query could actually be done as
>
> select log_origin, log_xid, log_tableid, log_actionseq,
> log_cmdtype, log_cmddata
> from "_bbb_cluster".sl_log_1
> where log_origin = 1
> and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot
> (log_xid,
> '1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''1312946242
'''))
> and (log_xid < '1312955843' and "_bbb_cluster".xxid_lt_snapshot
> (log_xid,
> '1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''1312950014
'',''1312952044'''))

> and log_tableid in
> (3,9008,9007,9005,90
04,2002,2001,1002,10
03,1013,1041,1037,10
28,1026,1023,1031,10
12,1048,1050,1046,
> 1021,1019,1024,1027,
1029,1025,1035,1011,
1009,1010,1016,1032,
1018,1030,1138,7001,
7008,7007,7004,7039,

> 7002,7030,7018,7038,
7003,7005,7006,7009,
7011,7012,7013,7016,
7021,7022,7025,7026,
7027,7028,7029,7031,

> 7033,7034,7035,7036,
7037,1075,9009,9011,
9012,9013,9014,9015,
9016,9017,1051,1052,
1053,1054,1055,1056,

> 1057,1058,1059,1060,
1061,1062,1063,1064,
1065,1066,1067,1068,
1070,1071,1072,1073,
1074,1076,1077,1078,

> 7051,7050,7052,7053,
7054,7055,7056,7057,
7058,7059,7060,7061,
7062,7063,7064,7065,
7066,7067,7068,7069,

> 7070,7071,7072,7073,
7074)
> order by log_actionseq;
>
> which is done as index scan and runs in 0.4 sec (as measured by
> explain analyse)
>
> So I'd propose that if there are several table sets subscribed from
> the same master, their id lists should be merged before generating
> the query instead of generating a similar query for each set and
> then OR'ing these.


Regrettably, C doesn't have any "similarity" operator to detect the
relevant similarities.

It seems to me that we need to do something fairly fundamentally
better than this.

There was a patch I sent out a while back which would compress
sequences of log_actionseq items into a series of "between" clauses.

This big query points onwards to the notion of perhaps constructing
some temp tables...

1. There would be one consisting of a set identifier and then
log_tableid values.

That could turn the clause like "and log_tableid in
(3,9008,castofthousa
nds)" into ...

"and log_tableid = t1.table_id "

2. I think we'd stow set/minlogxid values in another temp table, and
thus have...

"and log_xid >= t2.minlogxid "

3. Likewise, stow set/maxlogxid in a third temp table, and have

"and log_xid < t3.maxlogxid "

This should be a simple matter of programming, right? :-)

If we construct a temp table for each SYNC, that probably leads to a
certain amount of bloating of pg_attribute/pg_class; they'd have to
get vacuumed once in a while too :-).

It is a SMOP which I seriously doubt I can get to until on the order
of November. Feel free to volunteer...
--
output = reverse("ofni.sailifa.ac" "@" "enworbbc")
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
Hannu Krosing

2005-08-30, 7:30 am

On E, 2005-08-29 at 18:04 -0400, Christopher Browne wrote:
> Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org> writes:
> ... much elided ...
>
> That is indeed unfortunate. And this is the sort of case that is most
> likely to occur when processing the first few SYNCs after subscribing
> a big new set.


Actually I have loads that cause this thing to happen when some replica
falls back for more than a few minutes, usually caused by pg_listener
bloat due to long-running analysis queries;

>
> Regrettably, C doesn't have any "similarity" operator to detect the
> relevant similarities.


>From my casual observations it seems that if sets have the same origin,

they have (almost?) always the exactly _same_ conditions after the

log_tableid in (...)

part, something silimar to:

and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot
(log_xid,
'1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''1312946242
'''))
and (log_xid < '1312955843' and "_bbb_cluster".xxid_lt_snapshot
(log_xid,
'1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''1312950014
'',''1312952044'''))


> It seems to me that we need to do something fairly fundamentally
> better than this.
>
> There was a patch I sent out a while back which would compress
> sequences of log_actionseq items into a series of "between" clauses.


this was for a different case, namely "long id lists after subscribe."

> This big query points onwards to the notion of perhaps constructing
> some temp tables...
>
> 1. There would be one consisting of a set identifier and then
> log_tableid values.
>
> That could turn the clause like "and log_tableid in
> (3,9008,castofthousa
nds)" into ...
>
> "and log_tableid = t1.table_id "


I don't think the main thing in my case is long id lists, but rather an
OR of 4 identical (except in id lists) queries which cause PG optimiser
not to notice that it can lift the identical "and log_xid >= XXX" part
out of all the or'ed subclauses, get plain

"log_origin=N and log_xid >= M and ..." and

use index on

(log_origin, log_xid, log_actionseq)

to do the query in an effective way.

> 2. I think we'd stow set/minlogxid values in another temp table, and
> thus have...
>
> "and log_xid >= t2.minlogxid "
>
> 3. Likewise, stow set/maxlogxid in a third temp table, and have
>
> "and log_xid < t3.maxlogxid "
>
> This should be a simple matter of programming, right? :-)


sure ;)

> If we construct a temp table for each SYNC, that probably leads to a
> certain amount of bloating of pg_attribute/pg_class; they'd have to
> get vacuumed once in a while too :-).
>
> It is a SMOP which I seriously doubt I can get to until on the order
> of November. Feel free to volunteer...


I'd try to do it the way I proposed in my previous letter, and hopefully
clarified a little in this letter.

Just need a confirmation by somebody, that slony 1.1 does indeed get the
same sync intervals for sets subscribed from the same master (at least
most of the time).

And maybe a hint on where in the code is this query (which gets the
actual changes from sl_log_1) constructed.

again :

this is what slony generates for 3 sets (1,2,3), (11,12,13), (21,22,23):
------------------------------------------------------------------------
select log_origin, log_xid, log_tableid,
log_actionseq, log_cmdtype, log_cmddata
from "_bbb_cluster".sl_log_1
where log_origin = 1
and (
( log_tableid in (1,2,3)
and (log_xid < 'XMIN'
and "_bbb_cluster". xxid_lt_snapshot(log
_xid, 'AAA'))
and (log_xid >= 'XMAX'
and "_bbb_cluster". xxid_ge_snapshot(log
_xid,'BBB'))
)
or
( log_tableid in (11,12,13)
and (log_xid < 'XMIN'
and "_bbb_cluster". xxid_lt_snapshot(log
_xid, 'AAA'))
and (log_xid >= 'XMAX'
and "_bbb_cluster". xxid_ge_snapshot(log
_xid,'BBB'))
)
or
( log_tableid in (21,22,23)
and (log_xid < '1312955843' 'XMIN'
and "_bbb_cluster". xxid_lt_snapshot(log
_xid, 'AAA'))
and (log_xid >= '1312942023' 'XMAX'
and "_bbb_cluster". xxid_ge_snapshot(log
_xid,'BBB'))
)
)
order by log_actionseq;
------------------------------------------------------------------------

as everything in each of the or'ed subclauses is the same excep the
"log_tableid in (...)" list, the rest can be lifted out and the query
would look like this:
------------------------------------------------------------------------
select log_origin, log_xid, log_tableid,
log_actionseq, log_cmdtype, log_cmddata
from "_bbb_cluster".sl_log_1
where log_origin = 1
and (log_xid < 'XMIN'
and "_bbb_cluster". xxid_lt_snapshot(log
_xid, 'AAA'))
and (log_xid >= 'XMAX'
and "_bbb_cluster". xxid_ge_snapshot(log
_xid,'BBB'))
and (
( log_tableid in (1,2,3)
)
or
( log_tableid in (11,12,13)
)
or
( log_tableid in (21,22,23)
)
)
order by log_actionseq;
------------------------------------------------------------------------

Actually I hope I don't have to do it this way, by I just check that
XMIN, AAA, XMAX, BBB are the same and will merge the tableid lists and
generate the query once to look like this
-----------------------------------------------------------------------
select log_origin, log_xid, log_tableid,
log_actionseq, log_cmdtype, log_cmddata
from "_bbb_cluster".sl_log_1
where log_origin = 1
and log_tableid in (1,2,3,11,12,13,21,2
2,23)
and (log_xid < '1312955843'
and "_bbb_cluster". xxid_lt_snapshot(log
_xid, 'AAA'))
and (log_xid >= '1312942023'
and "_bbb_cluster". xxid_ge_snapshot(log
_xid,'BBB'))
order by log_actionseq;
-----------------------------------------------------------------------

And if it is the case (as I hope/suspect) that XMIN, AAA, XMAX, BBB are
always the same for same origin, I can even skip the "detecting" part :)

--
Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org>
Michael Crozier

2005-09-01, 8:26 pm


I was having this same problem this morning, after the completed copy of a XX
million row table from a 7.3.8 database to a 7.4.8. The copy took 20 hours
and there were hundreds of thousands of transactions during that time.

As you describe (but far worse on my old hardware), queries of sl_log_1 were
taking nearly ten minutes. No vacuuming or analyzing made a difference and
the queries were "de-optimized" into sequential scans. In novice's
desperation, I tried adding a new index on only the log_xid column. The
queries began using this index in favor of sequential scans and the
subscriber is now catching up.

This solution makes little sense, but it certainly worked for my particular
scenario.

Here is the planner's explanation, after the index:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=363713.01..363752.42 rows=15762 width=255)
Sort Key: log_actionseq
-> Index Scan using sl_log_1_idx2, sl_log_1_idx2 on sl_log_1
(cost=0.00..362614.05 rows=15762 width=255)
Index Cond: (((log_xid < '1750854484'::xxid) AND (log_xid >=
'1750179272'::xxid))
OR ((log_xid < '1750854484'::xxid) AND (log_xid >=
'1750179272'::xxid))
)
Filter: ((log_origin = 1) AND
(((log_tableid = 93) AND (log_xid < '1750854484'::xxid) AND
xxid_lt_snapshot(log
_xid,'1750179272:175
0854484:1750854483,1
750804298,1750179272
,1750851104,17504334
33,1750853878,175085
4481'::xxid_snapshot
)
AND (log_xid >= '1750179272'::xxid) AND xxid_ge_snapshot(log
_xid,
'1750179272:17508542
78:1750854275,175085
4250,1750854274,1750
804298,1750179272,17
50851104,1750433433,
1750853878,175085427
6'::xxid_snapshot))
OR ((<many many many table id's> ) AND (log_xid < '1750854484'::xxid) AND
xxid_lt_snapshot(log
_xid,'1750179272:175
0854484:1750854483,1
750804298,1750179272
,1750851104,17504334
33,1750853878,175085
4481'::xxid_snapshot
)
AND (log_xid >= '1750179272'::xxid) AND xxid_ge_snapshot(log
_xid,
'1750179272:17508542
78:1750854275,175085
4250,1750854274,1750
804298,1750179272,17
50851104,1750433433,
1750853878,175085427
6'::xxid_snapshot)))
)
(5 rows)



> I have currently some performance problems, where slony generates a
> query that postgres 7.4.5 does not optimise well
>
> the query generated by slony when it replicates 4 sets from node 1 loocs
> like this (really it is in "declare cursor")
>
> select log_origin, log_xid, log_tableid, log_actionseq,
> log_cmdtype, log_cmddata
> from "_bbb_cluster".sl_log_1
> where log_origin = 1
> and (
> ( log_tableid in (3,9008,9007,9005,90
04,2002,2001)
> and (log_xid < '1312955843'
> and "_bbb_cluster". xxid_lt_snapshot(log
_xid,
> '1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''13129
> 50014'',''1312952044
'''))
>
> and (log_xid >= '1312942023'
> and "_bbb_cluster". xxid_ge_snapshot(log
_xid,
> '1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''13129
>46242''')) )
> or
> ( log_tableid in
> (1002,1003,1013,1041
,1037,1028,1026,1023
,1031,1012,1048,1050
,1046,1021,1019
> ,1024,1027,1029,1025
,1035,1011,1009,1010
,1016,1032,1018,1030
,1138)
>
> and (log_xid < '1312955843'
> and "_bbb_cluster". xxid_lt_snapshot(log
_xid,
> '1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''13129
> 50014'',''1312952044
'''))
>
> and (log_xid >= '1312942023'
> and "_bbb_cluster". xxid_ge_snapshot(log
_xid,
> '1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''13129
>46242''')) )
> or
> ( log_tableid in
> (7001,7008,7007,7004
,7039,7002,7030,7018
,7038,7003,7005,7006
,7009,7011,7012
>,7013,7016,7021,
> 7022,7025,7026,7027,
7028,7029,7031,7033,
7034,7035,7036,7037,
1075,9009,9011,
>9012,9013,9014,
> 9015,9016,9017,1051,
1052,1053,1054,1055,
1056,1057,1058,1059,
1060,1061,1062,
>1063,1064,1065, 1066,1067,1068,1070,
1071,1072,1073,1074,
1076,1077,1078)
> and (log_xid < '1312955843'
> and "_bbb_cluster". xxid_lt_snapshot(log
_xid,
> '1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''13129
> 50014'',''1312952044
''')) and (log_xid >= '1312942023'
> and "_bbb_cluster". xxid_ge_snapshot(log
_xid,
> '1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''13129
>46242''')) )
> or
> ( log_tableid in
> (7051,7050,7052,7053
,7054,7055,7056,7057
,7058,7059,7060,7061
,7062,7063,7064
> ,7065,7066,7067,7068
,7069,7070,7071,7072
,7073,7074) and (log_xid <
> '1312955843'
> and "_bbb_cluster". xxid_lt_snapshot(log
_xid,
> '1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''13129
> 50014'',''1312952044
''')) and (log_xid >= '1312942023'
> and "_bbb_cluster". xxid_ge_snapshot(log
_xid,
> '1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''13129
>46242''')) ) )
> order by log_actionseq;
>
> And this is done by postgres a seqscan, fro which the first FETCH runs
> about 60 sec :(
>
> as everything other than "log_tableid in (n,n,n) is the same in all OR'd
> parts, the query could actually be done as
>
> select log_origin, log_xid, log_tableid, log_actionseq,
> log_cmdtype, log_cmddata
> from "_bbb_cluster".sl_log_1
> where log_origin = 1
> and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot
> (log_xid,
> '1312942023:13129479
35:''1312947917'',''
1312947924'',''13129
42023'',''13129
>46242''')) and (log_xid < '1312955843' and
> "_bbb_cluster".xxid_lt_snapshot (log_xid,
> '1312950014:13129558
43:''1312955836'',''
1312955783'',''13129
55806'',''13129
> 50014'',''1312952044
''')) and log_tableid in
> (3,9008,9007,9005,90
04,2002,2001,1002,10
03,1013,1041,1037,10
28,1026,1023,10
> 31,1012,1048,1050,10
46,
> 1021,1019,1024,1027,
1029,1025,1035,1011,
1009,1010,1016,1032,
1018,1030,1138,
> 7001,7008,7007,7004,
7039,
> 7002,7030,7018,7038,
7003,7005,7006,7009,
7011,7012,7013,7016,
7021,7022,7025,
> 7026,7027,7028,7029,
7031,
> 7033,7034,7035,7036,
7037,1075,9009,9011,
9012,9013,9014,9015,
9016,9017,1051,
> 1052,1053,1054,1055,
1056,
> 1057,1058,1059,1060,
1061,1062,1063,1064,
1065,1066,1067,1068,
1070,1071,1072,
> 1073,1074,1076,1077,
1078,
> 7051,7050,7052,7053,
7054,7055,7056,7057,
7058,7059,7060,7061,
7062,7063,7064,
> 7065,7066,7067,7068,
7069, 7070,7071,7072,7073,
7074)
> order by log_actionseq;
>
> which is done as index scan and runs in 0.4 sec (as measured by explain
> analyse)
>
> So I'd propose that if there are several table sets subscribed from the
> same master, their id lists should be merged before generating the query
> instead of generating a similar query for each set and then OR'ing
> these.


--

Conducive Technology Corporation
Taking air cargo information
to the next level

http://www.conducivetech.com
http://www.pathfinder-web.com

Michael Crozier crozierm- 19LDBNnCZmbFzinHIz5S
+QC/G2K4zDHf@public.gmane.org

Voice: 503.445.4233
Fax: 503.274.0939
cbbrowne-swQf4SbcV9C7WVzo/KQ3Mw@public.gmane.org

2005-09-02, 3:28 am

>
> I was having this same problem this morning, after the completed copy o=

f a
> XX
> million row table from a 7.3.8 database to a 7.4.8. The copy took 20
> hours
> and there were hundreds of thousands of transactions during that time.
>
> As you describe (but far worse on my old hardware), queries of sl_log_1
> were
> taking nearly ten minutes. No vacuuming or analyzing made a difference
> and
> the queries were "de-optimized" into sequential scans. In novice's
> desperation, I tried adding a new index on only the log_xid column. Th=

e
> queries began using this index in favor of sequential scans and the
> subscriber is now catching up.
>
> This solution makes little sense, but it certainly worked for my
> particular
> scenario.


No, that does make some sense.

The XID index could lead to filtering out some sizable proportion of the
rows, which would indeed help.

I could see sl_log_1.log_origin often being essentially useless as a
criterion, as queries would often pick *all* values in that column.

Ditto for sl_log_1.log_tableid, as you'll normally be drawing *all* the
tables.

sl_log_1.log_cmdtype isn't a selection criterion at all, and ditto for
sl_log_1.log_cmddata.

I suppose that in version 8.1, there would be merit to putting an index
each on log_xid and log_actionseq, bitmap indexing then doing some good.

I find it unremarkable that people find some value in adding some
additional indices. The set of indexes has always looked pretty
minimalist to me.

Methinks it warrants documenting the "possibles," and it's worth pointing
out that if adding an index gets rid of this problem, that is WAY cheaper
than redesigning the code that generates the queries.

If there's any way of generating some replicable test scenarios that do
indeed exercise the above form of slow query, then it would be great to d=
o
some testing with some extra indices to see which make a difference.
Hannu Krosing

2005-09-02, 3:28 am

On N, 2005-09-01 at 21:48 -0400, cbbrowne-swQf4SbcV9C7WVzo/KQ3Mw@public.gmane.org wrote:

> I suppose that in version 8.1, there would be merit to putting an index
> each on log_xid and log_actionseq, bitmap indexing then doing some good.
>
> I find it unremarkable that people find some value in adding some
> additional indices. The set of indexes has always looked pretty
> minimalist to me.
>
> Methinks it warrants documenting the "possibles," and it's worth pointing
> out that if adding an index gets rid of this problem, that is WAY cheaper
> than redesigning the code that generates the queries.


Sure it's cheaper. Mostly solved the probleme here too. tnx.

The index condition still looks quite bogus (same thing OR'd 4 times,
once for each set), and it still has to do the sort on log_actionseq,
but it does already make a huge difference in performance :)

Sort (cost=12.90..12.91 rows=1 width=207)
Sort Key: log_actionseq
-> Index Scan using sl_log_1_idx2_hu, sl_log_1_idx2_hu,
sl_log_1_idx2_hu, sl_log_1_idx2_hu on sl_log_1 (cost=0.00..12.89 rows=1
width=207)
Index Cond: (
((log_xid < '1349053093') AND (log_xid >= '1349052761'))
OR ((log_xid < '1349053093') AND (log_xid >= '1349052761'))
OR ((log_xid < '1349053093') AND (log_xid >= '1349052761'))
OR ((log_xid < '1349053093') AND (log_xid >= '1349052761'))
)

> If there's any way of generating some replicable test scenarios that do
> indeed exercise the above form of slow query, then it would be great to do
> some testing with some extra indices to see which make a difference.


A simple test scenario is creating 2 sets on the same origin node
subscribing them to same subscriber node, subscribing one of the sets to
a 3rd node and then not running slon on that 3rd mnode for some time to
make sl_log_1 big enough to make bad plans to take long enought to be
visible;

--
Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org>
Michael Crozier

2005-09-02, 8:25 pm


> If there's any way of generating some replicable test scenarios that do
> indeed exercise the above form of slow query, then it would be great to do
> some testing with some extra indices to see which make a difference.


Would the optimizer even use tableid in an index for a query that uses
"tableid in (many tableid's)"? Given that sets often have dozens of tables,
I doubt it.

Especially on my slower machine, combining the two "set parts" of the query
produce the best results**. By combining the tableid filter statements to
include the list of the tables from all subscribed sets, query time is
reduced about 60%. I think this indicates that a significant amount of time
(probably CPU time) is spent filtering the rows for tableid's. This would
also explain why the problem is greatest on my old Sun 420, but lessened on a
new Opteron, when the two have similar disk configurations.



** When I say combining the "set parts", mean what was described earlier in
this thread, manually reducing the common substatements of the query in ways
that the optimizer doesn't seem able to:

Instead of this:
====================
====================
================
select [columns] from .sl_log_1
where
log_origin = <origin>
and
(
( log_tableid in ( [tableid's from set 1] and <xid filter> )
or
( log_tableid in ( [tableid's from set 2] and <xid filter> )
)
) order by log_actionseq;

This:
====================
====================
================
select [columns] from .sl_log_1
where
log_origin = <origin>
and log_tableid in [tableid's from sets 1 & 2]
and <xid filter>
order by log_actionseq;
Christopher Browne

2005-09-02, 8:25 pm

Michael Crozier wrote:

>
>Would the optimizer even use tableid in an index for a query that uses
>"tableid in (many tableid's)"? Given that sets often have dozens of tables,
>I doubt it.
>
>

I agree, it seems unlikely that the table ID column would be of any use
in an index intended for improving search efficiency.

>Especially on my slower machine, combining the two "set parts" of the query
>produce the best results**. By combining the tableid filter statements to
>include the list of the tables from all subscribed sets, query time is
>reduced about 60%. I think this indicates that a significant amount of time
>(probably CPU time) is spent filtering the rows for tableid's. This would
>also explain why the problem is greatest on my old Sun 420, but lessened on a
>new Opteron, when the two have similar disk configurations.
>
>
>
>** When I say combining the "set parts", mean what was described earlier in
>this thread, manually reducing the common substatements of the query in ways
>that the optimizer doesn't seem able to:
>
>Instead of this:
> ====================
====================
================
> select [columns] from .sl_log_1
> where
> log_origin = <origin>
> and
> (
> ( log_tableid in ( [tableid's from set 1] and <xid filter> )
> or
> ( log_tableid in ( [tableid's from set 2] and <xid filter> )
> )
> ) order by log_actionseq;
>
>This:
> ====================
====================
================
> select [columns] from .sl_log_1
> where
> log_origin = <origin>
> and log_tableid in [tableid's from sets 1 & 2]
> and <xid filter>
> order by log_actionseq;
>
>

The thing is, the second query /isn't/ equivalent to the first one.

I expect that the results of the queries will be the same, but in order
for it to be appropriate to actually change the query, we need to verify
that the results are "provably" the same.

I'll have to have a chat with Jan about that; 'tis on my list...

But in any case, it seems to me that we get the Big Gain from adding in
an index on the XID column. If that gets us 80% of the improvement, it
may not be worth improving the query.
Michael Crozier

2005-09-02, 8:25 pm


=3D=3D=3D=3D=3D=3D=3
D=3D=3D=3D=3D=3D=3D=
3D=3D=3D=3D=3D=3D=3D
=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3
D[color=darkred]
d =A0<xid filter> )[color=darkred]
d =A0<xid filter> )[color=darkred]
=3D=3D=3D=3D=3D=3D=3
D=3D=3D=3D=3D=3D=3D=
3D=3D=3D=3D=3D=3D=3D
=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3
D[color=darkred]
>
> The thing is, the second query /isn't/ equivalent to the first one.


They aren't equivalent? Perhaps the I'm only seeing simplistic cases. =20

Assuming that the "xid filter" part of the query is the same:
Let
W =3D "origin =3D <originId>"
X =3D "log_tableid in setA"
Y =3D "log_tableid in setB"
Z =3D "xid/xid_snapshot filter"

The original query is: W and ( ( X and Z) or (Y and Z) )
Reduces to: W and (X or Y) and Z

As "X or Y" is "(log_tableid in setA) or (log_tableid in setB)"
This should be equivalent to "log_tableid in (setA union setB)"
Hannu Krosing

2005-09-04, 1:26 pm

On R, 2005-09-02 at 15:07 -0400, Christopher Browne wrote:
> Michael Crozier wrote:


> The thing is, the second query /isn't/ equivalent to the first one.


if both <xid filter> parts from the first query are the same, then
the queries are equivalent.

> I expect that the results of the queries will be the same, but in order
> for it to be appropriate to actually change the query, we need to verify
> that the results are "provably" the same.



((A and X) or (B and X)) ==> ((A or B) and X)

(A = log_tableid in ([set1])) and (B = log_tableid in ([set2]))
==> (A or B) = log_tableid in ([set1] + [set2])

what more proof do you need ?

> I'll have to have a chat with Jan about that; 'tis on my list...
>
> But in any case, it seems to me that we get the Big Gain from adding in
> an index on the XID column. If that gets us 80% of the improvement, it
> may not be worth improving the query.


I guess that in my case the improvement was more like 95%.

Maybe it's not worth rewriting it to collapse the OR's, as for only a
few hundred rows, the difference will be only in sort, vs sorted-by-
index.

I guess that using some way to avoid long list of or's on tableid (maybe
by using temp tables, or just join on sl_table to hopefully get a hash
lookup), could cut down even more on CPU usage.

--
Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org>
Michael Crozier

2005-09-05, 8:25 pm


> ((A and X) or (B =A0and X)) =3D=3D> ((A or B) and X)
>
> (A =3D log_tableid in ([set1])) and (B =3D log_tableid in ([set2]))
> =A0 =A0=3D=3D> (A or B) =3D log_tableid in ([set1] + [set2])
>
> what more proof do you need ?


After looking at the code (though I am not very familar with slony intern=
als=20
yet), it seems that the <xid filter> parts are calculated per-set, so it=20
would have collapsed IF the xid parts were the same or it would require p=
roof=20
that the xid filter parts will always be the same.

> I guess that in my case the improvement was more like 95%.


I also saw a significant increase. The difference between my "fast" mach=
ine=20
and "slow" machine seemed to indicate that the difference was primarily C=
PU=20
related.

I will also note that during the "catch up" period after a long copy, the=
=20
event group size heuristic was not working very well, even when adjusting=
the=20
desired sync time to various values. The size of the group was affecting=
the=20
event processing time very little, so the group size would gradually reac=
h=20
100, then drop back down to 1. I eventually modified remote_worker.c to =
keep=20
the group size at 100, which decreased the catch up time by approximately=
90%=20
and kept the sync length below the 100000 (artificial?) limitation.

I'm not familiar enough with the algorithm to suggest an improvement, but=
I=20
can say that the immediate "back off down to 1" was the performance kille=
r.


--=20

Conducive Technology Corporation
Taking air cargo information=20
to the next level

http://www.conducivetech.com
http://www.pathfinder-web.com

Michael Crozier crozierm- 19LDBNnCZmbFzinHIz5S
+QC/G2K4zDHf@public.gmane.org

Voice: 503.445.4233
Fax: 503.274.0939
Christopher Browne

2005-09-05, 8:25 pm

Michael Crozier <crozierm- 19LDBNnCZmbFzinHIz5S
+QC/G2K4zDHf@public.gmane.org> writes:
>
> After looking at the code (though I am not very familar with slony intern=

als=20
> yet), it seems that the <xid filter> parts are calculated per-set, so it=

=20
> would have collapsed IF the xid parts were the same or it would require p=

roof=20
> that the xid filter parts will always be the same.


Right. That's why I'm not leaping at the change.

In any case, it appears that adding the index on log_xid is an
enormous improvement *without* making any code changes. If that gives
the "big win," then I'm _really_ disinclined to change the query.

>
> I also saw a significant increase. The difference between my "fast" mach=

ine=20
> and "slow" machine seemed to indicate that the difference was primarily C=

PU=20

> related.


That's consistent with it using a seq scan and having to sort the table.

> I will also note that during the "catch up" period after a long
> copy, the event group size heuristic was not working very well, even
> when adjusting the desired sync time to various values. The size of
> the group was affecting the event processing time very little, so
> the group size would gradually reach 100, then drop back down to 1.
> I eventually modified remote_worker.c to keep the group size at 100,
> which decreased the catch up time by approximately 90% and kept the
> sync length below the 100000 (artificial?) limitation.
>
> I'm not familiar enough with the algorithm to suggest an
> improvement, but I can say that the immediate "back off down to 1"
> was the performance killer.


Oh, dear.

Fortunately, you can deactivate that at runtime by setting the time
it's trying to aim at to 0.
--=20
"cbbrowne","@","ca.afilias.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
Michael Crozier

2005-09-05, 8:25 pm


"[color=darkred]
>
> That's consistent with it using a seq scan and having to sort the table=

..

It was performing indexed lookups for both versions of the query. EXPLAI=
N=20
showed that the only difference was the tableid filter(s). It sounds odd,=
but=20
I'm fairly confident thats what I was seeing. Unfortunately I didn't kee=
p=20
notes and can't be completely certain.

The index and static group size were definitely the big improvment, thoug=
h.


> Oh, dear.
>
> Fortunately, you can deactivate that at runtime by setting the time
> it's trying to aim at to 0.


I misread the documentation, thanks for setting me straight.
Christopher Browne

2005-09-05, 8:25 pm

Michael Crozier <crozierm- 19LDBNnCZmbFzinHIz5S
+QC/G2K4zDHf@public.gmane.org> writes:

>
> It was performing indexed lookups for both versions of the query. EXPLAI=

N=20
> showed that the only difference was the tableid filter(s). It sounds odd,=

but=20
> I'm fairly confident thats what I was seeing. Unfortunately I didn't kee=

p=20
> notes and can't be completely certain.
>
> The index and static group size were definitely the big improvment, thoug=

h.

I think I missaid part of that.

It still needs to do a sort in order to put things in order of the
"action sequence". That will be an in-memory (in the PG backend)
sort.

>
> I misread the documentation, thanks for setting me straight.


If you can poke at why it seemed unclear, I'd be happy to improve the
docs in the hope of others not falling into the same trap.

If you think some things should be documented better, or that
something's confusing, by all means let me know. I may occasionally
suffer from "writer's arrogance," but improving the docs can help
prevent people from needing to ask "dumb questions" here, so I'd
rather see them improve :-).
--=20
let name=3D"cbbrowne" and tld=3D"ca.afilias.info" in String.concat "@" [nam=
e;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
Hannu Krosing

2005-09-06, 7:24 am

On E, 2005-09-05 at 18:36 -0400, Christopher Browne wrote:
> Michael Crozier <crozierm- 19LDBNnCZmbFzinHIz5S
+QC/G2K4zDHf@public.gmane.org> writes:
>
> Right. That's why I'm not leaping at the change.


My plan was to modify the query to have a group by and some kind on
list-making aggregate for set ids.

But It looks like pg 8.0 does the optimisation by itself (collapses <xid
filter>s and makes a big list of ORs of all the log_tableid in(...)
clauses - essentially doing what I described - so the problem may go
away by just sitting on it :) )

> In any case, it appears that adding the index on log_xid is an
> enormous improvement *without* making any code changes. If that gives
> the "big win," then I'm _really_ disinclined to change the query.


If using a join to sl_table could change a long list of OR's to im-
memory hash lookup, that could be a further win on CPU usage. But only
testing will tell it.

--
Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org>
Hannu Krosing

2005-09-06, 7:24 am

On E, 2005-09-05 at 19:22 -0400, Christopher Browne wrote:
> Michael Crozier <crozierm- 19LDBNnCZmbFzinHIz5S
+QC/G2K4zDHf@public.gmane.org> writes:
>

Original query (without index on XID column) did the most inefficient
thing possible - an index scan over all rows, doing index qual on
constant first column .
[color=darkred]
>
> I think I missaid part of that.
>
> It still needs to do a sort in order to put things in order of the
> "action sequence". That will be an in-memory (in the PG backend)
> sort.


Any idea why we have "action sequence" in the default index at all ? It
seems that it can't ever be used ?

--
Hannu Krosing <hannu-7C/ iILuz2RdeoWH0uzbU5w@
public.gmane.org>
Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com