Home > Archive > PostgreSQL Performance > January 2006 > SELECT MIN, MAX took longer time than SELECT COUNT, MIN, MAX









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author SELECT MIN, MAX took longer time than SELECT COUNT, MIN, MAX
K C Lau

2006-01-20, 3:24 am


The following query took 17 seconds:
select count(LogSN), min(LogSN), max(LogSN) from Log where create_time <
'2005/10/19';

Figuring that getting the count will involve scanning the database, I took
it out, but the new query took 200 seconds:
select min(LogSN), max(LogSN) from Log where create_time < '2005/10/19';

Is it because the planner is using index pk_log instead of idx_logtime?
Anyway to avoid that?

I can get instant replies with 2 separate queries for min(LogSN) and
max(LogSN) using order by create_time limit 1, but I can't get both values
within 1 query using the limit 1 construct. Any suggestions?

I am running pg 8.1.2 on Windows 2000. The queries are done immediately
after a vacuum analyze.

Best regards,
KC.

----------------------

esdt=> \d log;
create_time | character varying(23) | default
'1970/01/01~00:00:00.000'::char
acter varying
logsn | integer | not null
...
Indexes:
"pk_log" PRIMARY KEY, btree (logsn)
"idx_logtime" btree (create_time, logsn)
...

esdt=> vacuum analyze log;
VACUUM

esdt=> explain analyze select count(LogSN), min(LogSN), max(LogSN) from Log
where create_time < '2005/10/19';

Aggregate (cost=57817.74..57817.75 rows=1 width=4) (actual
time=17403.381..17403.384 rows=1 loops=1)
-> Bitmap Heap Scan on log (cost=1458.31..57172.06 rows=86089
width=4) (actual time=180.368..17039.262 rows=106708 loops=1)
Recheck Cond: ((create_time)::text
< '2005/10/19'::text)
-> Bitmap Index Scan on idx_logtime (cost=0.00..1458.31
rows=86089 width=0) (actual time=168.777..168.777 rows=106708 loops=1)
Index Cond: ((create_time)::text
< '2005/10/19'::text)
Total runtime: 17403.787 ms

esdt=> explain analyze select min(LogSN), max(LogSN) from Log where
create_time < '2005/10/19';

Result (cost=2.51..2.52 rows=1 width=0) (actual
time=200051.507..200051.510 rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..1.26 rows=1 width=4) (actual
time=18.541..18.544 rows=1 loops=1)
-> Index Scan using pk_log on log (cost=0.00..108047.11
rows=86089
width=4) (actual time=18.533..18.533 rows=1 loops=1)
Filter: (((create_time)::tex
t < '2005/10/19'::text) AND
(logsn IS NOT NULL))
-> Limit (cost=0.00..1.26 rows=1 width=4) (actual
time=200032.928..200032.931 rows=1 loops=1)
-> Index Scan Backward using pk_log on
log (cost=0.00..108047.11 rows=86089 width=4) (actual
time=200032.920..200032.920 rows=1 loops=1)
Filter: (((create_time)::tex
t < '2005/10/19'::text) AND
(logsn IS NOT NULL))
Total runtime: 200051.701 ms

esdt=> explain analyze select LogSN from Log where create_time <
'2005/10/19' order by create_time limit 1;

Limit (cost=0.00..0.98 rows=1 width=31) (actual time=0.071..0.073 rows=1
loops=1)
-> Index Scan using idx_logtime on log (cost=0.00..84649.94
rows=86089 width=31) (actual time=0.063..0.063 rows=1 loops=1)
Index Cond: ((create_time)::text
< '2005/10/19'::text)
Total runtime: 0.182 ms

esdt=> explain analyze select LogSN from Log where create_time <
'2005/10/19' order by create_time desc limit 1;
Limit (cost=0.00..0.98 rows=1 width=31) (actual time=0.058..0.061 rows=1
loops=1)
-> Index Scan Backward using idx_logtime on log (cost=0.00..84649.94
rows=86089 width=31) (actual time=0.051..0.051 rows=1 loops=1)
Index Cond: ((create_time)::text
< '2005/10/19'::text)
Total runtime: 0.186 ms


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Jim C. Nasby

2006-01-20, 11:24 am

On Fri, Jan 20, 2006 at 12:35:36PM +0800, K C Lau wrote:

Here's the problem... the estimate for the backwards index scan is *way*
off:

> -> Limit (cost=0.00..1.26 rows=1 width=4) (actual
> time=200032.928..200032.931 rows=1 loops=1)
> -> Index Scan Backward using pk_log on
> log (cost=0.00..108047.11 rows=86089 width=4) (actual
> time=200032.920..200032.920 rows=1 loops=1)
> Filter: (((create_time)::tex
t < '2005/10/19'::text) AND
> (logsn IS NOT NULL))
> Total runtime: 200051.701 ms


BTW, these queries below are meaningless; they are not equivalent to
min(logsn).

> esdt=> explain analyze select LogSN from Log where create_time <
> '2005/10/19' order by create_time limit 1;
>
> Limit (cost=0.00..0.98 rows=1 width=31) (actual time=0.071..0.073 rows=1
> loops=1)
> -> Index Scan using idx_logtime on log (cost=0.00..84649.94
> rows=86089 width=31) (actual time=0.063..0.063 rows=1 loops=1)
> Index Cond: ((create_time)::text
< '2005/10/19'::text)
> Total runtime: 0.182 ms
>
> esdt=> explain analyze select LogSN from Log where create_time <
> '2005/10/19' order by create_time desc limit 1;
> Limit (cost=0.00..0.98 rows=1 width=31) (actual time=0.058..0.061 rows=1
> loops=1)
> -> Index Scan Backward using idx_logtime on log (cost=0.00..84649.94
> rows=86089 width=31) (actual time=0.051..0.051 rows=1 loops=1)
> Index Cond: ((create_time)::text
< '2005/10/19'::text)
> Total runtime: 0.186 ms
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Tom Lane

2006-01-20, 1:24 pm

"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Fri, Jan 20, 2006 at 12:35:36PM +0800, K C Lau wrote:
> Here's the problem... the estimate for the backwards index scan is *way*
> off:

[color=darkred]

It's more subtle than you think. The estimated rowcount is the
estimated number of rows fetched if the indexscan were run to
completion, which it isn't because the LIMIT cuts it off after the
first returned row. That estimate is not bad (we can see from the
aggregate plan that the true value would have been 106708, assuming
that the "logsn IS NOT NULL" condition isn't filtering anything).

The real problem is that it's taking quite a long time for the scan
to reach the first row with create_time < 2005/10/19, which is not
too surprising if logsn is strongly correlated with create_time ...
but in the absence of any cross-column statistics the planner has
no very good way to know that. (Hm ... but both of them probably
also show a strong correlation to physical order ... we could look
at that maybe ...) The default assumption is that the two columns
aren't correlated and so it should not take long to hit the first such
row, which is why the planner likes the indexscan/limit plan.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com