Home > Archive > PostgreSQL Discussion > April 2005 > Optimising Union Query.









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Optimising Union Query.
Rob Kirkbride

2005-04-22, 9:23 am

Hi,

I've got a query that takes quite some time to complete. I'm not an SQL
expert so I'm not sure how to improve things.
I've done a explain analyze and as I expected the database has to check
every row in each of the three tables below but I'm wondering if I can
do it much quicker by a use of an index or something. Each of the three
tables could have several thousand entries in. Basically the tables
contain data recorded against time then every hour a script deletes
entries that more than so many hours old.

select l.name,l.id from pa i,locations l where i.location=l.id union
select l.name,l.id from andu i,locations l where i.location=l.id union
select l.name,l.id from idu i,locations l where i.location=l.id;

Thanks for any help,

Rob



---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Dawid Kuroczko

2005-04-22, 9:23 am

On 4/22/05, Rob Kirkbride <rob.kirkbride@thales-is.com> wrote:
> I've got a query that takes quite some time to complete. I'm not an SQL
> expert so I'm not sure how to improve things.
> I've done a explain analyze and as I expected the database has to check
> every row in each of the three tables below but I'm wondering if I can
> do it much quicker by a use of an index or something. Each of the three
> tables could have several thousand entries in. Basically the tables
> contain data recorded against time then every hour a script deletes
> entries that more than so many hours old.
>
> select l.name,l.id from pa i,locations l where i.location=l.id union
> select l.name,l.id from andu i,locations l where i.location=l.id union
> select l.name,l.id from idu i,locations l where i.location=l.id;


Would it be OK if there were duplicates returned? I.e if select from pa
table and andu table returned same row, would it be ok if there would
be two rows in "final" output because of one?

If so, change "union" to "UNION ALL".

If you put only "UNION", server gets resutls from _all_ selects, removes
duplicates and returns your query. If you put "UNION ALL" it simply
does three selects and returns all the rows returned. Not having to
look for duplicates makes it WAY faster.

Regards,
Dawid

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Patrick TJ McPhee

2005-04-23, 8:23 pm

In article <4268F322.1040106@thales-is.com>,
Rob Kirkbride <rob.kirkbride@thales-is.com> wrote:

% I've done a explain analyze and as I expected the database has to check
% every row in each of the three tables below but I'm wondering if I can

This is because you're returning a row for every row in the three
tables.

% select l.name,l.id from pa i,locations l where i.location=l.id union
% select l.name,l.id from andu i,locations l where i.location=l.id union
% select l.name,l.id from idu i,locations l where i.location=l.id;

You might get some improvement from

select name,id from locations
where id in (select distinct location from pa union
select distinct location from andu union
select distinct location from idu);

this query might be helped by an index on location in each of those
three tables, but probably not.
--

Patrick TJ McPhee
North York Canada
ptjm@interlog.com
Jim C. Nasby

2005-04-24, 8:23 pm

On Sat, Apr 23, 2005 at 10:39:14PM +0000, Patrick TJ McPhee wrote:
> In article <4268F322.1040106@thales-is.com>,
> Rob Kirkbride <rob.kirkbride@thales-is.com> wrote:
>
> % I've done a explain analyze and as I expected the database has to check
> % every row in each of the three tables below but I'm wondering if I can
>
> This is because you're returning a row for every row in the three
> tables.
>
> % select l.name,l.id from pa i,locations l where i.location=l.id union
> % select l.name,l.id from andu i,locations l where i.location=l.id union
> % select l.name,l.id from idu i,locations l where i.location=l.id;
>
> You might get some improvement from
>
> select name,id from locations
> where id in (select distinct location from pa union
> select distinct location from andu union
> select distinct location from idu);


Note that SELECT DISTINCT is redundant with a plain UNION. By
definition, UNION does a DISTINCT. In fact, this is going to hurt you;
you'll end up doing 4 distinct operations (one for each SELECT DISTINCT
and one for the overall UNION). Unless some of those tables have a lot
of duplicated location values, you should either use UNION ALLs or drop
the DISTINCTs. Note that going with DISTINCTs is different than what
your original query does.

You should also consider this:

SELECT name, id FROM locations l
WHERE EXISTS (SELECT * FROM pa p WHERE p.location=l.id)
OR EXISTS (SELECT * FROM andu a WHERE a.location=l.id)
OR EXISTS (SELECT * FROM idu i WHERE i.location=l.id)

This query would definately be helped by having indexes on
(pa|andu|idu).location.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Rob Kirkbride

2005-04-25, 9:24 am

Jim C. Nasby wrote on 25/04/2005 01:28:

>On Sat, Apr 23, 2005 at 10:39:14PM +0000, Patrick TJ McPhee wrote:
>
>
>
>Note that SELECT DISTINCT is redundant with a plain UNION. By
>definition, UNION does a DISTINCT. In fact, this is going to hurt you;
>you'll end up doing 4 distinct operations (one for each SELECT DISTINCT
>and one for the overall UNION). Unless some of those tables have a lot
>of duplicated location values, you should either use UNION ALLs or drop
>the DISTINCTs. Note that going with DISTINCTs is different than what
>your original query does.
>
>You should also consider this:
>
>SELECT name, id FROM locations l
> WHERE EXISTS (SELECT * FROM pa p WHERE p.location=l.id)
> OR EXISTS (SELECT * FROM andu a WHERE a.location=l.id)
> OR EXISTS (SELECT * FROM idu i WHERE i.location=l.id)
>
>This query would definately be helped by having indexes on
>(pa|andu|idu).location.
>
>


Thanks for that. I tried a few things, including using DISTINCTS and
UNION ALLs but none made a big difference.
However your query above sped things up by a factor of more than 2.

Thanks very much!

Rob


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com