|
Home > Archive > PostgreSQL Hacks > February 2006 > Re: [PERFORM] temporary indexes
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Re: [PERFORM] temporary indexes
|
|
| Tom Lane 2006-02-28, 8:32 pm |
| "Jim C. Nasby" <jnasby@pervasive.com> writes:
> FWIW, Sybase supported something similar a long time ago. It had the
> ability to build a temporary 'clustered table' (think index organized
> table) when there was enough benefit to do so. This is actually
> much easier to make happen inside a transaction for us, because we don't
> need to keep visibility information around. There's probably also some
> index metadata that could be done away with. Perhaps the materialize
> node could be made to allow this.
How does what you describe differ from a merge join? Or a hash join,
if you imagine the temp index as being a hash rather than btree index?
The issue at hand really has nothing to do with temp indexes, it's with
the constrained way that the planner deals with EXISTS subplans. The
subplans themselves are cheap enough, even in the poorly-indexed
variant, that the planner would certainly never have decided to create
an index to use for them. The problem only becomes apparent at the next
level up, where those subplans are going to be repeated a huge number of
times ---- but the subplan plan is already chosen and won't be changed.
So even if we invented a temp-index facility, it would fail to be
applied in Kevin's example. The limiting factor is that EXISTS subplans
aren't flattened ... and once that's fixed, I doubt the example would
need any new kind of join support.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
| |
| Kevin Grittner 2006-02-28, 8:32 pm |
| >>> On Tue, Feb 28, 2006 at 11:05 am, in message
<16076.1141146348@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> The issue at hand really has nothing to do with temp indexes, it's
with
> the constrained way that the planner deals with EXISTS subplans.
Yet when the index exists, the query is optimized well.
> The
> subplans themselves are cheap enough, even in the poorly- indexed
> variant, that the planner would certainly never have decided to
create
> an index to use for them.
That depends. If the planner was able to generate hypothetical index
descriptions which might be useful, and analyze everything based on
those (adding in creation cost, of course) -- why would it not be able
to come up with the plan which it DID use when the index existed.
> The limiting factor is that EXISTS subplans
> aren't flattened ... and once that's fixed, I doubt the example
would
> need any new kind of join support.
<digression>
I'm all for that. So far, we've been going after the low-hanging fruit
in our use of PostgreSQL. When we get to the main applications, we're
going to be dealing with a lot more in the way of EXISTS clauses. The
product we're moving from usually optimized an IN test the same as the
logically equivalent EXISTS test, and where a difference existed, it
almost always did better with the EXISTS -- so we encouraged application
programmers to use that form. Also, EXISTS works in situations where
you need to compare on multiple columns, so it is useful in many
situations where EXISTS or MIN/MAX techniques just don't work.
</digression>
If fixing this would allow hash or merge techniques to cover this as
well as the index did, and that is true in a more general sense (not
just for this one example), then temporary indexes would clearly not
have any value.
-Kevin
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
| |
| Kevin Grittner 2006-02-28, 8:32 pm |
| >>> On Tue, Feb 28, 2006 at 11:36 am, in message
<440435BC.EE98.0025.0@wicourts.gov>, "Kevin Grittner"
> Also, EXISTS works in situations where
> you need to compare on multiple columns, so it is useful in many
> situations where EXISTS or MIN/MAX techniques just don't work.
Sorry. That should have read:
EXISTS works in situations where
you need to compare on multiple columns, so it is useful in many
situations where IN or MIN/MAX techniques just don't work.
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
| |
| Kevin Grittner 2006-02-28, 8:32 pm |
| >>> On Tue, Feb 28, 2006 at 12:06 pm, in message
<16797.1141150016@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> IN works fine on multiple columns:
>
> (foo, bar, baz) IN (SELECT x, y, z FROM ...)
Thanks for pointing that out. I recognize it as valid ANSI/ISO syntax,
using a row value constructor list. Unfortunately, row value
constructor lists failed to make the portability cut for allowed syntax
here, because it was not supported by all candidate products at the
time. It is still not supported by the product we're moving from, and
we'll have about a one year window when our code will need to run on
both.
-Kevin
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
| |
| Jim C. Nasby 2006-02-28, 8:32 pm |
| On Tue, Feb 28, 2006 at 11:36:28AM -0600, Kevin Grittner wrote:
> <digression>
> I'm all for that. So far, we've been going after the low-hanging fruit
> in our use of PostgreSQL. When we get to the main applications, we're
> going to be dealing with a lot more in the way of EXISTS clauses. The
> product we're moving from usually optimized an IN test the same as the
> logically equivalent EXISTS test, and where a difference existed, it
> almost always did better with the EXISTS -- so we encouraged application
> programmers to use that form. Also, EXISTS works in situations where
> you need to compare on multiple columns, so it is useful in many
> situations where EXISTS or MIN/MAX techniques just don't work.
> </digression>
Maybe it's just the way my twisted mind thinks, but I generally prefer
using a JOIN when possible...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql
.org so that your
message can get through to the mailing list cleanly
| |
| Kevin Grittner 2006-02-28, 8:32 pm |
| >>> On Tue, Feb 28, 2006 at 11:05 am, in message
<16076.1141146348@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The limiting factor is that EXISTS subplans
> aren't flattened ... and once that's fixed, I doubt the example
would
> need any new kind of join support.
I rewrote the query to use IN predicates rather than EXISTS predicates,
and the cost estimates look like this:
EXISTS, no index: 1.6 billion
EXISTS, with index: 0.023 billion
IN, no index: 13.7 billion
IN, with index: 10.6 billion
At least for the two EXISTS cases, the estimates were roughly accurate.
These plans were run against the data after the fix, but analyze has
not been run since then, so the estimates should be comparable with the
earlier post.
I'm not used to using the IN construct this way, so maybe someone can
spot something horribly stupid in how I tried to use it.
-Kevin
| |
| Lukas Smith 2006-02-28, 8:32 pm |
| Kevin Grittner wrote:
> I rewrote the query to use IN predicates rather than EXISTS predicates,
> and the cost estimates look like this:
>
> EXISTS, no index: 1.6 billion
> EXISTS, with index: 0.023 billion
> IN, no index: 13.7 billion
> IN, with index: 10.6 billion
>
> At least for the two EXISTS cases, the estimates were roughly accurate.
> These plans were run against the data after the fix, but analyze has
> not been run since then, so the estimates should be comparable with the
> earlier post.
>
> I'm not used to using the IN construct this way, so maybe someone can
> spot something horribly stupid in how I tried to use it.
I will have a look at your queries tomorrow. Some general advice (rdbms
agnostic) on when to use IN and when to use EXISTS taken from "SQL
performance tuning":
- if the inner table has few rows and the outer has many then IN is
preferred
- if however you have a restrictive expression on the outer query you
should preferr EXISTS
- use NOT EXISTS instead of NOT IN (break out early)
regards,
Lukas
|
|
|
|
|