Home > Archive > PostgreSQL Performance > June 2005 > How to determine whether to VACUUM or CLUSTER









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author How to determine whether to VACUUM or CLUSTER
Ken Shaw

2005-06-19, 3:25 am

Hi All,

I have an app that updates a PostgreSQL db in a batch fashion. After
each batch (or several batches), it issues VACUUM and ANALYZE calls on
the updated tables. Now I want to cluster some tables for better
performance. I understand that doing a VACUUM and a CLUSTER on a table
is wasteful as the CLUSTER makes the VACUUM superfluous. The app does
not have a built-in list of the tables and whether each is clustered or
not. It looks to me as if the only way to determine whether to issue a
VACUUM (on a non-clustered table) or a CLUSTER (on a clustered table) is
to query the table "pg_index", much like view "pg_indexes" does, for the
column "indisclustered". Is this right?

Also, how expensive is CLUSTER compared to VACUUM? Does CLUSTER read in
the whole table, sort it, and write it back out? Or write out a
completely new file? Is the time for a CLUSTER the same whether one row
is out of place or the table is completely disordered?

Thanks,
Ken



---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql
.org so that your
message can get through to the mailing list cleanly

Jim C. Nasby

2005-06-20, 3:23 am

On Wed, Jun 15, 2005 at 11:34:18AM -0400, Ken Shaw wrote:
> Hi All,
>
> I have an app that updates a PostgreSQL db in a batch fashion. After
> each batch (or several batches), it issues VACUUM and ANALYZE calls on
> the updated tables. Now I want to cluster some tables for better
> performance. I understand that doing a VACUUM and a CLUSTER on a table
> is wasteful as the CLUSTER makes the VACUUM superfluous. The app does
> not have a built-in list of the tables and whether each is clustered or
> not. It looks to me as if the only way to determine whether to issue a
> VACUUM (on a non-clustered table) or a CLUSTER (on a clustered table) is
> to query the table "pg_index", much like view "pg_indexes" does, for the
> column "indisclustered". Is this right?


I don't think that's what you want. 'indisclustered' only indicates if
the last time the table was clustered was on that index. The best thing
that comes to mind is looking at the correlation of the first field in
the index for the table. You'll find this info in pg_stats.

> Also, how expensive is CLUSTER compared to VACUUM? Does CLUSTER read in
> the whole table, sort it, and write it back out? Or write out a
> completely new file? Is the time for a CLUSTER the same whether one row
> is out of place or the table is completely disordered?


AFAIK, cluster completely re-creates the table from scratch, then
rebuilds all the indexes. It's basically the most expensive operation
you can perform on a table. There probably will be some increased
performance from the sort if the table is already mostly in the right
order though.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com