|
Home > Archive > PostgreSQL Performance > March 2006 > count(*) performance
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
count(*) performance
|
|
| Gábriel Ákos 2006-03-27, 9:40 am |
| Hi,
I guess this is an age-old 100times answered question, but I didn't find
the answer to it yet (neither in the FAQ nor in the mailing list archives).
Question: I have a table with 2.5M rows. count(*) on this table is
running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10
array (sata, not scsi)) Is this normal? How could I make it run faster?
Maybe make it run faster for the 2nd time? Which parameters should I
change in postgresql.conf and how?
--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :akos.gabriel@i-logic.hu|Web: http://www.i-logic.hu=-
-=Tel/fax:+3612367353 |Mobil:+36209278894 =-
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
| |
| Luke Lonergan 2006-03-27, 11:33 am |
| Gabriel,
On 3/27/06 5:34 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote:
> Question: I have a table with 2.5M rows. count(*) on this table is
> running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10
> array (sata, not scsi)) Is this normal? How could I make it run faster?
> Maybe make it run faster for the 2nd time? Which parameters should I
> change in postgresql.conf and how?
Before changing anything with your Postgres configuration, you should check
your hard drive array performance. All select count(*) does is a sequential
scan of your data, and if the table is larger than memory, or if it's the
first time you've scanned it, it is limited by your disk speed.
To test your disk speed, use the following commands and report the times
here:
time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"
time dd if=bigfile of=/dev/null bs=8k
If these are taking a long time, from another session watch the I/O rate
with "vmstat 1" for a while and report that here.
- Luke
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
| |
| Gábriel Ákos 2006-03-27, 1:37 pm |
| Luke Lonergan wrote:
> To test your disk speed, use the following commands and report the times
> here:
>
> time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"
root@panther:/fast # time bash -c "dd if=/dev/zero of=bigfile bs=8k
count=500000 && sync"
500000+0 records in
500000+0 records out
4096000000 bytes transferred in 45.469404 seconds (90082553 bytes/sec)
real 0m56.880s
user 0m0.112s
sys 0m18.937s
> time dd if=bigfile of=/dev/null bs=8k
root@panther:/fast # time dd if=bigfile of=/dev/null bs=8k
500000+0 records in
500000+0 records out
4096000000 bytes transferred in 53.542147 seconds (76500481 bytes/sec)
real 0m53.544s
user 0m0.048s
sys 0m10.637s
I guess these values aren't that bad :)
--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :akos.gabriel@i-logic.hu|Web: http://www.i-logic.hu=-
-=Tel/fax:+3612367353 |Mobil:+36209278894 =-
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
| |
| Mikael Carneholm 2006-03-27, 8:28 pm |
| This is where a "last_vacuumed" (and "last_analyzed") column in
pg_statistic(?) would come in handy. Each time vacuum or analyze has
finished, update the row for the specific table that was
vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed
column. No more guessing "maybe I haven't vacuumed/analyzed in a while",
and each time a user complains about bad performance, one could request
the user to do a "select s.last_vacuumed, s.last_analyzed from
pg_statistic s, pg_attribute a, pg_class c where ..."
It SOUNDS easy to implement, but that has fooled me before... :-)
- Mikael
-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org]On Behalf Of Guido
Neitzer
Sent: den 27 mars 2006 21:44
To: Brendan Duddridge
Cc: Postgresql Performance
Subject: Re: [PERFORM] count(*) performance
On 27.03.2006, at 21:20 Uhr, Brendan Duddridge wrote:
> Does that mean that even though autovacuum is turned on, you still
> should do a regular vacuum analyze periodically?
It seems that there are situations where autovacuum does not a really
good job.
However, in our application I have made stupid design decision which
I want to change as soon as possible. I have a "visit count" column
in one of the very large tables, so updates are VERY regular. I've
just checked and saw that autovacuum does a great job with that.
Nevertheless I have set up a cron job to do a standard vacuum every
month. I've used vacuum full only once after I did a bulk update of
about 200.000 rows ...
cug
--
PharmaLine, Essen, GERMANY
Software and Database Development
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
| |
| Matthew T. O'Connor 2006-03-27, 8:28 pm |
| Mikael Carneholm wrote:
> This is where a "last_vacuumed" (and "last_analyzed") column in
> pg_statistic(?) would come in handy. Each time vacuum or analyze has
> finished, update the row for the specific table that was
> vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed
> column. No more guessing "maybe I haven't vacuumed/analyzed in a while",
> and each time a user complains about bad performance, one could request
> the user to do a "select s.last_vacuumed, s.last_analyzed from
> pg_statistic s, pg_attribute a, pg_class c where ..."
>
> It SOUNDS easy to implement, but that has fooled me before... :-)
It is fairly easy to implement, however it has been discussed before and
decided that it wasn't necessary. What the system cares about is how
long it's been since the last vacuum in terms of XIDs not time. Storing
a timestamp would make it more human readable, but I'm not sure the
powers that be want to add two new columns to some system table to
accommodate this.
Matt
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
| |
| Tom Lane 2006-03-27, 8:28 pm |
| "Matthew T. O'Connor" <matthew@zeut.net> writes:
> It is fairly easy to implement, however it has been discussed before and
> decided that it wasn't necessary. What the system cares about is how
> long it's been since the last vacuum in terms of XIDs not time.
I think Alvaro is intending to do the latter (store per-table vacuum xid
info) for 8.2.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
| |
| Mikael Carneholm 2006-03-27, 8:28 pm |
| I think it is definitely necessary from an administration point of view - as an administrator, I want to know:
1) Are there any stats (at all) in a schema
2) Are there any stats on the table that slow_query_foo is targeting
3) If I have stats, how recent are they
4) Could it be that there are a lot of dead tuples lying around (given the amount of traffic I know I have)
These would be (are always!) the first questions I ask myself when I'm about to identify performance problems in an app, don't know how other people do though :)
Maybe something I'll try to look into this weekend, if I can spare some time.
- Mikael
-----Original Message-----
From: Matthew T. O'Connor & #91;mailto:matthew@z
eut.net]
Sent: den 28 mars 2006 00:43
To: Mikael Carneholm
Cc: Postgresql Performance
Subject: Re: [PERFORM] count(*) performance
Mikael Carneholm wrote:
> This is where a "last_vacuumed" (and "last_analyzed") column in
> pg_statistic(?) would come in handy. Each time vacuum or analyze has
> finished, update the row for the specific table that was
> vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed
> column. No more guessing "maybe I haven't vacuumed/analyzed in a while",
> and each time a user complains about bad performance, one could request
> the user to do a "select s.last_vacuumed, s.last_analyzed from
> pg_statistic s, pg_attribute a, pg_class c where ..."
>
> It SOUNDS easy to implement, but that has fooled me before... :-)
It is fairly easy to implement, however it has been discussed before and
decided that it wasn't necessary. What the system cares about is how
long it's been since the last vacuum in terms of XIDs not time. Storing
a timestamp would make it more human readable, but I'm not sure the
powers that be want to add two new columns to some system table to
accommodate this.
Matt
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
|
|
|
|
|