Home > Archive > PostgreSQL Performance > August 2005 > Re: Read/Write block sizes (Was: Caching by Postgres)









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Re: Read/Write block sizes (Was: Caching by Postgres)
Jignesh Shah

2005-08-23, 8:25 pm

> Does that include increasing the size of read/write blocks? I've
> noticedthat with a large enough table it takes a while to do a
> sequential scan,
> even if it's cached; I wonder if the fact that it takes a million
> read(2) calls to get through an 8G table is part of that.
>



Actually some of that readaheads,etc the OS does already if it does some sort of throttling/clubbing of reads/writes. But its not enough for such types of workloads.

Here is what I think will help:

* Support for different Blocksize TABLESPACE without recompiling the code.. (Atlease support for a different Blocksize for the whole database without recompiling the code)

* Support for bigger sizes of WAL files instead of 16MB files WITHOUT recompiling the code.. Should be a tuneable if you ask me (with checkpoint_segments at 256.. you have too many 16MB files in the log directory) (This will help OLTP benchmarks more sinc
e now they don't spend time rotating log files)

* Introduce a multiblock or extent tunable variable where you can define a multiple of 8K (or BlockSize tuneable) to read a bigger chunk and store it in the bufferpool.. (Maybe writes too) (Most devices now support upto 1MB chunks for reads and writes)

*There should be a way to preallocate files for TABLES in TABLESPACES otherwise with multiple table writes in the same filesystem ends with fragmented files which causes poor "READS" from the files.

* With 64bit 1GB file chunks is also moot.. Maybe it should be tuneable too like 100GB without recompiling the code.


Why recompiling is bad? Most companies that will support Postgres will support their own binaries and they won't prefer different versions of binaries for different blocksizes, different WAL file sizes, etc... and hence more function using the same set of
binaries is more desirable in enterprise environments


Regards,
Jignesh



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Michael Stone

2005-08-23, 8:25 pm

On Tue, Aug 23, 2005 at 05:29:01PM -0400, Jignesh Shah wrote:
>Actually some of that readaheads,etc the OS does already if it does
>some sort of throttling/clubbing of reads/writes.


Note that I specified the fully cached case--even with the workload in
RAM the system still has to process a heck of a lot of read calls.

>* Introduce a multiblock or extent tunable variable where you can
>define a multiple of 8K (or BlockSize tuneable) to read a bigger chunk
>and store it in the bufferpool.. (Maybe writes too) (Most devices now
>support upto 1MB chunks for reads and writes)


Yeah. The problem with relying on OS readahead is that the OS doesn't
know whether you're doing a sequential scan or an index scan; if you
have the OS agressively readahead you'll kill your seek performance.
OTOH, if you don't do readaheads you'll kill your sequential scan
performance. At the app level you know which makes sense for each
operation.

Mike Stone

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Jeffrey W. Baker

2005-08-23, 8:25 pm

On Tue, 2005-08-23 at 19:12 -0400, Michael Stone wrote:
> On Tue, Aug 23, 2005 at 05:29:01PM -0400, Jignesh Shah wrote:
>
> Note that I specified the fully cached case--even with the workload in
> RAM the system still has to process a heck of a lot of read calls.
>
>
> Yeah. The problem with relying on OS readahead is that the OS doesn't
> know whether you're doing a sequential scan or an index scan; if you
> have the OS agressively readahead you'll kill your seek performance.
> OTOH, if you don't do readaheads you'll kill your sequential scan
> performance. At the app level you know which makes sense for each
> operation.


This is why we have MADVISE_RANDOM and MADVISE_SEQUENTIAL.

-jwb

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com