Home > Archive > PostgreSQL Performance > March 2006 > Re: Scaling up PostgreSQL in Multiple CPU / Dual Core









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Re: Scaling up PostgreSQL in Multiple CPU / Dual Core
Scott Marlowe

2006-03-23, 1:33 pm

On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
>
> I believe that is limited to Bizgress MPP yes?


Yep. I hope that someday it will be released to the postgresql global
dev group for inclusion. Or at least parts of it.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Christopher Browne

2006-03-23, 8:40 pm

Martha Stewart called it a Good Thing when smarlowe@g2switchwor
ks.com (Scott Marlowe) wrote:
> On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
>
> Yep. I hope that someday it will be released to the postgresql global
> dev group for inclusion. Or at least parts of it.


Question: Does the Bizgress/MPP use threading for this concurrency?
Or forking?

If it does so via forking, that's more portable, and less dependent on
specific complexities of threading implementations (which amounts to
non-portability ;-)).

Most times Jan comes to town, we spend a few minutes musing about the
"splitting queries across threads" problem, and dismiss it again; if
there's the beginning of a "split across processes," that's decidedly
neat :-).
--
output = reverse("moc.liamg" "@" "enworbbc")
http://linuxfinances.info/info/internet.html
Why do we put suits in a garment bag, and put garments in a suitcase?
Jim C. Nasby

2006-03-24, 7:43 am

On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
> Martha Stewart called it a Good Thing when smarlowe@g2switchwor
ks.com (Scott Marlowe) wrote:
>
> Question: Does the Bizgress/MPP use threading for this concurrency?
> Or forking?
>
> If it does so via forking, that's more portable, and less dependent on
> specific complexities of threading implementations (which amounts to
> non-portability ;-)).
>
> Most times Jan comes to town, we spend a few minutes musing about the
> "splitting queries across threads" problem, and dismiss it again; if
> there's the beginning of a "split across processes," that's decidedly
> neat :-).


Correct me if I'm wrong, but there's no way to (reasonably) accomplish
that without having some dedicated extra processes laying around that
you can use to execute the queries, no? In other words, the cost of a
fork() during query execution would be too prohibitive...

FWIW, DB2 executes all queries in a dedicated set of processes. The
process handling the connection from the client will pass a query
request off to one of the executor processes. I can't remember which
process actually plans the query, but I know that the executor runs it.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Luke Lonergan

2006-03-24, 9:32 am

Christopher,

On 3/23/06 6:22 PM, "Christopher Browne" <cbbrowne@acm.org> wrote:

> Question: Does the Bizgress/MPP use threading for this concurrency?
> Or forking?
>
> If it does so via forking, that's more portable, and less dependent on
> specific complexities of threading implementations (which amounts to
> non-portability ;-)).


OK - I'll byte:

It's process based, we fork backends at slice points in the execution plan.

To take care of the startup latency problem, we persist sets of these
backends, called "gangs". They appear, persist for connection scope for
reuse, then are disbanded.

> Most times Jan comes to town, we spend a few minutes musing about the
> "splitting queries across threads" problem, and dismiss it again; if
> there's the beginning of a "split across processes," that's decidedly
> neat :-).


:-)

- Luke



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Chris Browne

2006-03-24, 8:27 pm

jnasby@pervasive.com ("Jim C. Nasby") writes:
> On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
>
> Correct me if I'm wrong, but there's no way to (reasonably) accomplish
> that without having some dedicated extra processes laying around that
> you can use to execute the queries, no? In other words, the cost of a
> fork() during query execution would be too prohibitive...


Counterexample...

The sort of scenario we keep musing about is where you split off a
(thread|process) for each partition of a big table. There is in fact
a natural such partitioning, in that tables get split at the 1GB mark,
by default.

Consider doing a join against 2 tables that are each 8GB in size
(e.g. - they consist of 8 data files). Let's assume that the query
plan indicates doing seq scans on both.

You *know* you'll be reading through 16 files, each 1GB in size.
Spawning a process for each of those files doesn't strike me as
"prohibitively expensive."

A naive read on this is that you might start with one backend process,
which then spawns 16 more. Each of those backends is scanning through
one of those 16 files; they then throw relevant tuples into shared
memory to be aggregated/joined by the central one.

That particular scenario is one where the fork()s would hardly be
noticeable.

> FWIW, DB2 executes all queries in a dedicated set of processes. The
> process handling the connection from the client will pass a query
> request off to one of the executor processes. I can't remember which
> process actually plans the query, but I know that the executor runs
> it.


It seems to me that the kinds of cases where extra processes/threads
would be warranted are quite likely to be cases where fork()ing may be
an immaterial cost.
--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/languages.html
TECO Madness: a moment of convenience, a lifetime of regret.
-- Dave Moon
Chris Browne

2006-03-24, 8:27 pm

llonergan@greenplum.com ("Luke Lonergan") writes:
> Christopher,
>
> On 3/23/06 6:22 PM, "Christopher Browne" <cbbrowne@acm.org> wrote:
>
>
> OK - I'll byte:
>
> It's process based, we fork backends at slice points in the execution plan.


By "slice points", do you mean that you'd try to partition tables
(e.g. - if there's a Seq Scan on a table with 8 1GB segments, you
could spawn as many as 8 processes), or that two scans that are then
merge joined means a process for each scan, and a process for the
merge join? Or perhaps both :-). Or perhaps something else entirely ;-).

> To take care of the startup latency problem, we persist sets of
> these backends, called "gangs". They appear, persist for connection
> scope for reuse, then are disbanded.


If only that could happen to more gangs...
--
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/multiplexor.html
"I'm sorry, the teleportation booth you have reached is not in service
at this time. Please hand-reassemble your molecules or call an
operator to help you...."
Jim C. Nasby

2006-03-24, 8:27 pm

On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
>
> Counterexample...
>
> The sort of scenario we keep musing about is where you split off a
> (thread|process) for each partition of a big table. There is in fact
> a natural such partitioning, in that tables get split at the 1GB mark,
> by default.
>
> Consider doing a join against 2 tables that are each 8GB in size
> (e.g. - they consist of 8 data files). Let's assume that the query
> plan indicates doing seq scans on both.
>
> You *know* you'll be reading through 16 files, each 1GB in size.
> Spawning a process for each of those files doesn't strike me as
> "prohibitively expensive."


Have you ever tried reading from 2 large files on a disk at the same
time, let alone 16? The results ain't pretty.

What you're suggesting maybe makes sense if the two tables are in
different tablespaces, provided you have some additional means to know
if those two tablespaces are on the same set of spindles. Though even
here the usefulness is somewhat suspect, because CPU is a hell of a lot
faster than disks are, unless you have a whole lot of disks. Of course,
this is exactly the target market for MPP.

Where parallel execution really makes sense is when you're doing things
like sorts or hash operations, because those are relatively
CPU-intensive.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Chris Browne

2006-03-24, 8:27 pm

mstone+postgres@math
om.us (Michael Stone) writes:

> On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
>
> Of course, table scanning is going to be IO limited in most cases, and
> having every query spawn 16 independent IO threads is likely to slow
> things down in more cases than it speeds them up. It could work if you
> have a bunch of storage devices, but at that point it's probably
> easier and more direct to implement a clustered approach.


All stipulated, yes. It obviously wouldn't be terribly useful to scan
more aggressively than I/O bandwidth can support. The point is that
this is one of the kinds of places where concurrent processing could
do some good...
--
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/spiritual.html
Save the whales. Collect the whole set.
Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com