Home > Archive > PostgreSQL Hacks > September 2005 > Re: [PERFORM] Releasing memory during External sorting?









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Re: [PERFORM] Releasing memory during External sorting?
Dann Corbit

2005-09-23, 8:24 pm

For the subfiles, load the top element of each subfile into a priority
queue. Extract the min element and write it to disk. If the next value
is the same, then the queue does not need to be adjusted. If the next
value in the subfile changes, then adjust it.

Then, when the lowest element in the priority queue changes, adjust the
queue.

Keep doing that until the queue is empty.

You can create all the subfiles in one pass over the data.

You can read all the subfiles, merge them, and write them out in a
second pass (no matter how many of them there are).

Replacement selection is not a good idea any more, since obvious better
ideas should take over. Longer runs are of no value if you do not have
to do multiple merge passes.

I have explained this general technique in the book "C Unleashed",
chapter 13.

Sample code is available on the book's home page.

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Ron Peacetree
> Sent: Friday, September 23, 2005 11:41 AM
> To: Mark Lewis; Tom Lane; pgsql-hackers@postgresql.org; pgsql-
> performance@postgres
ql.org
> Subject: Re: [HACKERS] [PERFORM] Releasing memory during External

sorting?
>
> Yep. Also, bear in mind that the lg(n!)= ~ nlgn - n lower bound on
> the number of comparisions:
> a= says nothing about the amount of data movement used.
> b= only holds for generic comparison based sorting algorithms.
>
> As Knuth says (vol 3, p180), Distribution Counting sorts without
> ever comparing elements to each other at all, and so does Radix
> Sort. Similar comments can be found in many algorithms texts.
>
> Any time we know that the range of the data to be sorted is

substantially
> restricted compared to the number of items to be sorted, we can sort

in
> less than O(lg(n!)) time. DB fields tend to take on few values and

are
> therefore "substantially restricted".
>
> Given the proper resources and algorithms, O(n) sorts are very

plausible
> when sorting DB records.
>
> All of the fastest external sorts of the last decade or so take

advantage
> of
> this. Check out that URL I posted.
>
> Ron
>
>
> -----Original Message-----
> From: Mark Lewis <mark.lewis@mir3.com>
> Sent: Sep 23, 2005 1:43 PM
> To: Tom Lane <tgl@sss.pgh.pa.us>
> Subject: Re: [PERFORM] Releasing memory during External sorting?
>
> operations != passes. If you were clever, you could probably write a
> modified bubble-sort algorithm that only made 2 passes. A pass is a
> disk scan, operations are then performed (hopefully in memory) on what
> you read from the disk. So there's no theoretical log N lower-bound

on
> the number of disk passes.
>
> Not that I have anything else useful to add to this discussion, just a
> tidbit I remembered from my CS classes back in college :)
>
> -- Mark
>
> On Fri, 2005-09-23 at 13:17 -0400, Tom Lane wrote:
> passes.
> possible.
passes[color=darkred
]
>
> ---------------------------(end of

broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql
.org so that

your
> message can get through to the mailing list cleanly


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql
.org so that your
message can get through to the mailing list cleanly

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com