Home > Archive > PostgreSQL Discussion > September 2005 > Re: How many insert + update should one transaction









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Re: How many insert + update should one transaction
Scott Marlowe

2005-09-23, 11:24 am

On Fri, 2005-09-23 at 05:51, Yonatan Ben-Nes wrote:
> Hi all,
>
> Every few days I need to DELETE all of the content of few tables and
> INSERT new data in them.
> The amount of new data is about 5 million rows and each row get about 3
> queries (INSERT + UPDATE).
> Now because I need the old data to be displayed till all of the new data
> will be available I'm doing all of the process of deleting the old
> content and inserting the new one in one transaction.
> Should I divide the insertion so ill insert the new data into a
> temporary table and the transaction should be commited every
> 100,1000,10000 whatever queries? or maybe it doesnt matter to the server
> whats the size of the transaction and its ok to handle such a process in
> one transaction?


The only possible issue would be one of capacity, and possibly having a
lot of dead tuples laying about.

If you have 5 million rows, and you update every one, then you now have
5 million live and 5 million dead tuples in your database. A Vacuum
full will take quite a while.

If you're fsm is set large enough, then as long as you vacuum (regular,
non full vacuum) between these transactions, then the 5 million dead
tuples should get reused. however, the performance of your database
will for selects and such will be like it was a 10 million row database.

Given that you NEED to have all 10 million tuples in the database at the
same time, the use of a temp / holding table would allow you to truncate
the main table, move everything into the main table, and then drop /
truncate the temp / holding table.

If you truncate the main table, then initiate another transaction to
move the data into it, it shouldn't be so bloated, but the down side is
you'll have a period of time when it appears empty to users.

So, the real question is whether or not you can afford to have an empty
table at some point in the process.

If you can't, then either method (running the whole transaction against
the one table or using the temp / holding table) are equivalent. If you
can, there should be a noticeable gain from the method of truncating the
main table outside the update transaction.

If you need that table to always have the old or new tuples (i.e. never
be empty) and you can afford the very lengthy vacuum full on the 5
million dead rows, then that method will give you the best select
performance the rest of the day.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Jim C. Nasby

2005-09-24, 8:23 pm

Just remember the first rule of performance tuning: don't.

Unless you *know* having the dead rows will be an issue, you will almost
certainly be best off going the simple, straightforward route.

On Fri, Sep 23, 2005 at 10:49:00AM -0500, Scott Marlowe wrote:
> On Fri, 2005-09-23 at 05:51, Yonatan Ben-Nes wrote:
>
> The only possible issue would be one of capacity, and possibly having a
> lot of dead tuples laying about.
>
> If you have 5 million rows, and you update every one, then you now have
> 5 million live and 5 million dead tuples in your database. A Vacuum
> full will take quite a while.
>
> If you're fsm is set large enough, then as long as you vacuum (regular,
> non full vacuum) between these transactions, then the 5 million dead
> tuples should get reused. however, the performance of your database
> will for selects and such will be like it was a 10 million row database.
>
> Given that you NEED to have all 10 million tuples in the database at the
> same time, the use of a temp / holding table would allow you to truncate
> the main table, move everything into the main table, and then drop /
> truncate the temp / holding table.
>
> If you truncate the main table, then initiate another transaction to
> move the data into it, it shouldn't be so bloated, but the down side is
> you'll have a period of time when it appears empty to users.
>
> So, the real question is whether or not you can afford to have an empty
> table at some point in the process.
>
> If you can't, then either method (running the whole transaction against
> the one table or using the temp / holding table) are equivalent. If you
> can, there should be a noticeable gain from the method of truncating the
> main table outside the update transaction.
>
> If you need that table to always have the old or new tuples (i.e. never
> be empty) and you can afford the very lengthy vacuum full on the 5
> million dead rows, then that method will give you the best select
> performance the rest of the day.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>


--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql
.org so that your
message can get through to the mailing list cleanly

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com