|
Home > Archive > PostgreSQL Performance > September 2005 > Big question on insert performance/using COPY FROM
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Big question on insert performance/using COPY FROM
|
|
| Morgan Kita 2005-08-31, 8:24 pm |
| Hi,
I am currently trying to speed up the insertion of bulk loads to my database. I have fiddled with all of the parameters that I have seen suggested(aka checkpoint_segments,
checkpoint_timeout, maintinence_work_mem
, and shared buffers) with no success. I ev
en turned off fysnc with no effect so I am pretty sure the biggest problem is that the DB is CPU limited at the moment because of the rather weak machine that postmaster is running on(Athlon 2400+ xp with 512 RAM), but that will change in the future so I
am trying to get performance increases that don't involve changing the machine at the moment.
I am currently inserting into the database through lipqxx's C++ interface. I am using prepared statements that perform regular inserts. I would like to use COPY FROM since I have read so much about its increased performance with respect to INSERT, but I a
m not sure how to use it in my case. So let me give you an idea on how the tables are laid out.
The real DB has more tables, but lets say for the sake of argument I have 3 tables; TB1, TB2, TB3. Lets say that TB1 has a primary key PK1 and a unique identifier column(type text) UK1 that has an index on it. TB2 then has a PK2, a UK2(type text) of its o
wn with an index, and a foreign key FK2 that points to TB1's PK1. TB3 has a PK3 and a FK3 that points to FK2.
TB1 TB2 TB3
-------------- ------------------------------- ----------------------
PK1, UK1 PK2, UK2, FK2(PK1) PK3, FK3(PK2)
Now in lipqxx I am parsing an input list of objects that are then written to these tables. Each object may produce one row in TB1, one row in TB2, and one row in TB3. The UK1 and UK2 indentifiers are used to prevent duplicate entries for TB1 and TB2 respe
ctively. I know COPY FROM obeys these unique checks; however, my problem is the FKs. So lets say I try to insert a row into TB1. If it is unique on UK1 then it inserts a new row with some new primary key int4 identifier and if it is a duplicate then no in
sert is done but the already existing row's primary key identifier is returned. This identifier(duplicate
or not) is used when populating TB2's row as the FK2 identifier. The row that is to be inserted into TB2 needs the primary key indentifier from the r
esult of the attempted insert into TB1. Similarily the insert into TB3 needs the result of the pk indentifier of the attempted insert into TB2. Once that is done then I move on to parsing the next object for insertion into the 3 tables.
So lets say I want to insert a list of objects using COPY FROM... whats the way to do it? How can I at the very least get a list of the primary keys of TB1(newly inserted rows or from already existings row) returned from the COPY FROM insert into TB1 so I
can use them for the COPY FROM insert into TB2 and so on? Is there a better way to do this?
P.S. I am going to setup autovacuum for these bulk loads. My question though is why for bulkloads is VACUUM useful? I understand that it frees up dead rows as a result of UPDATE and such, but where are the dead rows created from plain INSERTS?
Thanks,
Morgan
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
| |
| Richard Huxton 2005-09-01, 7:25 am |
| Morgan Kita wrote:
> Hi,
>
> I am currently trying to speed up the insertion of bulk loads to my
> database. I have fiddled with all of the parameters that I have seen
> suggested(aka checkpoint_segments,
checkpoint_timeout,
> maintinence_work_mem
, and shared buffers) with no success. I even
> turned off fysnc with no effect so I am pretty sure the biggest
> problem is that the DB is CPU limited at the moment because of the
> rather weak machine that postmaster is running on(Athlon 2400+ xp
> with 512 RAM)
Don't be pretty sure, be abolutely sure. What do your various
system-load figures show? Windows has a system performance monitoring
tool that can show CPU/Memory/Disk IO, and *nix tools have vmstat or iostat.
--
Richard Huxton
Archonet Ltd
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
|
|
|
|
|