|
Home > Archive > PostgreSQL Performance > March 2006 > Massive Inserts Strategies
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Massive Inserts Strategies
|
|
| ashah 2006-03-22, 11:32 am |
| I have a database with foreign keys enabled on the schema. I receive different
files, some of them are huge. And I need to load these files in the database
every night. There are several scenerios that I want to design an optimal
solution for -
1. One of the file has around 80K records and I have to delete everything from
the table and load this file. The provider never provides a "delta file" so I
dont have a way to identify which records are already present and which are
new. If I dont delete everything and insert fresh, I have to make around 80K
selects to decide if the records exist or not. Now there are lot of tables
that have foreign keys linked with this table so unless I disable the foreign
keys, I cannot really delete anything from this table. What would be a good
practise here?
2. Another file that I receive has around 150K records that I need to load in
the database. Now one of the fields is logically a "foreign key" to another
table, and it is linked to the parent table via a database generated unique
ID instead of the actual value. But the file comes with the actual value. So
once again, I have to either drop the foreign key, or make 150K selects to
determine the serial ID so that the foreign key is satisfied. What would be a
good strategy in this scenerio ?
Please pardon my inexperience with database !
Thanks,
Amit
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
| |
|
|
For both cases, you could COPY your file into a temporary table and do a
big JOIN with your existing table, one for inserting new rows, and one for
updating existing rows.
Doing a large bulk query is a lot more efficient than doing a lot of
selects. Vacuum afterwards, and you'll be fine.
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql
.org so that your
message can get through to the mailing list cleanly
| |
| Jim C. Nasby 2006-03-22, 11:32 am |
| Load the files into a temp table and go from there...
COPY ... FROM file;
UPDATE existing_table SET ... WHERE ...;
INSERT INTO existing_table SELECT * FROM temp_table WHERE NOT EXISTS(
SELECT * FROM existing_table WHERE ...)
On Wed, Mar 22, 2006 at 10:32:10AM -0500, ashah wrote:
> I have a database with foreign keys enabled on the schema. I receive different
> files, some of them are huge. And I need to load these files in the database
> every night. There are several scenerios that I want to design an optimal
> solution for -
>
> 1. One of the file has around 80K records and I have to delete everything from
> the table and load this file. The provider never provides a "delta file" so I
> dont have a way to identify which records are already present and which are
> new. If I dont delete everything and insert fresh, I have to make around 80K
> selects to decide if the records exist or not. Now there are lot of tables
> that have foreign keys linked with this table so unless I disable the foreign
> keys, I cannot really delete anything from this table. What would be a good
> practise here?
>
> 2. Another file that I receive has around 150K records that I need to load in
> the database. Now one of the fields is logically a "foreign key" to another
> table, and it is linked to the parent table via a database generated unique
> ID instead of the actual value. But the file comes with the actual value. So
> once again, I have to either drop the foreign key, or make 150K selects to
> determine the serial ID so that the foreign key is satisfied. What would be a
> good strategy in this scenerio ?
>
> Please pardon my inexperience with database !
>
> Thanks,
> Amit
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
| |
| ashah 2006-03-28, 11:32 am |
| I tried this solution, but ran into following problem.
The temp_table has columns (col1, col2, col3).
The original_table has columns (col0, col1, col2, col3)
Now the extra col0 on the original_table is the unique generated ID by the
database.
How can I make your suggestions work in that case .. ?
On Wednesday 22 March 2006 11:09 am, Jim C. Nasby wrote:[color=darkred
]
> Load the files into a temp table and go from there...
>
> COPY ... FROM file;
> UPDATE existing_table SET ... WHERE ...;
> INSERT INTO existing_table SELECT * FROM temp_table WHERE NOT EXISTS(
> SELECT * FROM existing_table WHERE ...)
>
> On Wed, Mar 22, 2006 at 10:32:10AM -0500, ashah wrote:
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
| |
| Markus Schaber 2006-03-28, 11:32 am |
| Hi, ashah,
ashah wrote:
> I tried this solution, but ran into following problem.
>
> The temp_table has columns (col1, col2, col3).
>
> The original_table has columns (col0, col1, col2, col3)
> Now the extra col0 on the original_table is the unique generated ID by
> the database.
INSERT INTO original_table (col1, col2, col3) SELECT col1, col2, col3
FROM temp_table WHERE ...
HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS
Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
| |
| Jim C. Nasby 2006-03-28, 1:32 pm |
| Is there some other unique key you can test on?
Take a look at http://lnk.nu/cvs.distributed.net/8qt.sql lines 169-216
for an exammple. In this case we use a different method for assigning
IDs than you probably will, but the idea remains.
On Tue, Mar 28, 2006 at 10:59:49AM -0500, ashah wrote:
> I tried this solution, but ran into following problem.
>
> The temp_table has columns (col1, col2, col3).
>
> The original_table has columns (col0, col1, col2, col3)
>
> Now the extra col0 on the original_table is the unique generated ID by the
> database.
>
> How can I make your suggestions work in that case .. ?
>
> On Wednesday 22 March 2006 11:09 am, Jim C. Nasby wrote:
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
|
|
|
|
|