Home > Archive > PostgreSQL Discussion > April 2005 > populating a table via the COPY command using C code.









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author populating a table via the COPY command using C code.
Mak, Jason

2005-04-27, 1:23 pm

> hi,
>
> I'm writing an application in C that basically converts binary data into something meaningful. My first attempt was to parse the binary and insert directly to the database in one step. But this proved to be very slow. So I decided to go with a two st

ep process. The first step is to parse the data and create a flat file with tab delimited fields. The second step is to load this data using the COPY command. I don't quite understand how this is done within C. Can someone provide me with some example
s. I've already done some searches on the internet. the examples that I found don't match with I'm trying to do. Please help!

>
> thanks,
> jason.
>


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere
" to majordomo@postgresql
.org)

Michael Fuhr

2005-04-27, 1:23 pm

On Wed, Apr 27, 2005 at 01:12:42PM -0400, Mak, Jason wrote:
>
> The second step is to load this data using the COPY command.
> I don't quite understand how this is done within C.


Are you writing a client application that uses libpq? If so, have
you seen "Functions Associated with the COPY Command" in the libpq
chapter of the documentation?

http://www.postgresql.org/docs/8.0/...libpq-copy.html

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Michael Fuhr

2005-04-27, 8:24 pm

[Please copy the mailing list on replies so others can contribute
to and learn from the discussion.]

On Wed, Apr 27, 2005 at 02:34:26PM -0400, Mak, Jason wrote:
>
> Yes, my application is a client application that uses libpq api, ie.
> PQexec, etc... I have looked at the "Functions Associated with the COPY
> Command". But I still don't understand. what I really need is an
> example of how those api's(PQputCopyData)
are used, other than the
> "simple" example that's provided.


What example are you looking at and what don't you understand about it?

> This "dataload" should be relatively simple. I already have a flat
> file created. I should be able to use some api and say here is the
> pointer to my db connection and here is a pointer to the flat file.
> now do your thing. Perhaps you can explain this to me.


libpq provides the primitives that you could use to implement such
an API: it would be a trivial matter to write a function that opens
the indicated file, reads its contents, and sends them to the
database. As the documentation indicates, you'd use PQexec() or
its ilk to send a COPY FROM STDIN command (see the COPY documentation
for the exact syntax), then PQputCopyData() or PQputline() to send
the data (probably in a loop), then PQputCopyEnd() or PQendcopy()
to indicate that you're finished. Add the necessary file I/O
statements and there's your function.

Do you have a reason for using an intermediate file? Instead of
writing data to the file and then reading it back, you could use
PQputCopyData() or PQputline() to send the data directly to the
database.

Another possibility: if the file resides somewhere the backend can
read, and if you can connect to the database as a superuser, then
you could use COPY tablename FROM 'filename'.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Mak, Jason

2005-04-27, 8:24 pm

> What example are you looking at and what don't you understand about it?

Some of the examples that I looked over are either from the internet or from the Postgres Manual. The API I'm refering to is PQputCopyData. However, with the explanation given below. I'm starting to understand.

> libpq provides the primitives that you could use to implement such
> an API: it would be a trivial matter to write a function that opens
> the indicated file, reads its contents, and sends them to the
> database. As the documentation indicates, you'd use PQexec() or
> its ilk to send a COPY FROM STDIN command (see the COPY documentation
> for the exact syntax), then PQputCopyData() or PQputline() to send
> the data (probably in a loop), then PQputCopyEnd() or PQendcopy()
> to indicate that you're finished. Add the necessary file I/O
> statements and there's your function.


so basically in C, I would open some file i/o using fopen and in a loop. Do something like a read line into the buffer with some byte count and send that to the database using the PQputCopyData. Is this correct??

> Do you have a reason for using an intermediate file? Instead of
> writing data to the file and then reading it back, you could use
> PQputCopyData() or PQputline() to send the data directly to the
> database.


For the project I'm working on. We basically setup a postgres data warehouse. We have a large set of binary data that needs to be parsed and translated into something meaningful. We intend to load this processed data into 3 tables using the quickest me
ans possible. I've already tried parsing and doing inserts. but this proved to be very slow. So I figured a 2 step automated process. The first step would be to parse the data and create 3 separate files. then load each file into the warehouse. Neve
r considered using PQputCopyData in realtime. Not sure how this would work given 3 different tables that hold differnet data or how fast it's going to be. but I have tried the last approach. It works fairly well. The only problem is the lack of insigh
t into where it is during the load processing.

What's your thoughts?? which approach would be the fastest?
1) 2 step process.
2) realtime PQputCopyData - not sure how this would work with 3 different tables.
3) COPY tablename FROM 'filename'

thanks,
jason.



-----Original Message-----
From: Michael Fuhr [mailto:mike@fuhr.org]
Sent: Wednesday, April 27, 2005 3:46 PM
To: Mak, Jason
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] populating a table via the COPY command using C
code.


[Please copy the mailing list on replies so others can contribute
to and learn from the discussion.]

On Wed, Apr 27, 2005 at 02:34:26PM -0400, Mak, Jason wrote:
>
> Yes, my application is a client application that uses libpq api, ie.
> PQexec, etc... I have looked at the "Functions Associated with the COPY
> Command". But I still don't understand. what I really need is an
> example of how those api's(PQputCopyData)
are used, other than the
> "simple" example that's provided.


What example are you looking at and what don't you understand about it?

> This "dataload" should be relatively simple. I already have a flat
> file created. I should be able to use some api and say here is the
> pointer to my db connection and here is a pointer to the flat file.
> now do your thing. Perhaps you can explain this to me.


libpq provides the primitives that you could use to implement such
an API: it would be a trivial matter to write a function that opens
the indicated file, reads its contents, and sends them to the
database. As the documentation indicates, you'd use PQexec() or
its ilk to send a COPY FROM STDIN command (see the COPY documentation
for the exact syntax), then PQputCopyData() or PQputline() to send
the data (probably in a loop), then PQputCopyEnd() or PQendcopy()
to indicate that you're finished. Add the necessary file I/O
statements and there's your function.

Do you have a reason for using an intermediate file? Instead of
writing data to the file and then reading it back, you could use
PQputCopyData() or PQputline() to send the data directly to the
database.

Another possibility: if the file resides somewhere the backend can
read, and if you can connect to the database as a superuser, then
you could use COPY tablename FROM 'filename'.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Sean Davis

2005-04-27, 8:24 pm


On Apr 27, 2005, at 4:48 PM, Mak, Jason wrote:

> What's your thoughts?? which approach would be the fastest?
> 1) 2 step process.
> 2) realtime PQputCopyData - not sure how this would work with 3
> different tables.
> 3) COPY tablename FROM 'filename'
>
> thanks,
> jason.


COPY tablename FROM 'filename'

is VERY fast. Generally, I think people generally load the data into
postgres using COPY (perhaps into a "loader" table that isn't in the
same format that the final tables will be in) and then do data
manipulation and cleaning within the database using database tools.
This paradigm may or may not work for you, but it seems to be pretty
general.

Sean


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com