Home > Archive > MS SQL Server > October 2006 > Select Into V/s bcp and bulk insert









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Select Into V/s bcp and bulk insert
Rishi

2006-10-24, 6:33 pm

Which is a better option to transfer data between two sql servers:

1. Over a linked server:
SELECT * INTO <table name> FROM <server.database.dbo.table>
with (nolock)

2. bcp "SELECT * FROM <database.dbo.table>" queryout "temp.txt" -N -k
-E -U <username> -P <password> -S <server>
BULK INSERT <database.dbo.table> FROM temp.txt WITH
(DATAFILETYPE='widen
ative',KEEPNULLS,KEE
PIDENTITY)

John Bell

2006-10-24, 6:33 pm

Hi

This will depend on several things such as how well the two servers are
connected, how much disruption it may cause, what volume of data is being
transferred...

In general you say that SELECT... INTO... or INSERT... SELECT options (if
the table exists you can use SELECT.. INTO...) would be more suitable for
smaller volumes on close servers (or same server), where as BCP/BULK INSERT
for larger volumns, distant servers (or lower bandwidth). The BCP may give
you more control (in terms of it does not have to be done in one operation
and you don't necessarily have to start from scratch if it fails to load)

There are other options such as DTS or the Import/Export Wizard.

By the way you don't need QUERYOUT if selecting everything from a table or
view, or you could create a view for less complex queries to simplfy the BCP.

HTH

John

"Rishi" wrote:

> Which is a better option to transfer data between two sql servers:
>
> 1. Over a linked server:
> SELECT * INTO <table name> FROM <server.database.dbo.table>
> with (nolock)
>
> 2. bcp "SELECT * FROM <database.dbo.table>" queryout "temp.txt" -N -k
> -E -U <username> -P <password> -S <server>
> BULK INSERT <database.dbo.table> FROM temp.txt WITH
> (DATAFILETYPE='widen
ative',KEEPNULLS,KEE
PIDENTITY)
>
>

Rishi

2006-10-24, 6:33 pm

Thanks for your inputs John.

To clarify more on the case:
- The servers are distant,
- On a busy day the bandwidth can be less,
- Volume varies a lot, from a few thousands to 10 millions, from
different servers.
- do not have control over the source servers, so dont know how
creating views would work out.
- will need to write queries in most of the scenarios, to pick data
between date ranges, joins etc.

Now questions on bcp-
To avoid starting from scratch incase of failure is doing it in batch
the answer? if so how can i trace back to which row in the source did
the failure happen?

how can I avoid the data file size to grow beyond a point (max free
space availble on disk) and may be do the transfer in batches of a
fixed filed size. This without suffereing the speed and performance.

if the file size is less than the amount of data being copied over bcp
tries to fit as much as it can and that results in unexpected end of
file error. Where as in select into there is no such problem, it takes
care of every thing.

Is there a big performance / speed difference between the two?

Thanks for your help.

Rishi.

John Bell wrote:[color=darkred
]
> Hi
>
> This will depend on several things such as how well the two servers are
> connected, how much disruption it may cause, what volume of data is being
> transferred...
>
> In general you say that SELECT... INTO... or INSERT... SELECT options (if
> the table exists you can use SELECT.. INTO...) would be more suitable for
> smaller volumes on close servers (or same server), where as BCP/BULK INSERT
> for larger volumns, distant servers (or lower bandwidth). The BCP may give
> you more control (in terms of it does not have to be done in one operation
> and you don't necessarily have to start from scratch if it fails to load)
>
> There are other options such as DTS or the Import/Export Wizard.
>
> By the way you don't need QUERYOUT if selecting everything from a table or
> view, or you could create a view for less complex queries to simplfy the BCP.
>
> HTH
>
> John
>
> "Rishi" wrote:
>

John Bell

2006-10-24, 6:33 pm

Hi Rishi

If you are using more complex queries then you will probably need to use
queryout in the same way as the must simpler example you posted. It is
soundling like you may want to look at using replication.

If you are worried about partial files then you could look at using some
form of manifest and checksumming each file. This would allow you to know
what files to expect and if you have the complete file. It is getting into
more complex 2 way communications though!

The easiest way to limit the size of file is to extract the data in more
frequently, they don't need to be transferred at the same time, you would
need to maintain what information has been extracted if you did that (rather
than saying get me all the data for a given day).

John

"Rishi" wrote:

> Thanks for your inputs John.
>
> To clarify more on the case:
> - The servers are distant,
> - On a busy day the bandwidth can be less,
> - Volume varies a lot, from a few thousands to 10 millions, from
> different servers.
> - do not have control over the source servers, so dont know how
> creating views would work out.
> - will need to write queries in most of the scenarios, to pick data
> between date ranges, joins etc.
>
> Now questions on bcp-
> To avoid starting from scratch incase of failure is doing it in batch
> the answer? if so how can i trace back to which row in the source did
> the failure happen?
>
> how can I avoid the data file size to grow beyond a point (max free
> space availble on disk) and may be do the transfer in batches of a
> fixed filed size. This without suffereing the speed and performance.
>
> if the file size is less than the amount of data being copied over bcp
> tries to fit as much as it can and that results in unexpected end of
> file error. Where as in select into there is no such problem, it takes
> care of every thing.
>
> Is there a big performance / speed difference between the two?
>
> Thanks for your help.
>
> Rishi.
>
> John Bell wrote:
>
>

Rishi

2006-10-24, 6:33 pm

This sure helps.
One more thing the bcp im running is running over the linked server,
becuase i believe it requires an entry of the remote server in
sysservers. although it doesnot rely on the linkedserver's proxy login
credential to execute, as im supplyin the login and passwd and not
using the trusted connection option. is this ok ? or is there a better
way of doing this.

One last concluding question:
So keeping everything in mind still bcp with bulk insert is the
best(fastest) way to do this and is better than 'select into' anyday.
right?

Thanks,
Rishi


John Bell wrote:[color=darkred
]
> Hi Rishi
>
> If you are using more complex queries then you will probably need to use
> queryout in the same way as the must simpler example you posted. It is
> soundling like you may want to look at using replication.
>
> If you are worried about partial files then you could look at using some
> form of manifest and checksumming each file. This would allow you to know
> what files to expect and if you have the complete file. It is getting into
> more complex 2 way communications though!
>
> The easiest way to limit the size of file is to extract the data in more
> frequently, they don't need to be transferred at the same time, you would
> need to maintain what information has been extracted if you did that (rather
> than saying get me all the data for a given day).
>
> John
>
> "Rishi" wrote:
>

John Bell

2006-10-24, 6:33 pm

Hi

Use the -S argument to specify the server, you must already have a login and
access to the remote table, there is also a -U and -P argument for the login
id and password although if you are making this visable (for instance in a
batch file) there is a security risk.

As far as speed goes you would have to try both methods over a period with
typical volumes, INSERT..SELECT or SELECT.. INTO may well be faster for
smaller sizes, but speed is not the only factor you should be taking into
account (as described in my first response).

John

"Rishi" wrote:

> This sure helps.
> One more thing the bcp im running is running over the linked server,
> becuase i believe it requires an entry of the remote server in
> sysservers. although it doesnot rely on the linkedserver's proxy login
> credential to execute, as im supplyin the login and passwd and not
> using the trusted connection option. is this ok ? or is there a better
> way of doing this.
>
> One last concluding question:
> So keeping everything in mind still bcp with bulk insert is the
> best(fastest) way to do this and is better than 'select into' anyday.
> right?
>
> Thanks,
> Rishi
>
>
> John Bell wrote:
>
>

Andrew Pike

2006-10-24, 6:33 pm

BULK INSERT would be the better option if you are in a position to take
advantage of 'minimally' logged operations. This drastically reduces the
transaction logging overhead of a bulk data load by only logging extent
allocations/deallocations, as opposed to logging row based operations. You
do have to meet certain criteria though:

1) The target database must be set to either SIMPLE or BULK LOGGED recovery
model.
2) The target table must have either zero rows or no indexes.
3) The target table must not have any triggers.
4) The target table must not be subject to replication.
5) The TABLOCK hint must be in effect.

Regards

Andrew Pike
--
SQL Server DBA
UBS IB



"Rishi" wrote:

> This sure helps.
> One more thing the bcp im running is running over the linked server,
> becuase i believe it requires an entry of the remote server in
> sysservers. although it doesnot rely on the linkedserver's proxy login
> credential to execute, as im supplyin the login and passwd and not
> using the trusted connection option. is this ok ? or is there a better
> way of doing this.
>
> One last concluding question:
> So keeping everything in mind still bcp with bulk insert is the
> best(fastest) way to do this and is better than 'select into' anyday.
> right?
>
> Thanks,
> Rishi
>
>
> John Bell wrote:
>
>

Rishi

2006-10-24, 6:33 pm

Ofcourse Andrew, bulk insert with bcp is wht i intend to use, the
question was which one to choose between "Select ...INTO ..." on a
linked server and "bcp with bulk insert".
any thoughts?

thanks,Rishi.
Andrew Pike wrote:[color=darkred
]
> BULK INSERT would be the better option if you are in a position to take
> advantage of 'minimally' logged operations. This drastically reduces the
> transaction logging overhead of a bulk data load by only logging extent
> allocations/deallocations, as opposed to logging row based operations. You
> do have to meet certain criteria though:
>
> 1) The target database must be set to either SIMPLE or BULK LOGGED recovery
> model.
> 2) The target table must have either zero rows or no indexes.
> 3) The target table must not have any triggers.
> 4) The target table must not be subject to replication.
> 5) The TABLOCK hint must be in effect.
>
> Regards
>
> Andrew Pike
> --
> SQL Server DBA
> UBS IB
>
>
>
> "Rishi" wrote:
>

John Bell

2006-10-24, 6:33 pm

Hi

Is in the previous post, unless you query is joining across databases there
should be no need to use a linked server.

John

"Rishi" wrote:

> Ofcourse Andrew, bulk insert with bcp is wht i intend to use, the
> question was which one to choose between "Select ...INTO ..." on a
> linked server and "bcp with bulk insert".
> any thoughts?
>
> thanks,Rishi.
> Andrew Pike wrote:
>
>

Rishi

2006-10-24, 6:33 pm

Yes John, the query is accross two different databases on two
different servers.

John Bell wrote:[color=darkred
]
> Hi
>
> Is in the previous post, unless you query is joining across databases there
> should be no need to use a linked server.
>
> John
>
> "Rishi" wrote:
>

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2009 droptable.com