|
Home > Archive > MS SQL Server DTS > January 2006 > Optimize transformation
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Optimize transformation
|
|
|
| I have the below table that I need to load about 1 billion records
into. I was wondering if anyone could offer some tips on how to
optimize the process.
I load data_table via a staging table that could have 1000 records -
50 million records. I use a left outer join to ensure that no
duplicates are inserted.
I have read that using one copy column transformation for all columns
can increase performance, and I do have "Use Fast Load" selected.
Also, both tables are on the same sever.
Is there a way to speed up this load? Would it be a bad idea to use
the transformation to load a csv text file and then use the bulk insert
command to load the data?
CREATE TABLE [dbo].[data_table] (
[project] [int] NULL ,
[responseid] [int] NULL ,
[attrib1] [varchar] (55) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
[data] [varchar] (255) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
[data_type] [varchar] (155) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL
) ON [PRIMARY]
CREATE CLUSTERED INDEX [icx__cluster] ON
[dbo].[data_table]([responseid], [project]) WITH FILLFACTOR = 70,
PAD_INDEX ON [PRIMARY]
CREATE INDEX & #91;ix__data_table__
attrib1] ON
[dbo].[data_table]([attrib1]) WITH FILLFACTOR = 70, PAD_INDEX ON
[INDEXES]
CREATE INDEX & #91;ix__data_table__
data] ON [dbo].[data_table]([data])
WITH FILLFACTOR = 70, PAD_INDEX ON [INDEXES]
Other Notes:
project: value is too large for a smallint, but there are only about
1200 project values
responseid: unique to project. The largest project has 3.5 million
responseid values
attrib: ~100,000 unique values
data: ~ 500,000 unique values
| |
| Allan Mitchell 2006-01-26, 4:58 pm |
| Hello Dave,
A number of things for you to try
1. get rid of indexes
These will hurt you during loading
2. Recovery model set to SIMPLE
3. How about not using DTS. What about pure TSQL?
4. Make sure the log has lots of room to expand likewise with the data files.
5. Make sure you have no triggers on the destination tables.
Have a look in BOL for any other requirements for FAST LOAD
Allan
> I have the below table that I need to load about 1 billion records
> into. I was wondering if anyone could offer some tips on how to
> optimize the process.
>
> I load data_table via a staging table that could have 1000 records -
> 50 million records. I use a left outer join to ensure that no
> duplicates are inserted.
>
> I have read that using one copy column transformation for all columns
> can increase performance, and I do have "Use Fast Load" selected.
> Also, both tables are on the same sever.
>
> Is there a way to speed up this load? Would it be a bad idea to use
> the transformation to load a csv text file and then use the bulk
> insert command to load the data?
>
> CREATE TABLE [dbo].[data_table] (
> [project] [int] NULL ,
> [responseid] [int] NULL ,
> [attrib1] [varchar] (55) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
> [data] [varchar] (255) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
> [data_type] [varchar] (155) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL
> ) ON [PRIMARY]
> CREATE CLUSTERED INDEX [icx__cluster] ON
> [dbo].[data_table]([responseid], [project]) WITH FILLFACTOR = 70,
> PAD_INDEX ON [PRIMARY]
> CREATE INDEX & #91;ix__data_table__
attrib1] ON
> [dbo].[data_table]([attrib1]) WITH FILLFACTOR = 70, PAD_INDEX ON
> [INDEXES]
> CREATE INDEX & #91;ix__data_table__
data] ON [dbo].[data_table]([data])
> WITH FILLFACTOR = 70, PAD_INDEX ON [INDEXES]
> Other Notes:
> project: value is too large for a smallint, but there are only about
> 1200 project values
> responseid: unique to project. The largest project has 3.5 million
> responseid values
> attrib: ~100,000 unique values
> data: ~ 500,000 unique values
| |
| Allan Mitchell 2006-01-26, 4:58 pm |
| Hello Allan,
And I forgot to mention that if you do use DTS and the Transform Data Task
then use 1 Transform that has all your source columns and destination columns
within. Do not use 1 per pair. Also use a COPY COLUMN transform
Allan
[color=darkred]
> Hello Dave,
>
> A number of things for you to try
>
> 1. get rid of indexes
>
> These will hurt you during loading
>
> 2. Recovery model set to SIMPLE
>
> 3. How about not using DTS. What about pure TSQL?
>
> 4. Make sure the log has lots of room to expand likewise with the
> data files.
>
> 5. Make sure you have no triggers on the destination tables.
>
> Have a look in BOL for any other requirements for FAST LOAD
>
> Allan
>
| |
|
| Thanks Allan for the tips.
I am using one copy column transformation (COM object) for each column.
I will be sure and change that.
I can't drop indexes because they are required for the LEFT OUTER JOIN
that calculates the set that will be inserted.
Recovery is already in SIMPLE; however I don't think recovery mode will
make any difference because the operation must be written to log before
it is written to data files.
Very good tip on using large enough log and data files! That should
minimize the overhead of auto-growing the files.
T-SQL: I have considered this. Does anyone have any insight on the
difference between DTS and plain T-SQL? I understand that DTS uses a
COM object for its transformation. For some reason I have always
thought that DTS is optimized for data transfers/loads. Is this not
the case? Will T-SQL be faster? If so, will committing the records in
small batches (50-100K) using DTS overcompensate for the hit that DTS
takes?
Thanks for the advice!
|
|
|
|
|