
Foreign Keys in Fact Table - performance help
Ray

2006-03-05, 8:24 pm

We are still fairly new to data warehousing. We have a fact table that has an
FK to the dimension table, and the lookup translation is too SLOW...

We are currently using T-SQL and stored procedures for our ETL work. We open
a cursor to read the source (stage table) and do basic insert operations
into the datamart fact table. That means for each source row, we have to
translate the source ID to a destination ID.

For example:
Fact Product Sales has fk to Dimension Product.
source product Id = 99. datamart destination product Id = 1099.

We have lookup tables that manage the mapping of source and destination IDs for
our dimension tables. We call a stored procedure to retrieve the destination
key from the lookup table during the Fact Table ETL. The problem, of course,
is that this stored procedure is called once per FK column and per record.

Can someone please recommend options to make this faster? Does SSIS have
in-memory data structures to support this? Or are there other tricks I can
employ such as indexes, turning off FKs, using 'NOLOCK' and 'ROWCOUNT' off,
reducing the number of columns retrieved in the cursor, etc. --- I've done
all of these, but it still takes approximately 2500 rows/minute.

Thank you!

Ray


Jéjé

2006-03-05, 8:24 pm

To "convert" the source ID to a dimension surrogate key when you load your
fact table, simply use a view:

select Dim1.Key1, Dim2.Key2, ..., Fact.Sum1...
from Fact
inner join DWDatabase.dbo.Dim1 Dim1
on Dim1.ID = Fact.DimID1
....

Use the Bulk Insert task to take advantage of the fast-load options.
You can load thousands of rows per second;
in my case I reach 100,000 rows/sec on a small server.

Row-by-row loading is the slowest option.
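
As a rough illustration of the view approach described above, here is a minimal sketch. It runs against SQLite through Python's sqlite3 module purely so it can be executed anywhere; it is not SQL Server or SSIS. The table, column, and view names (dim_product, stage_sales, fact_sales, v_sales_keyed) are hypothetical, and the sample IDs 99 and 1099 come from the original question.

```python
import sqlite3

# SQLite stands in for SQL Server; all object names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                              source_id   INTEGER NOT NULL);
    CREATE TABLE stage_sales (source_product_id INTEGER NOT NULL,
                              amount            REAL    NOT NULL);
    CREATE TABLE fact_sales  (product_key INTEGER NOT NULL
                                  REFERENCES dim_product(product_key),
                              amount      REAL    NOT NULL);

    INSERT INTO dim_product VALUES (1099, 99), (1100, 100);
    INSERT INTO stage_sales VALUES (99, 10.0), (100, 20.0), (99, 5.0);

    -- The view translates each source ID to its surrogate key with a join.
    CREATE VIEW v_sales_keyed AS
        SELECT d.product_key, s.amount
        FROM stage_sales AS s
        INNER JOIN dim_product AS d ON d.source_id = s.source_product_id;
""")

# One set-based statement loads every staged row; no per-row lookup call.
conn.execute(
    "INSERT INTO fact_sales (product_key, amount) "
    "SELECT product_key, amount FROM v_sales_keyed"
)
conn.commit()

fact_rows = conn.execute(
    "SELECT product_key, amount FROM fact_sales ORDER BY product_key, amount"
).fetchall()
print(fact_rows)
```

The SELECT inside the view is the part that should carry over to T-SQL with little change; the bulk-load mechanics on SQL Server are a separate matter that this sketch does not attempt to model.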

SSIS can do the lookup in memory, which has some advantages:
* the staging database and the DW database can be on 2 different servers
(whereas a view across servers brings restrictions or performance issues)
* you can identify missing codes during the loading process more easily (the
view loads only rows whose keys match a dimension, because it uses an inner join)
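
The idea behind an in-memory lookup, including the handling of missing codes, can be sketched in a few lines. This is not SSIS and makes no claim about how its Lookup component works internally; it only shows the general pattern of caching the source-to-surrogate mapping once and routing unmatched rows aside. The function name split_by_lookup and the sample data are hypothetical.

```python
def split_by_lookup(stage_rows, key_map):
    """Translate source IDs via an in-memory map; set aside rows with no match."""
    matched, missing = [], []
    for source_id, amount in stage_rows:
        surrogate = key_map.get(source_id)
        if surrogate is None:
            # No dimension row for this source ID: keep it for later review
            # instead of silently dropping it, as an inner join would.
            missing.append((source_id, amount))
        else:
            matched.append((surrogate, amount))
    return matched, missing


# The mapping is loaded once, not queried once per row.
key_map = {99: 1099, 100: 1100}
stage_rows = [(99, 10.0), (100, 20.0), (101, 7.5)]

matched, missing = split_by_lookup(stage_rows, key_map)
print(matched)
print(missing)
```

Source ID 101 has no entry in the mapping, so it ends up in the missing list rather than disappearing from the load.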




JT

2006-03-05, 8:24 pm

Using a cursor to perform lookups / inserts one row at a time will be very
slow. Instead use a single INSERT INTO ... SELECT query that joins to the
related lookup tables.

insert into myfacttable
select
..
from
sales
join ..
join ..
join ..

Ensure that all primary and foreign keys are indexed:
http://www.microsoft.com/technet/pr...s/c0618260.mspx
http://msdn.microsoft.com/library/d...eNetHowTo03.asp

Also, use the Show Execution Plan feature of Query Analyzer to investigate
the retrieval method used by SQL Server to execute the query and look for
clues on how to improve its performance.
http://msdn.microsoft.com/library/d..._tun_1_5pde.asp
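
The effect of indexing the lookup column, and of checking the plan before and after, can be sketched as follows. SQLite's EXPLAIN QUERY PLAN is used here only as a runnable analogue of viewing an execution plan; it is not Query Analyzer, and its output format differs from SQL Server's. The table name dim_product and index name ix_dim_product_source are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, "
    "source_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [(1000 + i, i) for i in range(1, 1001)],
)

LOOKUP = "SELECT product_key FROM dim_product WHERE source_id = ?"


def plan_details(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index on source_id the lookup has to read the whole table.
plan_before = plan_details(LOOKUP, (99,))

# Index the column that the key translation filters or joins on.
conn.execute("CREATE INDEX ix_dim_product_source ON dim_product (source_id)")
plan_after = plan_details(LOOKUP, (99,))

print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, so the sketch prints the plans rather than assuming specific text; the point is that the second plan refers to the new index while the first cannot.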





bill.robinette@gmail.com

2006-03-08, 8:23 pm




Cursor in DSS environment = VERY bad idea. Use SQL and joins.
