|
Home > Archive > MS SQL Data Warehousing > July 2005 > updating 1.5 millions rows slow...
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
updating 1.5 millions rows slow...
|
|
|
| Hi,
I have a DTS package where I load 1.5 millions of rows in a staging table,
and I perform multiple updates.
1 of these updates update the entire table by doing a simple lookup in a
small table.
this update takes 30 minutes !
my file is on a Raid 0
I have tried to update by batch (set rowcount option), but this reduce the
process time.
during the update the processors are not used, or just a little (25% of 1
proc on 4 available)
the system appear to wait for the disk pageiolatch wait type.
what can I do to improove this?
if there is no solution, I'll load the table a second time in another
temporary table, because a bulk insert takes only 5 minutes! (which is
bettre then 30 minutes)
thanks
jerome.
| |
| Adam Machanic 2005-06-27, 3:23 am |
| Have you tried not updating the entire table -- but only the modified rows?
(Is that possible in your situation?)
--ANSI-standard method:
UPDATE YourTable
SET YourCol =
(
SELECT SomeVal
FROM OtherTable
WHERE YourTable.ID = OtherTable.ID
)
WHERE
YourTable.YourCol <>
(
SELECT SomeVal
FROM OtherTable
WHERE YourTable.ID = OtherTable.ID
)
--T-SQL proprietary method:
UPDATE YourTable
SET YourTable.YourCol = OtherTable.SomeVal
FROM YourTable
JOIN OtherTable ON YourTable.ID = OtherTable.ID
WHERE
YourTable.YourCol <> OtherTable.SomeVal
--
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--
"Jéjé" < willgart@BBBhotmailA
AA.com> wrote in message
news:em3$qTseFHA.3028@TK2MSFTNGP09.phx.gbl...
> Hi,
>
> I have a DTS package where I load 1.5 millions of rows in a staging table,
> and I perform multiple updates.
> 1 of these updates update the entire table by doing a simple lookup in a
> small table.
> this update takes 30 minutes !
>
> my file is on a Raid 0
>
> I have tried to update by batch (set rowcount option), but this reduce the
> process time.
>
> during the update the processors are not used, or just a little (25% of 1
> proc on 4 available)
> the system appear to wait for the disk pageiolatch wait type.
>
> what can I do to improove this?
>
> if there is no solution, I'll load the table a second time in another
> temporary table, because a bulk insert takes only 5 minutes! (which is
> bettre then 30 minutes)
>
> thanks
>
> jerome.
>
>
| |
|
| yes, but I have to update all rows,
I must update a "value" column (which is null by default) by doing a lookup
in another table.
I can't join the lookup table during the first bulk insert because the big
table come from 1 system and the lookup table from another.
But finally, the update appear to be too slow.
"Adam Machanic" <amachanic@hotmail._removetoemail_.com> wrote in message
news:%23c74YYseFHA.4040@TK2MSFTNGP14.phx.gbl...
> Have you tried not updating the entire table -- but only the modified
> rows?
> (Is that possible in your situation?)
>
>
> --ANSI-standard method:
>
> UPDATE YourTable
> SET YourCol =
> (
> SELECT SomeVal
> FROM OtherTable
> WHERE YourTable.ID = OtherTable.ID
> )
> WHERE
> YourTable.YourCol <>
> (
> SELECT SomeVal
> FROM OtherTable
> WHERE YourTable.ID = OtherTable.ID
> )
>
>
>
> --T-SQL proprietary method:
>
> UPDATE YourTable
> SET YourTable.YourCol = OtherTable.SomeVal
> FROM YourTable
> JOIN OtherTable ON YourTable.ID = OtherTable.ID
> WHERE
> YourTable.YourCol <> OtherTable.SomeVal
>
>
> --
> Adam Machanic
> SQL Server MVP
> http://www.datamanipulation.net
> --
>
>
> "Jéjé" < willgart@BBBhotmailA
AA.com> wrote in message
> news:em3$qTseFHA.3028@TK2MSFTNGP09.phx.gbl...
>
>
| |
| Rick Gittins 2005-06-27, 11:23 am |
| Have you tried to create indexes in your staging table on the field you are
using to link to your reference table after the data is imported, but before
you start the updates?
Rick
"Jéjé" < willgart@BBBhotmailA
AA.com> wrote in message
news:em3$qTseFHA.3028@TK2MSFTNGP09.phx.gbl...
> Hi,
>
> I have a DTS package where I load 1.5 millions of rows in a staging table,
> and I perform multiple updates.
> 1 of these updates update the entire table by doing a simple lookup in a
> small table.
> this update takes 30 minutes !
>
> my file is on a Raid 0
>
> I have tried to update by batch (set rowcount option), but this reduce the
> process time.
>
> during the update the processors are not used, or just a little (25% of 1
> proc on 4 available)
> the system appear to wait for the disk pageiolatch wait type.
>
> what can I do to improove this?
>
> if there is no solution, I'll load the table a second time in another
> temporary table, because a bulk insert takes only 5 minutes! (which is
> bettre then 30 minutes)
>
> thanks
>
> jerome.
>
>
| |
| Adam Machanic 2005-06-27, 11:23 am |
| Is the column you're updating clustered and/or heavily indexed? Can you
drop some of the indexes before the update, if that's the case? Or if it's
clustered, can you cluster on something else?
--
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--
"Jéjé" < willgart_A_@hotmail_
A_.com> wrote in message
news:uho%23lGxeFHA.2984@TK2MSFTNGP15.phx.gbl...
> yes, but I have to update all rows,
> I must update a "value" column (which is null by default) by doing a
lookup
> in another table.
> I can't join the lookup table during the first bulk insert because the big
> table come from 1 system and the lookup table from another.
> But finally, the update appear to be too slow.
>
> "Adam Machanic" <amachanic@hotmail._removetoemail_.com> wrote in message
> news:%23c74YYseFHA.4040@TK2MSFTNGP14.phx.gbl...
a[color=darkred]
1[color=darkred]
>
>
| |
|
| I believe that transaction logging what you are fighting against. Here is my
reply to similar post in *sqlserver.programmer last week.
Performance Issues for Huge Data import/insert
http://groups-beta.google.com/group...143a0e83b096523
"Jéjé" < willgart@BBBhotmailA
AA.com> wrote in message
news:em3$qTseFHA.3028@TK2MSFTNGP09.phx.gbl...
> Hi,
>
> I have a DTS package where I load 1.5 millions of rows in a staging table,
> and I perform multiple updates.
> 1 of these updates update the entire table by doing a simple lookup in a
> small table.
> this update takes 30 minutes !
>
> my file is on a Raid 0
>
> I have tried to update by batch (set rowcount option), but this reduce the
> process time.
>
> during the update the processors are not used, or just a little (25% of 1
> proc on 4 available)
> the system appear to wait for the disk pageiolatch wait type.
>
> what can I do to improove this?
>
> if there is no solution, I'll load the table a second time in another
> temporary table, because a bulk insert takes only 5 minutes! (which is
> bettre then 30 minutes)
>
> thanks
>
> jerome.
>
>
| |
| Adam Machanic 2005-06-27, 8:23 pm |
| JT,
I agree with your post, except the part about setting autogrow to 10%! That
can be very problematic. A better option is to manually grow the database
large enough to begin with, and not worry about shrinking it back down
(which can cause fragmentation) -- disk space is cheap. The problem with
percentage autogrow is seen in larger databases. What happens if your
database is 100 GB? How long will a 10 GB autogrow take? How about if
you're working on a 1 TB+ database? At least on my somewhat underpowered
disk systems, I don't want to wait on a 10 GB grow in the middle of a
transaction, let alone 100 GB! I have mine set to grow 10 MB -- and I have
a monitor set up to alert me if the database ever gets within 20% full.
I'll go in and manually grow it at that point... I don't ever want to see
auto-growth during a production transaction. The 10 MB is a failsafe --
should I hit the wall (or miss the alert), the database won't crash, and
10MB is small enough that it won't cause any timeouts or annoy users.
--
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--
"JT" <someone@microsoft.com> wrote in message
news:eFFGAo0eFHA.3880@tk2msftngp13.phx.gbl...
> I believe that transaction logging what you are fighting against. Here is
my
> reply to similar post in *sqlserver.programmer last week.
>
> Performance Issues for Huge Data import/insert
>
http://groups-beta.google.com/group...143a0e83b096523
>
> "Jéjé" < willgart@BBBhotmailA
AA.com> wrote in message
> news:em3$qTseFHA.3028@TK2MSFTNGP09.phx.gbl...
table,[color=darkred
]
the[color=darkred]
1[color=darkred]
>
>
| |
|
| My post about autogrowing was in reply to someone who was inserting 6GB of
data. If the data being imported has a lot of varchar columns, then the
density of the populated values can have a significant impact on the total
space consumed (50% - >200% larger), and sometimes it is up to the user
about how densly populated their non-required columns are. I try to set the
filesize to something that can handle the expected record volume, but leave
autogrow just in case to prevent the process aborting due to lack of
available space. One large 1GB growth on a 10GB database can take less time
overall than 100 smaller 10mb increments.
"Adam Machanic" <amachanic@hotmail._removetoemail_.com> wrote in message
news:u01e6q1eFHA.3160@TK2MSFTNGP09.phx.gbl...
> JT,
>
> I agree with your post, except the part about setting autogrow to 10%!
That
> can be very problematic. A better option is to manually grow the database
> large enough to begin with, and not worry about shrinking it back down
> (which can cause fragmentation) -- disk space is cheap. The problem with
> percentage autogrow is seen in larger databases. What happens if your
> database is 100 GB? How long will a 10 GB autogrow take? How about if
> you're working on a 1 TB+ database? At least on my somewhat underpowered
> disk systems, I don't want to wait on a 10 GB grow in the middle of a
> transaction, let alone 100 GB! I have mine set to grow 10 MB -- and I
have
> a monitor set up to alert me if the database ever gets within 20% full.
> I'll go in and manually grow it at that point... I don't ever want to see
> auto-growth during a production transaction. The 10 MB is a failsafe --
> should I hit the wall (or miss the alert), the database won't crash, and
> 10MB is small enough that it won't cause any timeouts or annoy users.
>
>
> --
> Adam Machanic
> SQL Server MVP
> http://www.datamanipulation.net
> --
>
>
> "JT" <someone@microsoft.com> wrote in message
> news:eFFGAo0eFHA.3880@tk2msftngp13.phx.gbl...
is[color=darkred]
> my
>
http://groups-beta.google.com/group...143a0e83b096523
> table,
a[color=darkred]
> the
of[color=darkred]
> 1
>
>
| |
|
| there is no index in this table
adding some don't provide any performance improvment, The system do a full
scan of the table.
I work with the sysadmin, I think there is an hardware issue, I think my
file is not on a dedicated Raid 0 like requested, but on a shared raid 5!!!
"Adam Machanic" <amachanic@hotmail._removetoemail_.com> wrote in message
news:uO%23DfgzeFHA.1448@TK2MSFTNGP14.phx.gbl...
> Is the column you're updating clustered and/or heavily indexed? Can you
> drop some of the indexes before the update, if that's the case? Or if
> it's
> clustered, can you cluster on something else?
>
>
> --
> Adam Machanic
> SQL Server MVP
> http://www.datamanipulation.net
> --
>
>
> "Jéjé" < willgart_A_@hotmail_
A_.com> wrote in message
> news:uho%23lGxeFHA.2984@TK2MSFTNGP15.phx.gbl...
> lookup
> a
> 1
>
>
| |
| David Browne 2005-06-27, 8:23 pm |
|
"Jéjé" < willgart@BBBhotmailA
AA.com> wrote in message
news:em3$qTseFHA.3028@TK2MSFTNGP09.phx.gbl...
> Hi,
>
> I have a DTS package where I load 1.5 millions of rows in a staging table,
> and I perform multiple updates.
> 1 of these updates update the entire table by doing a simple lookup in a
> small table.
> this update takes 30 minutes !
>
> my file is on a Raid 0
>
> I have tried to update by batch (set rowcount option), but this reduce the
> process time.
>
> during the update the processors are not used, or just a little (25% of 1
> proc on 4 available)
> the system appear to wait for the disk pageiolatch wait type.
>
> what can I do to improove this?
A couple of things. First, try to get an in-place update by using a
fixed-width data type and/or preloading dummy values into the staging table.
An update which expands all the rows is much more expensive than one which
does not.
Second, you are on the right track with replacing update with insert. Just
use a insert-select-join to load a second staging table with the updated
values.
David
| |
| Peter Nolan 2005-07-08, 11:23 am |
| Hi Jerome,
it sounds to me that if you are updating all rows in the table with the
lookup that you must be performing 1.5M updates, and these updates must
be logged....correct?
If this is the case you would be well better off to load the two tables
into a staging area, perform joins to produce the image of the row you
want to actually have, unload that view of the data and then load it
into the target table....that way you will not be performing any logged
updates you will only be using the loader.....which is WAY faster than
logged updates....
Best Regards
Peter Nolan
www.peternolan.com
| |
|
| yeah, its an option. (maybe the only good option)
its just because we have some "cascading" updates
rows updated in the first query will not be updated in the second query, but
again updated in the third query
because some updates must be done before others and not always on all
records.
Today updates statement are simple to manage, but slow to process.
"Peter Nolan" <peter@peternolan.com> wrote in message
news:1120836359.993774.262900@g14g2000cwa.googlegroups.com...
> Hi Jerome,
> it sounds to me that if you are updating all rows in the table with the
> lookup that you must be performing 1.5M updates, and these updates must
> be logged....correct?
>
> If this is the case you would be well better off to load the two tables
> into a staging area, perform joins to produce the image of the row you
> want to actually have, unload that view of the data and then load it
> into the target table....that way you will not be performing any logged
> updates you will only be using the loader.....which is WAY faster than
> logged updates....
>
> Best Regards
>
> Peter Nolan
> www.peternolan.com
>
| |
| Peter Nolan 2005-07-10, 7:23 am |
| Hi Jerome,
hhhhmmm....not pretty.....well, good luck with it.......if you can get
away with not updating along the way you will be much better
off.....;-)....it can almost always be coded like that if you are
careful....
By the way updates are WAY faster than they used to be (LOL)!!! But
updates in most databases are still logged and they are significantly
slower than the unlogged reads........at least in server 2000 the
volume of the log vastly decreased from 7.
By the way, if you are doing such a large amount of updating be sure to
check what is happening with logs....I am told you can no longer just
truncate a log on 2000 and that you must actually perform a database
backup even if to a null device before the truncate.....
Typicall, in databases like server 2000 processing larger volumes
anything you can do to reduce updates/inserts/deletes in favour of
loading is a 'good thing'. I've even written tools that take a file
that will be applied to a database and delete the rows that exist as
well as building a Load Image Format so that at the end the rows being
updated are deleted and loaded......you can process a LOT more rows
like that than you can via updates.....
Best Regards
Peter
www.peternolan.com
|
|
|
|
|