| Author |
Updating information on a very large table
|
|
| stooky 2005-04-14, 1:23 pm |
| We have a table with 120 million rows and roughly 300 fields. We update the
phone status of these fields every quarter. The phone update is a file of 90
million phone numbers. Basically, any phone number that matches a record in
our 120mil table triggers a record update (a single char field is changed
from 'N' to 'Y').
Any suggestions on how to speed up this process? Right now the update is
taking over 48hrs.
thanks.
| |
| Shawn Meyer 2005-04-14, 1:23 pm |
| What does your update statement / table definition / update file look like.
Are you loading your update file into a seperate table?
Shawn
"stooky" <stooky@discussions.microsoft.com> wrote in message
news:58F7E734-E6AD-4BBD-BC82- 6203591AB835@microso
ft.com...
> We have a table with 120 million rows and roughly 300 fields. We update
the
> phone status of these fields every quarter. The phone update is a file of
90
> million phone numbers. Basically, any phone number that matches a record
in
> our 120mil table triggers a record update (a single char field is changed
> from 'N' to 'Y').
>
> Any suggestions on how to speed up this process? Right now the update is
> taking over 48hrs.
>
> thanks.
| |
| stooky 2005-04-14, 8:23 pm |
| The update information is stored in a simple table:
CREATE TABLE [do_not_call] (
[id] [int] NOT NULL ,
[phone] [char] (10) COLLATE SQL_Latin1_General_C
P1_CI_AS NOT NULL
) ON [DO_NOT_MAIL]
GO
The data for this table is BULK INSERTED and then indexed with:
CREATE UNIQUE CLUSTERED INDEX [idx-id] ON [dbo].[do_not_call]([id]) WITH
IGNORE_DUP_KEY ON [DO_NOT_CALL1]
GO
CREATE INDEX [idx-phone] ON [dbo]. [do_not_call]([phone
]) ON
[IDX-DO_NOT_CALL2]
GO
The main table has a PHONE field that is indexed and a PHONE_PANDER_FLAG
field that is not indexed (until after the update).
The update query looks like this:
UPDATE consumer SET phone_pander_flag='Y
'
WHERE phone IN (SELECT phone FROM do_not_call
WHERE id > 0 AND id < 500000 AND phone != ' ');
GO
**NOTE: This small query is repeated in chunks of 500k from 0 to 90mil. I
have tried other sizes and 500k chunks seem to perform the best.
The main table "consumer" is setup across multiple file groups and disks and
runs very fast.
thanks.
"Shawn Meyer" wrote:
> What does your update statement / table definition / update file look like.
> Are you loading your update file into a seperate table?
> Shawn
>
> "stooky" <stooky@discussions.microsoft.com> wrote in message
> news:58F7E734-E6AD-4BBD-BC82- 6203591AB835@microso
ft.com...
> the
> 90
> in
>
>
>
| |
| Shawn Meyer 2005-04-18, 1:23 pm |
| I havent tested this, but you might want to try a join, since both of the
tables are indexed.
UPDATE consumer SET phone_pander_flag='Y
'
FROM
consumer c
INNER JOIN
[do_not_call] dnc
ON c.phone = dnc.phone
Shawn
"stooky" <stooky@discussions.microsoft.com> wrote in message
news:FF4D39F9-13C4-495B-83C2- 33D92C39F9B4@microso
ft.com...
> The update information is stored in a simple table:
>
> CREATE TABLE [do_not_call] (
> [id] [int] NOT NULL ,
> [phone] [char] (10) COLLATE SQL_Latin1_General_C
P1_CI_AS NOT NULL
> ) ON [DO_NOT_MAIL]
> GO
>
> The data for this table is BULK INSERTED and then indexed with:
>
> CREATE UNIQUE CLUSTERED INDEX [idx-id] ON [dbo]. [do_not_call]([id])[
/color]
WITH[color=darkred]
> IGNORE_DUP_KEY ON [DO_NOT_CALL1]
> GO
>
> CREATE INDEX [idx-phone] ON [dbo]. [do_not_call]([phone
]) ON
> [IDX-DO_NOT_CALL2]
> GO
>
> The main table has a PHONE field that is indexed and a PHONE_PANDER_FLAG
> field that is not indexed (until after the update).
>
> The update query looks like this:
>
> UPDATE consumer SET phone_pander_flag='Y
'
> WHERE phone IN (SELECT phone FROM do_not_call
> WHERE id > 0 AND id < 500000 AND phone != ' ');
> GO
>
> **NOTE: This small query is repeated in chunks of 500k from 0 to 90mil. I
> have tried other sizes and 500k chunks seem to perform the best.
>
> The main table "consumer" is setup across multiple file groups and disks
and[color=darkred]
> runs very fast.
>
> thanks.
>
> "Shawn Meyer" wrote:
>
like.[color=darkred]
update[color=darkred
]
of[color=darkred]
record[color=darkred
]
changed[color=darkre
d]
is[color=darkred]
| |
| Peter Nolan 2005-05-13, 1:23 pm |
| Hi Stooky,
this is a good example of not making sure that updates are only
performed if they need to be performed...the best way to avoid problems
like this is to only ever update a row when it needs to be
updated.......I am sure that there are not 90M changes per
quarter..;-).....
It sounds like the the rows get set to no and then set back to 'Y' if
there is a match....a better way to design it would be leave the flag
as 'Y' from the previous quarter and then detect records where the
status of the phone has changed and then change the rows that have
actually changed...
You will also be logging these 90M changes in SQL Server 2000 unless
you know something I don't about turnning off logging......
In general, it's worth spending time figuring out how to minimise the
number of updates if the database is sizable....
Best Regards
Peter Nolan
www.peternolan.com
|
|
|
|