Home > Archive > MS SQL Data Warehousing > May 2005 > Updating information on a very large table









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Updating information on a very large table
stooky

2005-04-14, 1:23 pm

We have a table with 120 million rows and roughly 300 fields. We update the
phone status of these fields every quarter. The phone update is a file of 90
million phone numbers. Basically, any phone number that matches a record in
our 120mil table triggers a record update (a single char field is changed
from 'N' to 'Y').

Any suggestions on how to speed up this process? Right now the update is
taking over 48hrs.

thanks.
Shawn Meyer

2005-04-14, 1:23 pm

What does your update statement / table definition / update file look like.
Are you loading your update file into a seperate table?
Shawn

"stooky" <stooky@discussions.microsoft.com> wrote in message
news:58F7E734-E6AD-4BBD-BC82- 6203591AB835@microso
ft.com...
> We have a table with 120 million rows and roughly 300 fields. We update

the
> phone status of these fields every quarter. The phone update is a file of

90
> million phone numbers. Basically, any phone number that matches a record

in
> our 120mil table triggers a record update (a single char field is changed
> from 'N' to 'Y').
>
> Any suggestions on how to speed up this process? Right now the update is
> taking over 48hrs.
>
> thanks.



stooky

2005-04-14, 8:23 pm

The update information is stored in a simple table:

CREATE TABLE [do_not_call] (
[id] [int] NOT NULL ,
[phone] [char] (10) COLLATE SQL_Latin1_General_C
P1_CI_AS NOT NULL
) ON [DO_NOT_MAIL]
GO

The data for this table is BULK INSERTED and then indexed with:

CREATE UNIQUE CLUSTERED INDEX [idx-id] ON [dbo].[do_not_call]([id]) WITH
IGNORE_DUP_KEY ON [DO_NOT_CALL1]
GO

CREATE INDEX [idx-phone] ON [dbo]. [do_not_call]([phone
]) ON
[IDX-DO_NOT_CALL2]
GO

The main table has a PHONE field that is indexed and a PHONE_PANDER_FLAG
field that is not indexed (until after the update).

The update query looks like this:

UPDATE consumer SET phone_pander_flag='Y
'
WHERE phone IN (SELECT phone FROM do_not_call
WHERE id > 0 AND id < 500000 AND phone != ' ');
GO

**NOTE: This small query is repeated in chunks of 500k from 0 to 90mil. I
have tried other sizes and 500k chunks seem to perform the best.

The main table "consumer" is setup across multiple file groups and disks and
runs very fast.

thanks.

"Shawn Meyer" wrote:

> What does your update statement / table definition / update file look like.
> Are you loading your update file into a seperate table?
> Shawn
>
> "stooky" <stooky@discussions.microsoft.com> wrote in message
> news:58F7E734-E6AD-4BBD-BC82- 6203591AB835@microso
ft.com...
> the
> 90
> in
>
>
>

Shawn Meyer

2005-04-18, 1:23 pm

I havent tested this, but you might want to try a join, since both of the
tables are indexed.

UPDATE consumer SET phone_pander_flag='Y
'
FROM
consumer c
INNER JOIN
[do_not_call] dnc
ON c.phone = dnc.phone

Shawn

"stooky" <stooky@discussions.microsoft.com> wrote in message
news:FF4D39F9-13C4-495B-83C2- 33D92C39F9B4@microso
ft.com...
> The update information is stored in a simple table:
>
> CREATE TABLE [do_not_call] (
> [id] [int] NOT NULL ,
> [phone] [char] (10) COLLATE SQL_Latin1_General_C
P1_CI_AS NOT NULL
> ) ON [DO_NOT_MAIL]
> GO
>
> The data for this table is BULK INSERTED and then indexed with:
>
> CREATE UNIQUE CLUSTERED INDEX [idx-id] ON [dbo]. [do_not_call]([id])[
/color]
WITH[color=darkred]
> IGNORE_DUP_KEY ON [DO_NOT_CALL1]
> GO
>
> CREATE INDEX [idx-phone] ON [dbo]. [do_not_call]([phone
]) ON
> [IDX-DO_NOT_CALL2]
> GO
>
> The main table has a PHONE field that is indexed and a PHONE_PANDER_FLAG
> field that is not indexed (until after the update).
>
> The update query looks like this:
>
> UPDATE consumer SET phone_pander_flag='Y
'
> WHERE phone IN (SELECT phone FROM do_not_call
> WHERE id > 0 AND id < 500000 AND phone != ' ');
> GO
>
> **NOTE: This small query is repeated in chunks of 500k from 0 to 90mil. I
> have tried other sizes and 500k chunks seem to perform the best.
>
> The main table "consumer" is setup across multiple file groups and disks

and[color=darkred]
> runs very fast.
>
> thanks.
>
> "Shawn Meyer" wrote:
>
like.[color=darkred]
update[color=darkred
]
of[color=darkred]
record[color=darkred
]
changed[color=darkre
d]
is[color=darkred]


Peter Nolan

2005-05-13, 1:23 pm

Hi Stooky,
this is a good example of not making sure that updates are only
performed if they need to be performed...the best way to avoid problems
like this is to only ever update a row when it needs to be
updated.......I am sure that there are not 90M changes per
quarter..;-).....

It sounds like the the rows get set to no and then set back to 'Y' if
there is a match....a better way to design it would be leave the flag
as 'Y' from the previous quarter and then detect records where the
status of the phone has changed and then change the rows that have
actually changed...

You will also be logging these 90M changes in SQL Server 2000 unless
you know something I don't about turnning off logging......

In general, it's worth spending time figuring out how to minimise the
number of updates if the database is sizable....

Best Regards

Peter Nolan
www.peternolan.com

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2009 droptable.com