Drop Table
Support Forum for database administrators and web based access to important newsgroups related to databasesHi,
My company has a scenario where we would like to change the data type
of an existing primary key from an integer to a char, but we are
concerned about the performance implications of doing so. The script
for the two tables that we need to modify is listed below. Table
FR_Sessions contains a column named TransmissionID which is currently
an integer. This table contains about 1 million rows of data. Table
FR_VTracking table also contains the TransmissionID as part of it's
primary key and it contains about 35 millions rows of data. These two
tables are frequently joined on TransmissionID (FR_Sessions is the
parent). The TransmissionID column is used primarily for joins and is
not typically displayed.
We need like to change the TransmissionID data type from int to
char(7), and I had a few questions:
1) Would this introduce significant performance degradation? I have
read that char keys/indexes are slower than int/numeric.
2) Are there collation options (or any other optimizations) that we
could use to minimize the performance hit of the char(7)...if so which
ones?
I am a software architect by trade, not a database guru, so please go
easy on my if I overlooked something obvious :)
Any suggestions or information would be greatly appreciated.
Thanks,
Tim
-------------------
CREATE TABLE [FR_Sessions] (
[TransmissionID] [int] IDENTITY (1, 1) NOT NULL ,
[PTUID] [varchar] (10) COLLATE SQL_Latin1_General_C
P1_CI_AS NOT NULL ,
[PortNum] [numeric](6, 0) NOT NULL CONSTRAINT [DF_FR_Sessions_Port
Num]
DEFAULT (0),
[CloseStatus] [varchar] (20) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL
,
[RecvBytes] [int] NULL ,
[SendBytes] [int] NULL ,
[EndDT] [datetime] NULL CONSTRAINT [DF_FR_Sessions_EndD
T] DEFAULT
(getutcdate()),
[LocalEndDT] [datetime] NULL ,
[TotalTime] [int] NULL ,
[OffenderID] [numeric](9, 0) NULL ,
[UploadStatus] [char] (1) COLLATE SQL_Latin1_General_C
P1_CI_AS NOT
NULL CONSTRAINT [DF_FR_Sessions_Uplo
adStatus] DEFAULT ('N'),
[SchedBatchID] [numeric](18, 0) NULL ,
[SWVersion] [varchar] (10) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
[DLST] [bit] NULL ,
[TZO] [smallint] NULL ,
[Processed] [bit] NOT NULL CONSTRAINT [DF_FR_Sessions_Proc
essed]
DEFAULT (0),
[CallerID] [varchar] (13) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
[PeerIP] [varchar] (16) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
[XtraInfo] [varchar] (1024) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL
,
[IdType] [char] (1) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
CONSTRAINT [PK_FR_Sessions] PRIMARY KEY CLUSTERED
(
[TransmissionID]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [FR_VTracking] (
[TransmissionID] [int] NOT NULL ,
[FrameNum] [int] NOT NULL ,
[LatDegrees] [float] NOT NULL ,
[LonDegrees] [float] NOT NULL ,
[Altitude] [float] NOT NULL ,
[Velocity] [float] NOT NULL ,
[NumPositions] [smallint] NOT NULL ,
[NavMode] [smallint] NOT NULL ,
[Units] [smallint] NOT NULL ,
[GPSTrackingID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[dtStamp] [datetime] NULL ,
CONSTRAINT [PK_FR_VTracking] PRIMARY KEY CLUSTERED
(
[TransmissionID],
[FrameNum]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]
Post Follow-up to this message"twkelsey" <twkelsey@hotmail.com> wrote in message news:1114448085.321101.204760@z14g2000cwz.googlegroups.com... > Hi, > > My company has a scenario where we would like to change the data type > of an existing primary key from an integer to a char, but we are > concerned about the performance implications of doing so. The script > for the two tables that we need to modify is listed below. Table > FR_Sessions contains a column named TransmissionID which is currently > an integer. This table contains about 1 million rows of data. Table > FR_VTracking table also contains the TransmissionID as part of it's > primary key and it contains about 35 millions rows of data. These two > tables are frequently joined on TransmissionID (FR_Sessions is the > parent). The TransmissionID column is used primarily for joins and is > not typically displayed. > > We need like to change the TransmissionID data type from int to > char(7), and I had a few questions: > > 1) Would this introduce significant performance degradation? I have > read that char keys/indexes are slower than int/numeric. > 2) Are there collation options (or any other optimizations) that we > could use to minimize the performance hit of the char(7)...if so which > ones? > > I am a software architect by trade, not a database guru, so please go > easy on my if I overlooked something obvious :) > > Any suggestions or information would be greatly appreciated. > > Thanks, > Tim > <snip> From a performance point of view, the main difference is that an int takes up 4 bytes, but char(7) would take 7. That means more data must be stored, retrieved and moved around as MSSQL executes queries and modifies data. Having said that, char(7) is not an especially wide column, and as long as you have appropriate indexes, and queries run within acceptable times for your users and application needs, then there's nothing to worry about. A more important question is probably why you need to do this - if TransmissionID is an internal key which isn't exposed to users, then why would you want or need to change it? The fact that you're using IDENTITY suggests that the values have no inherent meaning as such, so I'm not sure what the benefit of char(7) is, unless perhaps you're moving from an artificial to a natural key? Simon
Post Follow-up to this messageSimon, Thanks for your reply. In regard to your question about why we would like to do this: We have wireless devices the field that transmit realtime data over TCP/IP. A Windows service grabs this data off of the wire and inserts it into the database. Currently, an initial "empty" row is inserted in the FR_Session table to get the TransmissionID back from the identity column. When the wireless session has completed, a subsequent update is performed to fill in the session summary information. We are optimizning the current process and my design requirements dictate that I must not only avoid the subsequent updates, but also to completely decouple the service from accessing the database. To that end, the windows service is being re-designed to create BCP files which will be bulk copied at fixed intervals into the database. This implies that the TransmissionID will have to be generated by the service itself and inserted into the BCP files for two tables (instead of being generated by identity). The new TransmissionID that is generated by the service will be alphanumeric, so if we could push that ID straight in with the BCP files for both the parent and child tables it would be preferred. Thanks, Tim
Post Follow-up to this messageTo answer the other question: yes, collation makes a small difference.
To minimize impact of collation, use any of the binary collations.
By the way: I would consider it a serious mistake to redesign your
database to match a wire format. You have a database to store data, not
to simulate some external format. You use SELECT statements to retrieve
the data. If needed, you can converting int to char(7) in such a query
or in a view. Maybe that is a simple solution for your problem.
Also note, that Identity only works on integer and decimal data types,
not on char.
HTH,
Gert-Jan
twkelsey wrote:
>
> Hi,
>
> My company has a scenario where we would like to change the data type
> of an existing primary key from an integer to a char, but we are
> concerned about the performance implications of doing so. The script
> for the two tables that we need to modify is listed below. Table
> FR_Sessions contains a column named TransmissionID which is currently
> an integer. This table contains about 1 million rows of data. Table
> FR_VTracking table also contains the TransmissionID as part of it's
> primary key and it contains about 35 millions rows of data. These two
> tables are frequently joined on TransmissionID (FR_Sessions is the
> parent). The TransmissionID column is used primarily for joins and is
> not typically displayed.
>
> We need like to change the TransmissionID data type from int to
> char(7), and I had a few questions:
>
> 1) Would this introduce significant performance degradation? I have
> read that char keys/indexes are slower than int/numeric.
> 2) Are there collation options (or any other optimizations) that we
> could use to minimize the performance hit of the char(7)...if so which
> ones?
>
> I am a software architect by trade, not a database guru, so please go
> easy on my if I overlooked something obvious :)
>
> Any suggestions or information would be greatly appreciated.
>
> Thanks,
> Tim
>
> -------------------
>
> CREATE TABLE [FR_Sessions] (
> [TransmissionID] [int] IDENTITY (1, 1) NOT NULL ,
> [PTUID] [varchar] (10) COLLATE SQL_Latin1_General_C
P1_CI_AS NOT NULL ,
> [PortNum] [numeric](6, 0) NOT NULL CONSTRAINT [DF_FR_Sessions_Port
Num]
> DEFAULT (0),
> [CloseStatus] [varchar] (20) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL
> ,
> [RecvBytes] [int] NULL ,
> [SendBytes] [int] NULL ,
> [EndDT] [datetime] NULL CONSTRAINT [DF_FR_Sessions_EndD
T] DEFAULT
> (getutcdate()),
> [LocalEndDT] [datetime] NULL ,
> [TotalTime] [int] NULL ,
> [OffenderID] [numeric](9, 0) NULL ,
> [UploadStatus] [char] (1) COLLATE SQL_Latin1_General_C
P1_CI_AS NOT
> NULL CONSTRAINT [DF_FR_Sessions_Uplo
adStatus] DEFAULT ('N'),
> [SchedBatchID] [numeric](18, 0) NULL ,
> [SWVersion] [varchar] (10) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
> [DLST] [bit] NULL ,
> [TZO] [smallint] NULL ,
> [Processed] [bit] NOT NULL CONSTRAINT [DF_FR_Sessions_Proc
essed]
> DEFAULT (0),
> [CallerID] [varchar] (13) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
> [PeerIP] [varchar] (16) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
> [XtraInfo] [varchar] (1024) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL
> ,
> [IdType] [char] (1) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
> CONSTRAINT [PK_FR_Sessions] PRIMARY KEY CLUSTERED
> (
> [TransmissionID]
> ) WITH FILLFACTOR = 90 ON [PRIMARY]
> ) ON [PRIMARY]
>
> CREATE TABLE [FR_VTracking] (
> [TransmissionID] [int] NOT NULL ,
> [FrameNum] [int] NOT NULL ,
> [LatDegrees] [float] NOT NULL ,
> [LonDegrees] [float] NOT NULL ,
> [Altitude] [float] NOT NULL ,
> [Velocity] [float] NOT NULL ,
> [NumPositions] [smallint] NOT NULL ,
> [NavMode] [smallint] NOT NULL ,
> [Units] [smallint] NOT NULL ,
> [GPSTrackingID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
> [dtStamp] [datetime] NULL ,
> CONSTRAINT [PK_FR_VTracking] PRIMARY KEY CLUSTERED
> (
> [TransmissionID],
> [FrameNum]
> ) WITH FILLFACTOR = 90 ON [PRIMARY]
> ) ON [PRIMARY]
Post Follow-up to this messageGert-Jan Strik (sorry@toomuchspamal ready.nl) writes: > To answer the other question: yes, collation makes a small difference. > To minimize impact of collation, use any of the binary collations. I seem to recall numbers like 20% overhead for a case-sensitive collation and 35% for an accent- and case-ínsensitive collation. But this was very long ago, probably during 6.5 days, and may not be relevant. I would recommend that you use the default collation of the database, and if you have business requirements for, say, case-insensitive go with that. The performance gain you could make by making these particular columns using a binary collation may not exceed the cost for the increased complexity. An alternative would be to se binary values altogether, and avoid the char problem. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techin.../2000/books.asp
Post Follow-up to this messageTim,
Yes, there will some performance overhead to use char(7) as index key rather
than integer since string comparison normally is more expensive than integer
comparison. Also since integer takes 4 bytes while char(70 takes 7 bytes,
you should expect to see a small space increase with this change. The
performance overhead caused by copying the extra 3 bytes around in the
server normally is too small to notice.
Collation definitely makes a different when it comes to string comparison.
Try to use binary collation which normally compares fastest for string.
--
Gang He
Software Design Engineer
Microsoft SQL Server Storage Engine
This posting is provided "AS IS" with no warranties, and confers no rights.
"twkelsey" <twkelsey@hotmail.com> wrote in message
news:1114448085.321101.204760@z14g2000cwz.googlegroups.com...
> Hi,
>
> My company has a scenario where we would like to change the data type
> of an existing primary key from an integer to a char, but we are
> concerned about the performance implications of doing so. The script
> for the two tables that we need to modify is listed below. Table
> FR_Sessions contains a column named TransmissionID which is currently
> an integer. This table contains about 1 million rows of data. Table
> FR_VTracking table also contains the TransmissionID as part of it's
> primary key and it contains about 35 millions rows of data. These two
> tables are frequently joined on TransmissionID (FR_Sessions is the
> parent). The TransmissionID column is used primarily for joins and is
> not typically displayed.
>
> We need like to change the TransmissionID data type from int to
> char(7), and I had a few questions:
>
> 1) Would this introduce significant performance degradation? I have
> read that char keys/indexes are slower than int/numeric.
> 2) Are there collation options (or any other optimizations) that we
> could use to minimize the performance hit of the char(7)...if so which
> ones?
>
> I am a software architect by trade, not a database guru, so please go
> easy on my if I overlooked something obvious :)
>
> Any suggestions or information would be greatly appreciated.
>
> Thanks,
> Tim
>
> -------------------
>
> CREATE TABLE [FR_Sessions] (
> [TransmissionID] [int] IDENTITY (1, 1) NOT NULL ,
> [PTUID] [varchar] (10) COLLATE SQL_Latin1_General_C
P1_CI_AS NOT NULL ,
> [PortNum] [numeric](6, 0) NOT NULL CONSTRAINT [DF_FR_Sessions_Port
Num]
> DEFAULT (0),
> [CloseStatus] [varchar] (20) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL
> ,
> [RecvBytes] [int] NULL ,
> [SendBytes] [int] NULL ,
> [EndDT] [datetime] NULL CONSTRAINT [DF_FR_Sessions_EndD
T] DEFAULT
> (getutcdate()),
> [LocalEndDT] [datetime] NULL ,
> [TotalTime] [int] NULL ,
> [OffenderID] [numeric](9, 0) NULL ,
> [UploadStatus] [char] (1) COLLATE SQL_Latin1_General_C
P1_CI_AS NOT
> NULL CONSTRAINT [DF_FR_Sessions_Uplo
adStatus] DEFAULT ('N'),
> [SchedBatchID] [numeric](18, 0) NULL ,
> [SWVersion] [varchar] (10) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
> [DLST] [bit] NULL ,
> [TZO] [smallint] NULL ,
> [Processed] [bit] NOT NULL CONSTRAINT [DF_FR_Sessions_Proc
essed]
> DEFAULT (0),
> [CallerID] [varchar] (13) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
> [PeerIP] [varchar] (16) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
> [XtraInfo] [varchar] (1024) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL
> ,
> [IdType] [char] (1) COLLATE SQL_Latin1_General_C
P1_CI_AS NULL ,
> CONSTRAINT [PK_FR_Sessions] PRIMARY KEY CLUSTERED
> (
> [TransmissionID]
> ) WITH FILLFACTOR = 90 ON [PRIMARY]
> ) ON [PRIMARY]
>
> CREATE TABLE [FR_VTracking] (
> [TransmissionID] [int] NOT NULL ,
> [FrameNum] [int] NOT NULL ,
> [LatDegrees] [float] NOT NULL ,
> [LonDegrees] [float] NOT NULL ,
> [Altitude] [float] NOT NULL ,
> [Velocity] [float] NOT NULL ,
> [NumPositions] [smallint] NOT NULL ,
> [NavMode] [smallint] NOT NULL ,
> [Units] [smallint] NOT NULL ,
> [GPSTrackingID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
> [dtStamp] [datetime] NULL ,
> CONSTRAINT [PK_FR_VTracking] PRIMARY KEY CLUSTERED
> (
> [TransmissionID],
> [FrameNum]
> ) WITH FILLFACTOR = 90 ON [PRIMARY]
> ) ON [PRIMARY]
>
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread