Drop Table

Support Forum for database administrators and web based access to important newsgroups related to databases
Register on Database Support Forum Edit your profileCalendarFind other Database Support forum membersFrequently Asked QuestionsSearch this forum -> 
For Database admins: Free Database-related Magazines Now Free shipping to Texas


Post New Thread










Thread
Author

Sequence of columns in primary key
I am currently undertaking a review of the primary keys in a SQL Server
2000 database with a view to improving performance of queries.

I have heard that, in the case of compound primary keys, it is
important to select the correct order for the columns within the key.
For instance, imagine a table called OrderLine which has primary key
columns as follows-

Ledger
OrderNumber
OrderLineNumber

The theory I have heard is that columns with the most distinct values
should come first. In this case, Ledger is likely to have a maximum of
6 distinct values, OrderNumber a maximum of 10 million and OrderLine up
to 99. Based on this supposition I believe the best order would be-

OrderNumber
OrderLineNumber
Ledger

I have performed a few rudimentary tests that appear to bear this out.
I was wondering if anyone else has tried something similar and if so
what was the result?


Thanks,

Ross


Report this thread to moderator Post Follow-up to this message
Old Post
gilles27@talk21.com
10-27-05 02:24 PM


Re: Sequence of columns in primary key
>> columns with the most distinct values should come first. <<

This is true.

But you also have to remember that an index (a,b,c)  in effect gives
you indexng for (a,b) and (a).  You might also consider a covering
index to speed up queries.  A covering index has all the columns needed
to answer a query so that you do not have to read the base table at
all.  Also be sure that the FK and PK lists are in the same order.


Report this thread to moderator Post Follow-up to this message
Old Post
--CELKO--
10-27-05 02:24 PM


Re: Sequence of columns in primary key
>The theory I have heard is that columns with the most distinct  values
>should come first

There are quite a few exceptions. For instanse, if you have a table
ORDERS with PK on (CUSTOMERS_PK, some_other_column) and a FK constraint
referencing table CUSTOMERS, you might want to have CUSTOMERS_PK column
first even if it is less selective than the other column.

There aren't that many hard and fast rules in our trade


Report this thread to moderator Post Follow-up to this message
Old Post
Alexander Kuznetsov
10-27-05 02:24 PM


Re: Sequence of columns in primary key
(gilles27@talk21.com)  writes:
> I have heard that, in the case of compound primary keys, it is
> important to select the correct order for the columns within the key.
> For instance, imagine a table called OrderLine which has primary key
> columns as follows-
>
> Ledger
> OrderNumber
> OrderLineNumber
>
> The theory I have heard is that columns with the most distinct values
> should come first. In this case, Ledger is likely to have a maximum of
> 6 distinct values, OrderNumber a maximum of 10 million and OrderLine up
> to 99. Based on this supposition I believe the best order would be-
>
> OrderNumber
> OrderLineNumber
> Ledger
>
> I have performed a few rudimentary tests that appear to bear this out.
> I was wondering if anyone else has tried something similar and if so
> what was the result?

Nah, the order in a primary key should rather reflect the logical
hierarchy. If an OrderLine can be broken into pieces called ledgers
the above would be the right model. I would guess, though, that different
ledgers have different series for order numbers, and thus the ledger is
the main key here.

I don't know the business rules here, but it seems to be that
SELECT * FROM OrderLine WHERE OrderNumber = 1234 is not even
meaninful, because you may get to see data from six different and
unrelated orders.

Now, not all tables with a compound key has a clear hiearchy. Consider
this table:

CREATE TABLE bankholidays(
coucode   aba_coucode  NOT NULL,
bhodate   aba_date     NOT NULL,
reguser   aba_upduser  NOT NULL,
moddate   aba_updtime  NOT NULL,
CONSTRAINT pk_bho PRIMARY KEY (coucode, bhodate)
)

This table lists days Mon-Fri that are not business days in various
countries. If you want to know in which countries Christmas Eve is
not a business day, you would say

SELECT coucode FROM bankholidays WHERE bhodate = '20041224'

But if you want to compute the settledate for a given trade date
and instrument, the country code is the top-level key. (And since
we mainly use the table for the latter purpose, this is why coucode
comes first.)

Thus, for such a table, it is very common to have a index on the
reverse order of the key, or some reverse order for a key with three
or more columns.

What you should make special consideration, is which index should be
the clustered index. This is necessarily not the primary key. It
could be a different combination of the PK - or on a completely
different column.

Finally, one thing also to keep in mind is how you use the columns.
If you say things like:

SELECT * FROM tbl WHERE a = 1 AND b BETWEEN 12 AND 19

An index on (a,b) is better, even if a only has 10 possible values and
b has 100.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp


Report this thread to moderator Post Follow-up to this message
Old Post
Erland Sommarskog
10-27-05 02:24 PM


Re: Sequence of columns in primary key
Erland,

You're right to say that Ledger is the "main" key. I should have
explained but it looks like you figured it out yourself. I also
incorrectly stated the current column order in the PK. It is Ledger,
OrderLineNumber, OrderNumber and NOT Ledger, OrderNumber,
OrderLineNumber.

If I use your hierarchy theory the best order is Ledger, OrderNumber,
OrderLineNumber. This also fits in nicely with Joe's statement that "an
index (a, b, c) in effect gives you indexing for (a, b) and (a)",
because a common SELECT or JOIN on this table would involve Ledger and
OrderNumber as this points to the PK in the Orders table.

I am running SQL Profiler against a customer's database today so I hope
to have more information about the queries that our users run. This
will enable me to make a more informed decision about the sequence of
the columns in each PK in the database. It seems to me that it is more
important to understand usage of the database rather than ordering them
by number of distinct values.

Ross


Report this thread to moderator Post Follow-up to this message
Old Post
gilles27@talk21.com
10-27-05 02:24 PM


Re: Sequence of columns in primary key
(gilles27@talk21.com)  writes:
> You're right to say that Ledger is the "main" key. I should have
> explained but it looks like you figured it out yourself. I also
> incorrectly stated the current column order in the PK. It is Ledger,
> OrderLineNumber, OrderNumber and NOT Ledger, OrderNumber,
> OrderLineNumber.

Assuming that OrderLineNumber refers to a detail line in an order,
having OrderLineNumber before OrderNumber sounds like someone had
a brainmelt. There is all reason to change that key.

(Unless queries like "Show me all line 5 on all orders in one ledger"
are common.)




--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp


Report this thread to moderator Post Follow-up to this message
Old Post
Erland Sommarskog
10-27-05 02:24 PM


Re: Sequence of columns in primary key
Our review of the primary keys was triggered by one of our customers
complaining about the performance of the database. They claimed to have
"re-sequenced the primary key which reduced Index reads by a factor of
5". Do you have any idea as to how they arrived at this conclusion?


Report this thread to moderator Post Follow-up to this message
Old Post
gilles27@talk21.com
10-27-05 02:25 PM


Re: Sequence of columns in primary key
I created a large table (1306677 rows, 151888 KB) with the
characteristics as originally described for Ledger, OrderNumber and
OrderLineNumber. I then tried different primary key column orders (all 6
combinations). After reindexing, each combination resulted in the same
number of pages and same index depth. In other words: there will be no
performance difference for exact primary key matches.

Maybe the table was very fragmented and had much unused space. Changing
the primary key column order would effectively reindex the table, and
thus remove fragmentation and restore the original fill factor. This
could easily decrease the index depth and result in the performance
gain.

Gert-Jan


gilles27@talk21.com  wrote:
>
> Our review of the primary keys was triggered by one of our customers
> complaining about the performance of the database. They claimed to have
> "re-sequenced the primary key which reduced Index reads by a factor of
> 5". Do you have any idea as to how they arrived at this conclusion?

Report this thread to moderator Post Follow-up to this message
Old Post
Gert-Jan Strik
10-27-05 02:25 PM


Re: Sequence of columns in primary key
(gilles27@talk21.com)  writes:
> Our review of the primary keys was triggered by one of our customers
> complaining about the performance of the database. They claimed to have
> "re-sequenced the primary key which reduced Index reads by a factor of
> 5". Do you have any idea as to how they arrived at this conclusion?

Sounds like hogwash to me. But if you are an evil man, please ask them
for a repro that demonstrates this. (For the sake of customer
relations, you might prefer to avoid it though. :-)

Gert-Jan's theory about fragmentation being fixed by reindexing sounds
plausible to me.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp


Report this thread to moderator Post Follow-up to this message
Old Post
Erland Sommarskog
10-27-05 02:25 PM


Re: Sequence of columns in primary key
Erland Sommarskog  wrote:
> (gilles27@talk21.com) writes: 
>
> Sounds like hogwash to me.

I don't know the table structure and queries but there actually are
scenarios with different queries against the same table where the overall
benefit of reordering columns is significant.

> But if you are an evil man, please ask them
> for a repro that demonstrates this. (For the sake of customer
> relations, you might prefer to avoid it though. :-)

That depends.  From my experience customers actually honor showing
interest in their findings.  Remember that they experienced a success
because they were able to implement such a dramatic improvement.  In the
end both sides benefit which in turn can improve customer satisfaction.

> Gert-Jan's theory about fragmentation being fixed by reindexing sounds
> plausible to me.

Yes, that's another option.

Kind regards

robert


Report this thread to moderator Post Follow-up to this message
Old Post
Robert Klemme
10-27-05 02:25 PM


Sponsored Links





Last Thread Next Thread
Pages (2): [1] 2 »
Post New Thread

Microsoft SQL Server forum archive

Show a Printable Version Email This Page to Someone! Receive updates to this thread
Microsoft SQL Server
Access database support
PostgreSQL Replication
SQL Server ODBC
FoxPro Support
PostgreSQL pgAdmin
SQL Server Clustering
MySQL ODBC
Web Applications with dBASE
SQL Server CE
MySQL++
Sybase Database Support
MS SQL Full Text Search
PostgreSQL Administration
SQL Anywhere support
DB2 UDB Database
Paradox Database Support
Filemaker Database
Berkley DB
SQL 2000/2000i database
ASE Database
Forum Jump:
All times are GMT. The time now is 03:18 PM.

 
Mobile devices forum | Database support forum archive




Copyrights DropTable.com Database Support Forum 2004 - 2006