Home > Archive > Microsoft SQL Server forum > October 2005 > Sequence of columns in primary key









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Sequence of columns in primary key
gilles27@talk21.com

2005-10-27, 9:24 am

I am currently undertaking a review of the primary keys in a SQL Server
2000 database with a view to improving performance of queries.

I have heard that, in the case of compound primary keys, it is
important to select the correct order for the columns within the key.
For instance, imagine a table called OrderLine which has primary key
columns as follows-

Ledger
OrderNumber
OrderLineNumber

The theory I have heard is that columns with the most distinct values
should come first. In this case, Ledger is likely to have a maximum of
6 distinct values, OrderNumber a maximum of 10 million and OrderLine up
to 99. Based on this supposition I believe the best order would be-

OrderNumber
OrderLineNumber
Ledger

I have performed a few rudimentary tests that appear to bear this out.
I was wondering if anyone else has tried something similar and if so
what was the result?


Thanks,

Ross

--CELKO--

2005-10-27, 9:24 am

>> columns with the most distinct values should come first. <<

This is true.

But you also have to remember that an index (a,b,c) in effect gives
you indexng for (a,b) and (a). You might also consider a covering
index to speed up queries. A covering index has all the columns needed
to answer a query so that you do not have to read the base table at
all. Also be sure that the FK and PK lists are in the same order.

Alexander Kuznetsov

2005-10-27, 9:24 am

>The theory I have heard is that columns with the most distinct values
>should come first


There are quite a few exceptions. For instanse, if you have a table
ORDERS with PK on (CUSTOMERS_PK, some_other_column) and a FK constraint
referencing table CUSTOMERS, you might want to have CUSTOMERS_PK column
first even if it is less selective than the other column.

There aren't that many hard and fast rules in our trade

Erland Sommarskog

2005-10-27, 9:24 am

(gilles27@talk21.com) writes:
> I have heard that, in the case of compound primary keys, it is
> important to select the correct order for the columns within the key.
> For instance, imagine a table called OrderLine which has primary key
> columns as follows-
>
> Ledger
> OrderNumber
> OrderLineNumber
>
> The theory I have heard is that columns with the most distinct values
> should come first. In this case, Ledger is likely to have a maximum of
> 6 distinct values, OrderNumber a maximum of 10 million and OrderLine up
> to 99. Based on this supposition I believe the best order would be-
>
> OrderNumber
> OrderLineNumber
> Ledger
>
> I have performed a few rudimentary tests that appear to bear this out.
> I was wondering if anyone else has tried something similar and if so
> what was the result?


Nah, the order in a primary key should rather reflect the logical
hierarchy. If an OrderLine can be broken into pieces called ledgers
the above would be the right model. I would guess, though, that different
ledgers have different series for order numbers, and thus the ledger is
the main key here.

I don't know the business rules here, but it seems to be that
SELECT * FROM OrderLine WHERE OrderNumber = 1234 is not even
meaninful, because you may get to see data from six different and
unrelated orders.

Now, not all tables with a compound key has a clear hiearchy. Consider
this table:

CREATE TABLE bankholidays(
coucode aba_coucode NOT NULL,
bhodate aba_date NOT NULL,
reguser aba_upduser NOT NULL,
moddate aba_updtime NOT NULL,
CONSTRAINT pk_bho PRIMARY KEY (coucode, bhodate)
)

This table lists days Mon-Fri that are not business days in various
countries. If you want to know in which countries Christmas Eve is
not a business day, you would say

SELECT coucode FROM bankholidays WHERE bhodate = '20041224'

But if you want to compute the settledate for a given trade date
and instrument, the country code is the top-level key. (And since
we mainly use the table for the latter purpose, this is why coucode
comes first.)

Thus, for such a table, it is very common to have a index on the
reverse order of the key, or some reverse order for a key with three
or more columns.

What you should make special consideration, is which index should be
the clustered index. This is necessarily not the primary key. It
could be a different combination of the PK - or on a completely
different column.

Finally, one thing also to keep in mind is how you use the columns.
If you say things like:

SELECT * FROM tbl WHERE a = 1 AND b BETWEEN 12 AND 19

An index on (a,b) is better, even if a only has 10 possible values and
b has 100.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

gilles27@talk21.com

2005-10-27, 9:24 am

Erland,

You're right to say that Ledger is the "main" key. I should have
explained but it looks like you figured it out yourself. I also
incorrectly stated the current column order in the PK. It is Ledger,
OrderLineNumber, OrderNumber and NOT Ledger, OrderNumber,
OrderLineNumber.

If I use your hierarchy theory the best order is Ledger, OrderNumber,
OrderLineNumber. This also fits in nicely with Joe's statement that "an
index (a, b, c) in effect gives you indexing for (a, b) and (a)",
because a common SELECT or JOIN on this table would involve Ledger and
OrderNumber as this points to the PK in the Orders table.

I am running SQL Profiler against a customer's database today so I hope
to have more information about the queries that our users run. This
will enable me to make a more informed decision about the sequence of
the columns in each PK in the database. It seems to me that it is more
important to understand usage of the database rather than ordering them
by number of distinct values.

Ross

Erland Sommarskog

2005-10-27, 9:24 am

(gilles27@talk21.com) writes:
> You're right to say that Ledger is the "main" key. I should have
> explained but it looks like you figured it out yourself. I also
> incorrectly stated the current column order in the PK. It is Ledger,
> OrderLineNumber, OrderNumber and NOT Ledger, OrderNumber,
> OrderLineNumber.


Assuming that OrderLineNumber refers to a detail line in an order,
having OrderLineNumber before OrderNumber sounds like someone had
a brainmelt. There is all reason to change that key.

(Unless queries like "Show me all line 5 on all orders in one ledger"
are common.)




--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

gilles27@talk21.com

2005-10-27, 9:25 am

Our review of the primary keys was triggered by one of our customers
complaining about the performance of the database. They claimed to have
"re-sequenced the primary key which reduced Index reads by a factor of
5". Do you have any idea as to how they arrived at this conclusion?

Gert-Jan Strik

2005-10-27, 9:25 am

I created a large table (1306677 rows, 151888 KB) with the
characteristics as originally described for Ledger, OrderNumber and
OrderLineNumber. I then tried different primary key column orders (all 6
combinations). After reindexing, each combination resulted in the same
number of pages and same index depth. In other words: there will be no
performance difference for exact primary key matches.

Maybe the table was very fragmented and had much unused space. Changing
the primary key column order would effectively reindex the table, and
thus remove fragmentation and restore the original fill factor. This
could easily decrease the index depth and result in the performance
gain.

Gert-Jan


gilles27@talk21.com wrote:
>
> Our review of the primary keys was triggered by one of our customers
> complaining about the performance of the database. They claimed to have
> "re-sequenced the primary key which reduced Index reads by a factor of
> 5". Do you have any idea as to how they arrived at this conclusion?

Erland Sommarskog

2005-10-27, 9:25 am

(gilles27@talk21.com) writes:
> Our review of the primary keys was triggered by one of our customers
> complaining about the performance of the database. They claimed to have
> "re-sequenced the primary key which reduced Index reads by a factor of
> 5". Do you have any idea as to how they arrived at this conclusion?


Sounds like hogwash to me. But if you are an evil man, please ask them
for a repro that demonstrates this. (For the sake of customer
relations, you might prefer to avoid it though. :-)

Gert-Jan's theory about fragmentation being fixed by reindexing sounds
plausible to me.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Robert Klemme

2005-10-27, 9:25 am

Erland Sommarskog wrote:
> (gilles27@talk21.com) writes:
>
> Sounds like hogwash to me.


I don't know the table structure and queries but there actually are
scenarios with different queries against the same table where the overall
benefit of reordering columns is significant.

> But if you are an evil man, please ask them
> for a repro that demonstrates this. (For the sake of customer
> relations, you might prefer to avoid it though. :-)


That depends. From my experience customers actually honor showing
interest in their findings. Remember that they experienced a success
because they were able to implement such a dramatic improvement. In the
end both sides benefit which in turn can improve customer satisfaction.

> Gert-Jan's theory about fragmentation being fixed by reindexing sounds
> plausible to me.


Yes, that's another option.

Kind regards

robert

gilles27@talk21.com

2005-10-27, 9:25 am

I'm sure Gert-Jan is correct in what he states, however because the
most likely filtering on OrderLines is by Ledger and OrderNumber, those
two should come before OrderLineNumber in the primary key.

I could ask the customer to demonstrate their findings but my pride
prevents me from admitting to them that I don't know what they mean by
"reduced index reads by a factor of 5". :-)

I need some of way checking that my changes to primary keys are
actually having a positive effect, if I could compare the number of
index reads before and after modifying the table, that would help.


Ross

Robert Klemme

2005-10-27, 9:25 am

gilles27@talk21.com wrote:
> I'm sure Gert-Jan is correct in what he states, however because the
> most likely filtering on OrderLines is by Ledger and OrderNumber,
> those two should come before OrderLineNumber in the primary key.
>
> I could ask the customer to demonstrate their findings but my pride
> prevents me from admitting to them that I don't know what they mean by
> "reduced index reads by a factor of 5". :-)
>
> I need some of way checking that my changes to primary keys are
> actually having a positive effect, if I could compare the number of
> index reads before and after modifying the table, that would help.


I'm afraid, that is not so easy. It starts with the objectives: what do
you (did they) want to optimize? Is it execution time of a single query,
is it execution time of a set of queries? You need to know the usage
patterns. Then there's data: does it change often / seldome? How is it
distributed? Measures: you can not only change order of columns in this
index but you can also add another index etc...

When I try to optimize slow queries I usually start with a look at the
execution plan because that is easily obtained. Often you can see that an
index is missing / not used etc. immediately.

Kind regards

robert


Erland Sommarskog

2005-10-27, 9:25 am

Robert Klemme (bob.news@gmx.net) writes:
> Erland Sommarskog wrote:
>
> I don't know the table structure and queries but there actually are
> scenarios with different queries against the same table where the overall
> benefit of reordering columns is significant.


Of course there is. The main recommendation of this thread is that
the PK of the table in question should be reordered. The "normal"
order is (Ledger, OrderNumber, OrderLineNumber), and this was, I assume,
the original order. What I referred to as hogwash was the claim that
reordering the index so that OrderLineNumber and OrderNumber changed
places reduced index reads by a factor 5.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Erland Sommarskog

2005-10-27, 9:25 am

(gilles27@talk21.com) writes:
> I'm sure Gert-Jan is correct in what he states, however because the
> most likely filtering on OrderLines is by Ledger and OrderNumber, those
> two should come before OrderLineNumber in the primary key.
>
> I could ask the customer to demonstrate their findings but my pride
> prevents me from admitting to them that I don't know what they mean by
> "reduced index reads by a factor of 5". :-)


You don't? Good, then we are two!

But OK, let's say you have this query:

SELECT * FROM tbl WHERE Ledger = @ledger AND OrderNumber = @nbr

The key reorder certainly gave a drastic relief on the poor index. Rather
than wearing thing on the poor index pages, you get a table scan instead.
Or at least scan for a Ledger.

Anyway, what it's important in the end is wallclock time. The total number
of reads is a good indication, when wallclock time varies due to other
queries, blocking etc. But index reads alone is not of interest.

Ah, there is of course the interesting issue of what happens at INSERT.
Now, with the proper PK, and if this key is clustered, thanks to the Ledger
column, there will be six hot spots in this table, and there will be
plenty of page splits. If you change the order, the picture will be
different, and INSERTs will spread all over. Both will lead to
fragmentation. If you want to avoid this, best is probably to keep the
PK non-clustered and cluster on OrderDate, or even on an IDENTITY column.
Then again, if there are proper maintenance windows for frequent reindexing,
this is not an issue.

> I need some of way checking that my changes to primary keys are
> actually having a positive effect, if I could compare the number of
> index reads before and after modifying the table, that would help.


Rather take typical queries like

SELECT * FROM tbl WHERE Ledger = @ledger AND OrderNumber = @nbr


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Robert Klemme

2005-10-27, 9:25 am

Erland Sommarskog wrote:
> Robert Klemme (bob.news@gmx.net) writes:
>
> Of course there is. The main recommendation of this thread is that
> the PK of the table in question should be reordered. The "normal"
> order is (Ledger, OrderNumber, OrderLineNumber), and this was, I
> assume, the original order. What I referred to as hogwash was the
> claim that reordering the index so that OrderLineNumber and
> OrderNumber changed places reduced index reads by a factor 5.


Ah, ok. Thanks for clarification!

robert

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com