Drop Table
Support Forum for database administrators and web based access to important newsgroups related to databasesI am currently undertaking a review of the primary keys in a SQL Server 2000 database with a view to improving performance of queries. I have heard that, in the case of compound primary keys, it is important to select the correct order for the columns within the key. For instance, imagine a table called OrderLine which has primary key columns as follows- Ledger OrderNumber OrderLineNumber The theory I have heard is that columns with the most distinct values should come first. In this case, Ledger is likely to have a maximum of 6 distinct values, OrderNumber a maximum of 10 million and OrderLine up to 99. Based on this supposition I believe the best order would be- OrderNumber OrderLineNumber Ledger I have performed a few rudimentary tests that appear to bear this out. I was wondering if anyone else has tried something similar and if so what was the result? Thanks, Ross
Post Follow-up to this message>> columns with the most distinct values should come first. << This is true. But you also have to remember that an index (a,b,c) in effect gives you indexng for (a,b) and (a). You might also consider a covering index to speed up queries. A covering index has all the columns needed to answer a query so that you do not have to read the base table at all. Also be sure that the FK and PK lists are in the same order.
Post Follow-up to this message>The theory I have heard is that columns with the most distinct values >should come first There are quite a few exceptions. For instanse, if you have a table ORDERS with PK on (CUSTOMERS_PK, some_other_column) and a FK constraint referencing table CUSTOMERS, you might want to have CUSTOMERS_PK column first even if it is less selective than the other column. There aren't that many hard and fast rules in our trade
Post Follow-up to this message(gilles27@talk21.com) writes: > I have heard that, in the case of compound primary keys, it is > important to select the correct order for the columns within the key. > For instance, imagine a table called OrderLine which has primary key > columns as follows- > > Ledger > OrderNumber > OrderLineNumber > > The theory I have heard is that columns with the most distinct values > should come first. In this case, Ledger is likely to have a maximum of > 6 distinct values, OrderNumber a maximum of 10 million and OrderLine up > to 99. Based on this supposition I believe the best order would be- > > OrderNumber > OrderLineNumber > Ledger > > I have performed a few rudimentary tests that appear to bear this out. > I was wondering if anyone else has tried something similar and if so > what was the result? Nah, the order in a primary key should rather reflect the logical hierarchy. If an OrderLine can be broken into pieces called ledgers the above would be the right model. I would guess, though, that different ledgers have different series for order numbers, and thus the ledger is the main key here. I don't know the business rules here, but it seems to be that SELECT * FROM OrderLine WHERE OrderNumber = 1234 is not even meaninful, because you may get to see data from six different and unrelated orders. Now, not all tables with a compound key has a clear hiearchy. Consider this table: CREATE TABLE bankholidays( coucode aba_coucode NOT NULL, bhodate aba_date NOT NULL, reguser aba_upduser NOT NULL, moddate aba_updtime NOT NULL, CONSTRAINT pk_bho PRIMARY KEY (coucode, bhodate) ) This table lists days Mon-Fri that are not business days in various countries. If you want to know in which countries Christmas Eve is not a business day, you would say SELECT coucode FROM bankholidays WHERE bhodate = '20041224' But if you want to compute the settledate for a given trade date and instrument, the country code is the top-level key. (And since we mainly use the table for the latter purpose, this is why coucode comes first.) Thus, for such a table, it is very common to have a index on the reverse order of the key, or some reverse order for a key with three or more columns. What you should make special consideration, is which index should be the clustered index. This is necessarily not the primary key. It could be a different combination of the PK - or on a completely different column. Finally, one thing also to keep in mind is how you use the columns. If you say things like: SELECT * FROM tbl WHERE a = 1 AND b BETWEEN 12 AND 19 An index on (a,b) is better, even if a only has 10 possible values and b has 100. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techin.../2000/books.asp
Post Follow-up to this messageErland, You're right to say that Ledger is the "main" key. I should have explained but it looks like you figured it out yourself. I also incorrectly stated the current column order in the PK. It is Ledger, OrderLineNumber, OrderNumber and NOT Ledger, OrderNumber, OrderLineNumber. If I use your hierarchy theory the best order is Ledger, OrderNumber, OrderLineNumber. This also fits in nicely with Joe's statement that "an index (a, b, c) in effect gives you indexing for (a, b) and (a)", because a common SELECT or JOIN on this table would involve Ledger and OrderNumber as this points to the PK in the Orders table. I am running SQL Profiler against a customer's database today so I hope to have more information about the queries that our users run. This will enable me to make a more informed decision about the sequence of the columns in each PK in the database. It seems to me that it is more important to understand usage of the database rather than ordering them by number of distinct values. Ross
Post Follow-up to this message(gilles27@talk21.com) writes: > You're right to say that Ledger is the "main" key. I should have > explained but it looks like you figured it out yourself. I also > incorrectly stated the current column order in the PK. It is Ledger, > OrderLineNumber, OrderNumber and NOT Ledger, OrderNumber, > OrderLineNumber. Assuming that OrderLineNumber refers to a detail line in an order, having OrderLineNumber before OrderNumber sounds like someone had a brainmelt. There is all reason to change that key. (Unless queries like "Show me all line 5 on all orders in one ledger" are common.) -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techin.../2000/books.asp
Post Follow-up to this messageOur review of the primary keys was triggered by one of our customers complaining about the performance of the database. They claimed to have "re-sequenced the primary key which reduced Index reads by a factor of 5". Do you have any idea as to how they arrived at this conclusion?
Post Follow-up to this messageI created a large table (1306677 rows, 151888 KB) with the characteristics as originally described for Ledger, OrderNumber and OrderLineNumber. I then tried different primary key column orders (all 6 combinations). After reindexing, each combination resulted in the same number of pages and same index depth. In other words: there will be no performance difference for exact primary key matches. Maybe the table was very fragmented and had much unused space. Changing the primary key column order would effectively reindex the table, and thus remove fragmentation and restore the original fill factor. This could easily decrease the index depth and result in the performance gain. Gert-Jan gilles27@talk21.com wrote: > > Our review of the primary keys was triggered by one of our customers > complaining about the performance of the database. They claimed to have > "re-sequenced the primary key which reduced Index reads by a factor of > 5". Do you have any idea as to how they arrived at this conclusion?
Post Follow-up to this message(gilles27@talk21.com) writes: > Our review of the primary keys was triggered by one of our customers > complaining about the performance of the database. They claimed to have > "re-sequenced the primary key which reduced Index reads by a factor of > 5". Do you have any idea as to how they arrived at this conclusion? Sounds like hogwash to me. But if you are an evil man, please ask them for a repro that demonstrates this. (For the sake of customer relations, you might prefer to avoid it though. :-) Gert-Jan's theory about fragmentation being fixed by reindexing sounds plausible to me. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techin.../2000/books.asp
Post Follow-up to this messageErland Sommarskog wrote: > (gilles27@talk21.com) writes: > > Sounds like hogwash to me. I don't know the table structure and queries but there actually are scenarios with different queries against the same table where the overall benefit of reordering columns is significant. > But if you are an evil man, please ask them > for a repro that demonstrates this. (For the sake of customer > relations, you might prefer to avoid it though. :-) That depends. From my experience customers actually honor showing interest in their findings. Remember that they experienced a success because they were able to implement such a dramatic improvement. In the end both sides benefit which in turn can improve customer satisfaction. > Gert-Jan's theory about fragmentation being fixed by reindexing sounds > plausible to me. Yes, that's another option. Kind regards robert
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread