|
Home > Archive > MS SQL Server > April 2005 > Please help settle this argument: Clustered index
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Please help settle this argument: Clustered index
|
|
|
| I am currently having a debate with a developer. Initially all the tables in
his database had only a primary key (PK), none of which were clustered, so
all the tables were heaps.
We have already seen some great improvements after I recommended additional
indexes, however I am having a hard time convincing the developer that it is
best for most tables in the database to have a clustered index - especially
the tables with large numbers of records.
His viewpoint is that this is not necesary because Clustered index are used
when queries return a range of values ( or sorted, grouped, ... ), which he
says is not the case for these tables.
Who is right?
| |
| Michael C# 2005-04-21, 1:23 pm |
| I like to almost always have a clustered index on a table. At the very
least, proper clustered indexes can help eliminate Sort Operations and
Bookmark Lookups you'll often see in Query Execution plans when using
non-clustered indexes.
"DBA72" <DBA72@discussions.microsoft.com> wrote in message
news:71BEC856-DEB6-46A0-B107- C57833D6101A@microso
ft.com...
>I am currently having a debate with a developer. Initially all the tables
>in
> his database had only a primary key (PK), none of which were clustered, so
> all the tables were heaps.
>
> We have already seen some great improvements after I recommended
> additional
> indexes, however I am having a hard time convincing the developer that it
> is
> best for most tables in the database to have a clustered index -
> especially
> the tables with large numbers of records.
>
> His viewpoint is that this is not necesary because Clustered index are
> used
> when queries return a range of values ( or sorted, grouped, ... ), which
> he
> says is not the case for these tables.
>
> Who is right?
| |
| Alejandro Mesa 2005-04-21, 1:23 pm |
| See if this helps:
Cluster That Index!
http://www.quest-pipelines.com/newsletter-v4/1103_B.htm
AMB
"DBA72" wrote:
> I am currently having a debate with a developer. Initially all the tables in
> his database had only a primary key (PK), none of which were clustered, so
> all the tables were heaps.
>
> We have already seen some great improvements after I recommended additional
> indexes, however I am having a hard time convincing the developer that it is
> best for most tables in the database to have a clustered index - especially
> the tables with large numbers of records.
>
> His viewpoint is that this is not necesary because Clustered index are used
> when queries return a range of values ( or sorted, grouped, ... ), which he
> says is not the case for these tables.
>
> Who is right?
| |
| sql question 2005-04-21, 1:23 pm |
| Clustered indrex is good in the situation, as you stated, where the query
runs agains a range of values of the clustered-index component columns.
Also when you sort by the cluster key, it speeds up the process.
Where it is not good is that you just put on a cluster on the key, for no
particular reason. This way you waste the capability of clustering with a
purpose. When you later on need a cluster, you will have to rebuild your
index.
Where it is bad is that you do not have any sequential query and you have a
lot of non-sequential inserts. It will generate a lot of page splits (or a
lot wasted space + unnecessary disk I/Os), whereas a heap will just append
the new inserts at the bottom of the page/extent.
hth
"DBA72" <DBA72@discussions.microsoft.com> wrote in message
news:71BEC856-DEB6-46A0-B107- C57833D6101A@microso
ft.com...
>I am currently having a debate with a developer. Initially all the tables
>in
> his database had only a primary key (PK), none of which were clustered, so
> all the tables were heaps.
>
> We have already seen some great improvements after I recommended
> additional
> indexes, however I am having a hard time convincing the developer that it
> is
> best for most tables in the database to have a clustered index -
> especially
> the tables with large numbers of records.
>
> His viewpoint is that this is not necesary because Clustered index are
> used
> when queries return a range of values ( or sorted, grouped, ... ), which
> he
> says is not the case for these tables.
>
> Who is right?
| |
| Jacco Schalkwijk 2005-04-21, 1:23 pm |
| Ask him what the _dis_advantage is of having a clustered index, since he
seems to be against them (I bet he can't come up with anything meaningful).
Anything a non-clustered index can do, a clustered index can do at least as
well, so if you only have one index on a table, make it clustered. On top of
that, having a clustered index on a table will actually make your table
perform better for updates, inserts and deletes. This is because the
non-clustered index uses the clustered index to find the row, where when
there is no clustered index the non-clustered indexes will use the actual
physical row location. That means that if a row moves, for example because
the page it is on gets split because it is too full, the non-clustered
indexes that point to that row have to be updated when the table is a heap.
But if the table has a clustered index, the key for the clustered index
stays the same, so the non-clustered indexes don't have to be updated.
Also, a row pointer in a non-clustered index on a heap takes up 10 bytes, a
key for a clustered index can take up as little as 5 bytes.
You can of course have only one clustered index on a table, that why you
have to think about which index to make clustered.
--
Jacco Schalkwijk
SQL Server MVP
"DBA72" <DBA72@discussions.microsoft.com> wrote in message
news:71BEC856-DEB6-46A0-B107- C57833D6101A@microso
ft.com...
>I am currently having a debate with a developer. Initially all the tables
>in
> his database had only a primary key (PK), none of which were clustered, so
> all the tables were heaps.
>
> We have already seen some great improvements after I recommended
> additional
> indexes, however I am having a hard time convincing the developer that it
> is
> best for most tables in the database to have a clustered index -
> especially
> the tables with large numbers of records.
>
> His viewpoint is that this is not necesary because Clustered index are
> used
> when queries return a range of values ( or sorted, grouped, ... ), which
> he
> says is not the case for these tables.
>
> Who is right?
| |
| Michael C# 2005-04-21, 1:23 pm |
|
"sql question" <abc@xyz.com> wrote in message
news:O8s%23UYpRFHA.3628@TK2MSFTNGP12.phx.gbl...
> Clustered indrex is good in the situation, as you stated, where the query
> runs agains a range of values of the clustered-index component columns.
> Also when you sort by the cluster key, it speeds up the process.
It can also eliminate sort operations internal to the query execution plan
(these don't necessarily guarantee the sorted order of returned results
though).
> Where it is not good is that you just put on a cluster on the key, for no
> particular reason. This way you waste the capability of clustering with a
> purpose. When you later on need a cluster, you will have to rebuild your
> index.
There's rarely a reason to not use a clustered index. You are right though,
careful consideration should be made to ensure that you include appropriate
columns in your clustered index so you won't need a "do-over" later.
> Where it is bad is that you do not have any sequential query and you have
> a lot of non-sequential inserts. It will generate a lot of page splits
> (or a lot wasted space + unnecessary disk I/Os), whereas a heap will just
> append the new inserts at the bottom of the page/extent.
This is, of course, versus the additional cost of all those extra bookmark
lookups (you thought page splits generated a lot of unecessary disk I/Os!)
and internal sorts (lots of extra processing) due to not using a clustered
index. You can limit the impact of page splitting by modifying fill factor
and padding when you create the clustered index and by running DBCC
REINDEX - especially after adding lots of rows to your table.
| |
| sql question 2005-04-21, 8:23 pm |
| >> Where it is bad is that you do not have any sequential query and you have
>
> This is, of course, versus the additional cost of all those extra bookmark
> lookups (you thought page splits generated a lot of unecessary disk I/Os!)
> and internal sorts (lots of extra processing) due to not using a clustered
> index. You can limit the impact of page splitting by modifying fill
> factor and padding when you create the clustered index and by running DBCC
> REINDEX - especially after adding lots of rows to your table.
"you thought page splits generated a lot of unecessary disk I/Os!"
No it's not that. Page split is one thing and (wasted space + unnecessary
disk I/Os) is the other. Either you generate page split, or, if you have
had the page splits, then you are looking at a low density population of the
pages, which results in wasted space and unnecessary disk I/Os.
| |
| sql question 2005-04-21, 8:23 pm |
| > index. You can limit the impact of page splitting by modifying fill
> factor and padding when you create the clustered index and by running DBCC
> REINDEX - especially after adding lots of rows to your table.
And keep in mind, that the larger the fill factor is, the more likely you
are going to have page splits and the smaller the fill factor is, the more
empty space you are going to occupy. Either way you are not immune to the
split or wasted space + added disk I/O problem.
Also remember, the DBCC REINDEX is an offline process. You do not have your
table available while running this process. Between your running this
process, you remain subjectd to the wasted space + added disk I/O problem.
| |
| Hugo Kornelis 2005-04-21, 8:23 pm |
| On Thu, 21 Apr 2005 18:20:17 +0100, Jacco Schalkwijk wrote:
(snip)
>That means that if a row moves, for example because
>the page it is on gets split because it is too full, the non-clustered
>indexes that point to that row have to be updated when the table is a heap.
Hi Jacco,
I don't think the above is correct. First, there are no page splits in a
heap. Individual rows can move (when varying length columns are updated
with longer values), but in that case, a forwarding pointer is inserted
at the original location, so that nonclustered indexes don't need to be
changed. Of course, the downside is an extra page read: after finding
the page, a forwarding pointer instead of a row is encountered and the
page for the new location needs to be read.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)
| |
| Hugo Kornelis 2005-04-21, 8:23 pm |
| On Thu, 21 Apr 2005 09:47:12 -0700, DBA72 wrote:
>I am currently having a debate with a developer. Initially all the tables in
>his database had only a primary key (PK), none of which were clustered, so
>all the tables were heaps.
>
>We have already seen some great improvements after I recommended additional
>indexes, however I am having a hard time convincing the developer that it is
>best for most tables in the database to have a clustered index - especially
>the tables with large numbers of records.
>
>His viewpoint is that this is not necesary because Clustered index are used
>when queries return a range of values ( or sorted, grouped, ... ), which he
>says is not the case for these tables.
>
>Who is right?
Hi DBA72,
Best way to find out is to test it. Load the current tables on a test
server, fill them with enough data for meaningful speed comparisons and
check how long some typical actions take. Change one of the indexes for
each table to clustered, then repeat the same tests. Don't forget to
clear out the data cache before each test. Clear out the procedure cache
as well, to force recompiles that will take the new indexes as well
(maybe not necessary, but it won't hurt).
CHECKPOINT
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)
| |
| Jacco Schalkwijk 2005-04-22, 7:23 am |
| You're right. What I was saying is why a non-clustered index on a clustered
index uses the clustered index key instead of the rowpointer. That's a
related subject, but not exactly what we were talking about here.
There are more downsides to forwarding pointers btw:
- they take up space, and you can't use DBCC DBREINDEX or so to
"consolidate" the row pointers.
- forwarding pointers can point to forwarding pointers can point to
forwarding pointers.....
- in large tables there is a large chance that the row to which the
forwarding pointer points is not in the same extent, and so may not already
be in memory, meaning more physical I/O
--
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message
news:i57g61p86sa5pj4
41enk97jd6injfpmc84@
4ax.com...
> On Thu, 21 Apr 2005 18:20:17 +0100, Jacco Schalkwijk wrote:
>
> (snip)
>
> Hi Jacco,
>
> I don't think the above is correct. First, there are no page splits in a
> heap. Individual rows can move (when varying length columns are updated
> with longer values), but in that case, a forwarding pointer is inserted
> at the original location, so that nonclustered indexes don't need to be
> changed. Of course, the downside is an extra page read: after finding
> the page, a forwarding pointer instead of a row is encountered and the
> page for the new location needs to be read.
>
> Best, Hugo
> --
>
> (Remove _NO_ and _SPAM_ to get my e-mail address)
| |
| Michael C# 2005-04-22, 11:23 am |
| "sql question" <abc@xyz.com> wrote in message
news:esnRhyqRFHA.356@TK2MSFTNGP14.phx.gbl...
> "you thought page splits generated a lot of unecessary disk I/Os!"
>
> No it's not that. Page split is one thing and (wasted space + unnecessary
> disk I/Os) is the other. Either you generate page split, or, if you have
> had the page splits, then you are looking at a low density population of
> the pages, which results in wasted space and unnecessary disk I/Os.
>
Let's not forget the pointers to data and the pointers to pointers to data
which are non-clustered indexes. How does reading all those extra pages
affect your I/O performance? Ignore clustered indexes, and blame it on page
splits all you want, if that's what you feel...
| |
| Hugo Kornelis 2005-04-22, 8:23 pm |
| On Fri, 22 Apr 2005 11:39:22 +0100, Jacco Schalkwijk wrote:
(snip)
>- forwarding pointers can point to forwarding pointers can point to
>forwarding pointers.....
Hi Jacco,
I agree with most of your post, but not with this statement. A
forwarding pointer can only point to the row, not to a next forwarding
pointer. If the data has to grow a second time, it is moved to a
different location (again), but no new forwarding pointer is used;
instead, the first forwarding pointer is changed to point to the new
location.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)
| |
| Jacco Schalkwijk 2005-04-23, 7:23 am |
| Seems like I really should read up on this subject, but then again, all
tables I design do have clustered indexes :)
--
Jacco Schalkwijk
SQL Server MVP
"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message
news:3r0j619tcnrc9hc
11ve89459iae7ttnd05@
4ax.com...
> On Fri, 22 Apr 2005 11:39:22 +0100, Jacco Schalkwijk wrote:
>
> (snip)
>
> Hi Jacco,
>
> I agree with most of your post, but not with this statement. A
> forwarding pointer can only point to the row, not to a next forwarding
> pointer. If the data has to grow a second time, it is moved to a
> different location (again), but no new forwarding pointer is used;
> instead, the first forwarding pointer is changed to point to the new
> location.
>
> Best, Hugo
> --
>
> (Remove _NO_ and _SPAM_ to get my e-mail address)
| |
| Michael C# 2005-04-28, 8:23 pm |
| When you perform Hugo's tests, keep in mind that the amount of data you
store can affect the results quite a bit. Bookmark lookups and sorts for a
table of 10 rows is going to be pretty insignificant; so is page splitting.
When you have multiple tables with 1,000,000 rows they become much more
significant factors.
"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message
news:gc7g615a3e8d9j5
gp4hch98kgvnhk976p8@
4ax.com...
> On Thu, 21 Apr 2005 09:47:12 -0700, DBA72 wrote:
>
>
> Hi DBA72,
>
> Best way to find out is to test it. Load the current tables on a test
> server, fill them with enough data for meaningful speed comparisons and
> check how long some typical actions take. Change one of the indexes for
> each table to clustered, then repeat the same tests. Don't forget to
> clear out the data cache before each test. Clear out the procedure cache
> as well, to force recompiles that will take the new indexes as well
> (maybe not necessary, but it won't hurt).
>
> CHECKPOINT
> DBCC DROPCLEANBUFFERS
> DBCC FREEPROCCACHE
>
> Best, Hugo
> --
>
> (Remove _NO_ and _SPAM_ to get my e-mail address)
|
|
|
|
|