Home > Archive > MS SQL Server > December 2006 > who understands SQL Server? --- agghhhhhhh......









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author who understands SQL Server? --- agghhhhhhh......
deja@2bytes.co.uk

2006-12-14, 5:25 am

I've got a table which is in 3 separate databases on the same server,
each with different indexes. The rest of the database is the same (2
were restored from a backup of the first - i could have done 3 tables
in the same database but it just seemed easier at the time). All
indexes on these 3 tables have been dropped and recreated.

The tables are like this
fromInt int not null,
lastInt int not null,
id char(32) not null,
name varchar(255) null,
keyname varchar(255) null,
description varchar(255) null,
groupid char(32)

NO PRIMARY KEY defined.

In db1 I have a non-clustered unique index on lastInt, id (lastInt is
pretty specific, the most for a specific lastchanged is 147 - out of a
total of 160,000 records - but over 120,000 have less than 30).

In db2 I have a CLUSTERED unique index on lastInt,Id AND a
non-clustered index on Id,lastInt

In db3 I have a non-clustered unique index on lastInt,Id AND a
non-clustered index on Id,lastInt

Common sense would tell me that db2, using a clustered index would
generally perform better for reads. db3 should perform better than db1
for reads because the optimiser has 2 choices of index?

I am using maxdop(1), DBCC DROPCLEANBUFFERS, DBCC FREEPROCCACHE and
CHECKPOINT before any query to try to ensure a clean query.

1)
SELECT * FROM mytable t1 with (nolock)
WHERE lastInt = ( select max(lastInt) from myTable t2 with (nolock)
where t2.id = t1.id AND lastInt > 22881 AND lastInt <= 22900)
option (maxdop 1)

returns just 137 rows out of 160,000. All dbs perform quickly though
execution plan for db3 shows greatest cost by far: 6%, 5%, 89%!! Why?

db1:
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:(myTable AS [t1])
WITH PREFETCH)
|--Nested Loops(Inner Join, OUTER REFERENCES:([t2].[Id],
[Expr1002]) WITH PREFETCH)
|--Stream Aggregate(GROUP BY:([t2].[Id])
DEFINE:([Expr1002]=MAX([t2].[lastInt])))
| |--Sort(ORDER BY:([t2].[Id] ASC))
| |--Index Seek(OBJECT:(myTable
.[myIndex] AS [t2]),
SEEK:([t2].[lastInt] > 22881 AND [t2].[lastInt] <= 22900) ORDERED
FORWARD)
|--Index Seek(OBJECT:(myTable
.[myIndex] AS [t1]),
SEEK:([t1].[lastInt]=[Expr1002] AND [t1].[Id]=[t2].[Id]) ORDERED
FORWARD)

as opposed to:

db3:
|--Top(1)
|--Segment
|--Sort(ORDER BY:([t1].[Id] DESC, [t1].[lastInt] DESC))
|--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:(myTable AS [t1]) WITH PREFETCH)
|--Index Seek(OBJECT:([myTable].[myIndex] AS
[t1]), SEEK:([t1].[lastInt] > 22881 AND [t1].[lastInt] <= 22900)
ORDERED FORWARD)

But the performance for this query is so quick it is more just a matter
of interest. However then I try the same query but with 15000 as the
lower figure instead of 22881 (returns 14722 rows)

db1:
|--Hash Match(Inner Join, HASH:([Expr1002],
[t2].[Id])=([t1].[lastInt], [t1].[Id]),
RESIDUAL:([Expr1002]=[t1].[lastInt] AND [t1].[Id]=[t2].[Id]))
|--Hash Match(Aggregate, HASH:([t2].[Id]),
RESIDUAL:([t2].[Id]=[t2].[Id]) DEFINE:([Expr1002]=MAX([t2].[lastInt])))
| |--Index Seek(OBJECT:([myTable].[myIndex] AS [t2]),
SEEK:([t2].[lastInt] > 15000 AND [t2].[lastInt] <= 22900) ORDERED
FORWARD)
|--Table Scan(OBJECT:([myTable] AS [t1]))

db2:
|--Top(1)
|--Segment
|--Sort(ORDER BY:([t1].[Id] DESC, [t1].[lastInt] DESC))
|--Clustered Index Seek(OBJECT:([myTable].[myIndex] AS
[t1]), SEEK:([t1].[lastInt] > 15000 AND [t1].[lastInt] <= 22900)
ORDERED FORWARD)

db3:
|--Hash Match(Inner Join, HASH:([Expr1002],
[t2].[Id])=([t1].[lastInt], [t1].[Id]),
RESIDUAL:([Expr1002]=[t1].[lastInt] AND [t1].[Id]=[t2].[Id]))
|--Stream Aggregate(GROUP BY:([t2].[Id])
DEFINE:([Expr1002]=MAX([t2].[lastInt])))
| |--Index Scan(OBJECT:([myTable].[myIndex2] AS [t2]),
WHERE:([t2].[lastInt]>15000 AND [t2].[lastInt]<=22900) ORDERED FORWARD)
|--Table Scan(OBJECT:([myTable] AS [t1]))

This shows 41% for db1, 20% for db2 and 39% for db3 and yet in reality
the time taken is 16 secs, 20 secs and 5 secs. Consistently. So,
although there are other factors in play, I have done the test a number
of times and the results end up pretty much the same (in fact before I
dropped and recreated the indexes db2 with the clustered index was
taking 40 seconds but the others were the same - eh?)

I don't understand this. First of all, why, in the first query, does
db3 have such a high relative query cost. And in the second query, why
is db3 by far the best performer?


Finally I try this query:

SELECT * FROM myTable t1 with (nolock)
WHERE lastInt = ( select max(lastInt) from myTable t2 with (nolock)
where t2.id = t1.id AND lastInt <= 22900) and fromInt = -2147483646
option (maxdop 1)

Again the results come back in 18, 9 and 5 seconds respectively with
query costs: 38%, 31% and 31% (in this case db1 timing seems quite
variable whereas db2 and db3 are pretty consistent no matter what order
I do them - in fact before I dropped and recreated the indexes db2 with
the clustered index was taking 18 seconds but the others were the same
: see above, this is coincidentally (probably not) double the time
taken after the index was dropped and recreated, same as query above -
eh?).

I'm struggling to understand the inner workings here. I assumed that
because the query cost was generally the lowest, and a clustered index
is usually a good thing, that the results would favour the clustered
index but consistently they don't. My estimated row width is 474
(estimated clustered index width = 500). Interestingly myIndex in db1
has the same estimated size as myIndex2 in db3 (i.e 51) yet myIndex2 in
db2 has estimated size 77 - why??? myIndex2 in db2 is the same as
myIndex2 in db3 on exactly the same data.

I have also created statistics on (lastInt, id) and updated existing
statistics on (id) and (fromInt) separately in all 3 dbs.

I know joins might be better than sub-selects in some cases, but a) I
want to understand the basic flaws in my thinking about SQL server and
b) because the field list is built dynamically within the real-life app
it would be more tricky to prefix all fields with t1. to ensure I got
the expected results.

Sorry for long post but I thought it was better to have all facts.
Thanks for any help

Phil

Uri Dimant

2006-12-14, 5:25 am

deja
> In db2 I have a CLUSTERED unique index on lastInt,Id AND a
> non-clustered index on Id,lastInt
> In db3 I have a non-clustered unique index on lastInt,Id AND a
> non-clustered index on Id,lastInt


Having a duplicate indexes is a bad idea


> SELECT * FROM mytable t1 with (nolock)
> WHERE lastInt = ( select max(lastInt) from myTable t2 with (nolock)
> where t2.id = t1.id AND lastInt > 22881 AND lastInt <= 22900)
> option (maxdop 1)


Don't use SELECT * in the production enviroment , it leads sql server to
perform bookmark to retrieve the data which my hurt perfomance
If you use a range seraching I'd suggest you creating a clustered index on
lastInt column



<deja@2bytes.co.uk> wrote in message
news:1166091799.821002.27650@16g2000cwy.googlegroups.com...
> I've got a table which is in 3 separate databases on the same server,
> each with different indexes. The rest of the database is the same (2
> were restored from a backup of the first - i could have done 3 tables
> in the same database but it just seemed easier at the time). All
> indexes on these 3 tables have been dropped and recreated.
>
> The tables are like this
> fromInt int not null,
> lastInt int not null,
> id char(32) not null,
> name varchar(255) null,
> keyname varchar(255) null,
> description varchar(255) null,
> groupid char(32)
>
> NO PRIMARY KEY defined.
>
> In db1 I have a non-clustered unique index on lastInt, id (lastInt is
> pretty specific, the most for a specific lastchanged is 147 - out of a
> total of 160,000 records - but over 120,000 have less than 30).
>
> In db2 I have a CLUSTERED unique index on lastInt,Id AND a
> non-clustered index on Id,lastInt
>
> In db3 I have a non-clustered unique index on lastInt,Id AND a
> non-clustered index on Id,lastInt
>
> Common sense would tell me that db2, using a clustered index would
> generally perform better for reads. db3 should perform better than db1
> for reads because the optimiser has 2 choices of index?
>
> I am using maxdop(1), DBCC DROPCLEANBUFFERS, DBCC FREEPROCCACHE and
> CHECKPOINT before any query to try to ensure a clean query.
>
> 1)
> SELECT * FROM mytable t1 with (nolock)
> WHERE lastInt = ( select max(lastInt) from myTable t2 with (nolock)
> where t2.id = t1.id AND lastInt > 22881 AND lastInt <= 22900)
> option (maxdop 1)
>
> returns just 137 rows out of 160,000. All dbs perform quickly though
> execution plan for db3 shows greatest cost by far: 6%, 5%, 89%!! Why?
>
> db1:
> |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:(myTable AS [t1])
> WITH PREFETCH)
> |--Nested Loops(Inner Join, OUTER REFERENCES:([t2].[Id],
> [Expr1002]) WITH PREFETCH)
> |--Stream Aggregate(GROUP BY:([t2].[Id])
> DEFINE:([Expr1002]=MAX([t2].[lastInt])))
> | |--Sort(ORDER BY:([t2].[Id] ASC))
> | |--Index Seek(OBJECT:(myTable
.[myIndex] AS [t2]),
> SEEK:([t2].[lastInt] > 22881 AND [t2].[lastInt] <= 22900) ORDERED
> FORWARD)
> |--Index Seek(OBJECT:(myTable
.[myIndex] AS [t1]),
> SEEK:([t1].[lastInt]=[Expr1002] AND [t1].[Id]=[t2].[Id]) ORDERED
> FORWARD)
>
> as opposed to:
>
> db3:
> |--Top(1)
> |--Segment
> |--Sort(ORDER BY:([t1].[Id] DESC, [t1].[lastInt] DESC))
> |--Bookmark Lookup(BOOKMARK:([Bmk1000]),
> OBJECT:(myTable AS [t1]) WITH PREFETCH)
> |--Index Seek(OBJECT:([myTable].[myIndex] AS
> [t1]), SEEK:([t1].[lastInt] > 22881 AND [t1].[lastInt] <= 22900)
> ORDERED FORWARD)
>
> But the performance for this query is so quick it is more just a matter
> of interest. However then I try the same query but with 15000 as the
> lower figure instead of 22881 (returns 14722 rows)
>
> db1:
> |--Hash Match(Inner Join, HASH:([Expr1002],
> [t2].[Id])=([t1].[lastInt], [t1].[Id]),
> RESIDUAL:([Expr1002]=[t1].[lastInt] AND [t1].[Id]=[t2].[Id]))
> |--Hash Match(Aggregate, HASH:([t2].[Id]),
> RESIDUAL:([t2].[Id]=[t2].[Id]) DEFINE:([Expr1002]=MAX([t2].[lastInt])))
> | |--Index Seek(OBJECT:([myTable].[myIndex] AS [t2]),
> SEEK:([t2].[lastInt] > 15000 AND [t2].[lastInt] <= 22900) ORDERED
> FORWARD)
> |--Table Scan(OBJECT:([myTable] AS [t1]))
>
> db2:
> |--Top(1)
> |--Segment
> |--Sort(ORDER BY:([t1].[Id] DESC, [t1].[lastInt] DESC))
> |--Clustered Index Seek(OBJECT:([myTable].[myIndex] AS
> [t1]), SEEK:([t1].[lastInt] > 15000 AND [t1].[lastInt] <= 22900)
> ORDERED FORWARD)
>
> db3:
> |--Hash Match(Inner Join, HASH:([Expr1002],
> [t2].[Id])=([t1].[lastInt], [t1].[Id]),
> RESIDUAL:([Expr1002]=[t1].[lastInt] AND [t1].[Id]=[t2].[Id]))
> |--Stream Aggregate(GROUP BY:([t2].[Id])
> DEFINE:([Expr1002]=MAX([t2].[lastInt])))
> | |--Index Scan(OBJECT:([myTable].[myIndex2] AS [t2]),
> WHERE:([t2].[lastInt]>15000 AND [t2].[lastInt]<=22900) ORDERED FORWARD)
> |--Table Scan(OBJECT:([myTable] AS [t1]))
>
> This shows 41% for db1, 20% for db2 and 39% for db3 and yet in reality
> the time taken is 16 secs, 20 secs and 5 secs. Consistently. So,
> although there are other factors in play, I have done the test a number
> of times and the results end up pretty much the same (in fact before I
> dropped and recreated the indexes db2 with the clustered index was
> taking 40 seconds but the others were the same - eh?)
>
> I don't understand this. First of all, why, in the first query, does
> db3 have such a high relative query cost. And in the second query, why
> is db3 by far the best performer?
>
>
> Finally I try this query:
>
> SELECT * FROM myTable t1 with (nolock)
> WHERE lastInt = ( select max(lastInt) from myTable t2 with (nolock)
> where t2.id = t1.id AND lastInt <= 22900) and fromInt = -2147483646
> option (maxdop 1)
>
> Again the results come back in 18, 9 and 5 seconds respectively with
> query costs: 38%, 31% and 31% (in this case db1 timing seems quite
> variable whereas db2 and db3 are pretty consistent no matter what order
> I do them - in fact before I dropped and recreated the indexes db2 with
> the clustered index was taking 18 seconds but the others were the same
> : see above, this is coincidentally (probably not) double the time
> taken after the index was dropped and recreated, same as query above -
> eh?).
>
> I'm struggling to understand the inner workings here. I assumed that
> because the query cost was generally the lowest, and a clustered index
> is usually a good thing, that the results would favour the clustered
> index but consistently they don't. My estimated row width is 474
> (estimated clustered index width = 500). Interestingly myIndex in db1
> has the same estimated size as myIndex2 in db3 (i.e 51) yet myIndex2 in
> db2 has estimated size 77 - why??? myIndex2 in db2 is the same as
> myIndex2 in db3 on exactly the same data.
>
> I have also created statistics on (lastInt, id) and updated existing
> statistics on (id) and (fromInt) separately in all 3 dbs.
>
> I know joins might be better than sub-selects in some cases, but a) I
> want to understand the basic flaws in my thinking about SQL server and
> b) because the field list is built dynamically within the real-life app
> it would be more tricky to prefix all fields with t1. to ensure I got
> the expected results.
>
> Sorry for long post but I thought it was better to have all facts.
> Thanks for any help
>
> Phil
>



deja@2bytes.co.uk

2006-12-14, 5:25 am

> Having a duplicate indexes is a bad idea

well you say that, but in fact if you look at the figures it performs
much quicker.....


> Don't use SELECT * in the production enviroment , it leads sql server to
> perform bookmark to retrieve the data which my hurt perfomance
> If you use a range seraching I'd suggest you creating a clustered index on
> lastInt column


I have got a clustered index on lastInt column in db2 and it didn't
perform very well. Did you read the stats?
I also changed my *, to explicitly select all columns by name - made no
difference at all. BTW I am on SQL 2000 Service Pack 4.



Uri Dimant wrote:
[color=darkred]
> deja
>
> Having a duplicate indexes is a bad idea
>
>
>
> Don't use SELECT * in the production enviroment , it leads sql server to
> perform bookmark to retrieve the data which my hurt perfomance
> If you use a range seraching I'd suggest you creating a clustered index on
> lastInt column
>
>
>
> <deja@2bytes.co.uk> wrote in message
> news:1166091799.821002.27650@16g2000cwy.googlegroups.com...

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com