Home > Archive > Microsoft SQL Server forum > July 2005 > Very slow distinct select









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Very slow distinct select
Rich

2005-07-25, 8:29 pm

My table looks like this:
char(150) HTTP_REF,
char(250) HTTP_USER,
char(150) REMOTE_ADDR,
char(150) REMOTE_HOST,
char(150) URL,
smalldatetime TIME_STAMP

There are no indexes on this table and there are only 293,658 records total.

When I do a select like this it takes forever:

SELECT COUNT(DISTINCT REMOTE_ADDR)

Takes 2 minutes. Is there anyway to speed that up?

Thanks


Dan Guzman

2005-07-25, 8:29 pm

You can speed up this query with a non-clustered index on REMOTE_ADDR.

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Rich" <no@spam.invalid> wrote in message
news:DbfFe.48792$4o.35562@fed1read06...
> My table looks like this:
> char(150) HTTP_REF,
> char(250) HTTP_USER,
> char(150) REMOTE_ADDR,
> char(150) REMOTE_HOST,
> char(150) URL,
> smalldatetime TIME_STAMP
>
> There are no indexes on this table and there are only 293,658 records
> total.
>
> When I do a select like this it takes forever:
>
> SELECT COUNT(DISTINCT REMOTE_ADDR)
>
> Takes 2 minutes. Is there anyway to speed that up?
>
> Thanks
>
>



Rich

2005-07-25, 8:29 pm

REMOTE_ADDR has duplicate values. So I added a numeric identity column and
did this:

CREATE UNIQUE NONCLUSTERED INDEX Requests_RemoteAddr ON Requests
(REMOTE_ADDR, ID)

But that didn't seem to help the speed any.

Any other ideas?

Thanks.


"Dan Guzman" <guzmanda@nospam-online.sbcglobal.net> wrote in message
news:09gFe.2034$PC7.1092@newssvr29.news.prodigy.net...
> You can speed up this query with a non-clustered index on REMOTE_ADDR.
>
> --
> Hope this helps.
>
> Dan Guzman
> SQL Server MVP
>
> "Rich" <no@spam.invalid> wrote in message
> news:DbfFe.48792$4o.35562@fed1read06...
>
>



--CELKO--

2005-07-25, 8:29 pm

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

You did not so much as tell us the **name* of this table! There are
not indexes?

Does your boss -- who is paying you, unlike us who are doing your work
for you for free -- give sooooooo little to work with?

Rich

2005-07-26, 7:23 am

What the heck is DDL???
Didn't I give enough info in my two posts?
And what does the name matter?


"--CELKO--" <jcelko212@earthlink.net> wrote in message
news:1122342953.732318.259740@g44g2000cwa.googlegroups.com...
> Please post DDL, so that people do not have to guess what the keys,
> constraints, Declarative Referential Integrity, data types, etc. in
> your schema are. Sample data is also a good idea, along with clear
> specifications. It is very hard to debug code when you do not let us
> see it.
>
> You did not so much as tell us the **name* of this table! There are
> not indexes?
>
> Does your boss -- who is paying you, unlike us who are doing your work
> for you for free -- give sooooooo little to work with?
>



Dan Guzman

2005-07-26, 7:23 am

> REMOTE_ADDR has duplicate values. So I added a numeric identity column

A non-unique index will optimize your COUNT(DISTINCT REMOTE_ADDR). No need
to add a unique-ifier identity column.

How many distinct values? If it is fairly high, a scan of many index leaf
nodes is needed and this might not perform much better than a table scan.
In that case, you can create a view with the aggregation and then create an
index on that view. For Example

CREATE VIEW Requests_RemoteAddr
WITH SCHEMABINDING AS
SELECT REMOTE_ADDR, COUNT_BIG(*) AS CoungBig
FROM dbo.Requests
GROUP BY REMOTE_ADDR
GO

CREATE UNIQUE CLUSTERED INDEX Requests_RemoteAddr_
Index ON
WebLog(REMOTE_ADDR)
GO

Your query can automatically use the view index when accessing the base
table if you are using SQL 2000 Enterprise Edition. If you are using
another edition, you'll need to access the view and add the NOEXPAND hint:

SELECT COUNT(DISTINCT REMOTE_ADDR)
FROM dbo.Results_View WITH (NOEXPAND)

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Rich" <no@spam.invalid> wrote in message
news:IBgFe.48800$4o.29531@fed1read06...
> REMOTE_ADDR has duplicate values. So I added a numeric identity column and
> did this:
>
> CREATE UNIQUE NONCLUSTERED INDEX Requests_RemoteAddr ON Requests
> (REMOTE_ADDR, ID)
>
> But that didn't seem to help the speed any.
>
> Any other ideas?
>
> Thanks.
>
>
> "Dan Guzman" <guzmanda@nospam-online.sbcglobal.net> wrote in message
> news:09gFe.2034$PC7.1092@newssvr29.news.prodigy.net...
>
>



Erland Sommarskog

2005-07-26, 7:23 am

Rich (no@spam.invalid) writes:
> What the heck is DDL???
> Didn't I give enough info in my two posts?
> And what does the name matter?


DDL = Data Definition Laguage. That is CREATE TABLE etc. Which it appears
that you did include.

Too many SQL savvies just throw this funny TLAs(*) around them, and
don't give a damn if people they pretend to help understand them or not.


(*) TLA = Three-letter abbriviation.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
Erland Sommarskog

2005-07-26, 7:23 am

Rich (no@spam.invalid) writes:
> My table looks like this:
> char(150) HTTP_REF,
> char(250) HTTP_USER,
> char(150) REMOTE_ADDR,
> char(150) REMOTE_HOST,
> char(150) URL,
> smalldatetime TIME_STAMP
>
> There are no indexes on this table and there are only 293,658 records
> total.
>
> When I do a select like this it takes forever:
>
> SELECT COUNT(DISTINCT REMOTE_ADDR)
>
> Takes 2 minutes. Is there anyway to speed that up?


Assuming that these columns are not really of fixed length, there is
quite something to win by changing char to varchar. char is fixed
length. Your row size is 854 bytes, which means that you can get at
most 9 rows per page. Thus your table requires 32628 pages @ 8192
bytes for a total of 267 MB. Which is a small table.

Say that the average length of all columns is 40 bytes. Then your average
row size is 5*(40+2) + 4 =214, giving 37 rows per page. This reduces the
size of the table to 65 MB.

The smaller the pages, the fewer pages to read, and thus the shorter the
response time.

Adding a non-clustered index as Dan suggested is of course even better
because then the table you traverse is only the index. And if you
use varchar instead that index is even smaller.

Note that I and Dan assume that your query really is

SELECT COUNT(DISTINCT REMOTE_ADDR) FROM tbl

and nothing else. If there is a WHERE clause involved that refers to
other columns, then the index is not that useful.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
Rich

2005-07-26, 8:24 pm

Thanks Dan and Erland... it has gone from 2 minutes all the way down to 1
second!
But in the process, I accidentally truncated a lot of URLs to 50. OH WELL.
At least its faster :) Thanks.


"Rich" <no@spam.invalid> wrote in message
news:DbfFe.48792$4o.35562@fed1read06...
> My table looks like this:
> char(150) HTTP_REF,
> char(250) HTTP_USER,
> char(150) REMOTE_ADDR,
> char(150) REMOTE_HOST,
> char(150) URL,
> smalldatetime TIME_STAMP
>
> There are no indexes on this table and there are only 293,658 records

total.
>
> When I do a select like this it takes forever:
>
> SELECT COUNT(DISTINCT REMOTE_ADDR)
>
> Takes 2 minutes. Is there anyway to speed that up?
>
> Thanks
>
>



Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com