Drop Table
Support Forum for database administrators and web based access to important newsgroups related to databasesMy table looks like this: char(150) HTTP_REF, char(250) HTTP_USER, char(150) REMOTE_ADDR, char(150) REMOTE_HOST, char(150) URL, smalldatetime TIME_STAMP There are no indexes on this table and there are only 293,658 records total. When I do a select like this it takes forever: SELECT COUNT(DISTINCT REMOTE_ADDR) Takes 2 minutes. Is there anyway to speed that up? Thanks
Post Follow-up to this messageYou can speed up this query with a non-clustered index on REMOTE_ADDR. -- Hope this helps. Dan Guzman SQL Server MVP "Rich" <no@spam.invalid> wrote in message news:DbfFe.48792$4o.35562@fed1read06... > My table looks like this: > char(150) HTTP_REF, > char(250) HTTP_USER, > char(150) REMOTE_ADDR, > char(150) REMOTE_HOST, > char(150) URL, > smalldatetime TIME_STAMP > > There are no indexes on this table and there are only 293,658 records > total. > > When I do a select like this it takes forever: > > SELECT COUNT(DISTINCT REMOTE_ADDR) > > Takes 2 minutes. Is there anyway to speed that up? > > Thanks > >
Post Follow-up to this messageREMOTE_ADDR has duplicate values. So I added a numeric identity column and did this: CREATE UNIQUE NONCLUSTERED INDEX Requests_RemoteAddr ON Requests (REMOTE_ADDR, ID) But that didn't seem to help the speed any. Any other ideas? Thanks. "Dan Guzman" <guzmanda@nospam-online.sbcglobal.net> wrote in message news:09gFe.2034$PC7.1092@newssvr29.news.prodigy.net... > You can speed up this query with a non-clustered index on REMOTE_ADDR. > > -- > Hope this helps. > > Dan Guzman > SQL Server MVP > > "Rich" <no@spam.invalid> wrote in message > news:DbfFe.48792$4o.35562@fed1read06... > >
Post Follow-up to this messagePlease post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. You did not so much as tell us the **name* of this table! There are not indexes? Does your boss -- who is paying you, unlike us who are doing your work for you for free -- give sooooooo little to work with?
Post Follow-up to this messageWhat the heck is DDL??? Didn't I give enough info in my two posts? And what does the name matter? "--CELKO--" <jcelko212@earthlink.net> wrote in message news:1122342953.732318.259740@g44g2000cwa.googlegroups.com... > Please post DDL, so that people do not have to guess what the keys, > constraints, Declarative Referential Integrity, data types, etc. in > your schema are. Sample data is also a good idea, along with clear > specifications. It is very hard to debug code when you do not let us > see it. > > You did not so much as tell us the **name* of this table! There are > not indexes? > > Does your boss -- who is paying you, unlike us who are doing your work > for you for free -- give sooooooo little to work with? >
Post Follow-up to this message> REMOTE_ADDR has duplicate values. So I added a numeric identity column A non-unique index will optimize your COUNT(DISTINCT REMOTE_ADDR). No need to add a unique-ifier identity column. How many distinct values? If it is fairly high, a scan of many index leaf nodes is needed and this might not perform much better than a table scan. In that case, you can create a view with the aggregation and then create an index on that view. For Example CREATE VIEW Requests_RemoteAddr WITH SCHEMABINDING AS SELECT REMOTE_ADDR, COUNT_BIG(*) AS CoungBig FROM dbo.Requests GROUP BY REMOTE_ADDR GO CREATE UNIQUE CLUSTERED INDEX Requests_RemoteAddr_ Index ON WebLog(REMOTE_ADDR) GO Your query can automatically use the view index when accessing the base table if you are using SQL 2000 Enterprise Edition. If you are using another edition, you'll need to access the view and add the NOEXPAND hint: SELECT COUNT(DISTINCT REMOTE_ADDR) FROM dbo.Results_View WITH (NOEXPAND) -- Hope this helps. Dan Guzman SQL Server MVP "Rich" <no@spam.invalid> wrote in message news:IBgFe.48800$4o.29531@fed1read06... > REMOTE_ADDR has duplicate values. So I added a numeric identity column and > did this: > > CREATE UNIQUE NONCLUSTERED INDEX Requests_RemoteAddr ON Requests > (REMOTE_ADDR, ID) > > But that didn't seem to help the speed any. > > Any other ideas? > > Thanks. > > > "Dan Guzman" <guzmanda@nospam-online.sbcglobal.net> wrote in message > news:09gFe.2034$PC7.1092@newssvr29.news.prodigy.net... > >
Post Follow-up to this messageRich (no@spam.invalid) writes: > What the heck is DDL??? > Didn't I give enough info in my two posts? > And what does the name matter? DDL = Data Definition Laguage. That is CREATE TABLE etc. Which it appears that you did include. Too many SQL savvies just throw this funny TLAs(*) around them, and don't give a damn if people they pretend to help understand them or not. (*) TLA = Three-letter abbriviation. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techin.../2000/books.asp
Post Follow-up to this messageRich (no@spam.invalid) writes: > My table looks like this: > char(150) HTTP_REF, > char(250) HTTP_USER, > char(150) REMOTE_ADDR, > char(150) REMOTE_HOST, > char(150) URL, > smalldatetime TIME_STAMP > > There are no indexes on this table and there are only 293,658 records > total. > > When I do a select like this it takes forever: > > SELECT COUNT(DISTINCT REMOTE_ADDR) > > Takes 2 minutes. Is there anyway to speed that up? Assuming that these columns are not really of fixed length, there is quite something to win by changing char to varchar. char is fixed length. Your row size is 854 bytes, which means that you can get at most 9 rows per page. Thus your table requires 32628 pages @ 8192 bytes for a total of 267 MB. Which is a small table. Say that the average length of all columns is 40 bytes. Then your average row size is 5*(40+2) + 4 =214, giving 37 rows per page. This reduces the size of the table to 65 MB. The smaller the pages, the fewer pages to read, and thus the shorter the response time. Adding a non-clustered index as Dan suggested is of course even better because then the table you traverse is only the index. And if you use varchar instead that index is even smaller. Note that I and Dan assume that your query really is SELECT COUNT(DISTINCT REMOTE_ADDR) FROM tbl and nothing else. If there is a WHERE clause involved that refers to other columns, then the index is not that useful. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techin.../2000/books.asp
Post Follow-up to this messageThanks Dan and Erland... it has gone from 2 minutes all the way down to 1 second! But in the process, I accidentally truncated a lot of URLs to 50. OH WELL. At least its faster :) Thanks. "Rich" <no@spam.invalid> wrote in message news:DbfFe.48792$4o.35562@fed1read06... > My table looks like this: > char(150) HTTP_REF, > char(250) HTTP_USER, > char(150) REMOTE_ADDR, > char(150) REMOTE_HOST, > char(150) URL, > smalldatetime TIME_STAMP > > There are no indexes on this table and there are only 293,658 records total. > > When I do a select like this it takes forever: > > SELECT COUNT(DISTINCT REMOTE_ADDR) > > Takes 2 minutes. Is there anyway to speed that up? > > Thanks > >
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread