Home > Archive > MS SQL Server OLAP > March 2006 > DISTINCT COUNT performance and Snapshot Query?









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author DISTINCT COUNT performance and Snapshot Query?
Karl Kvool

2006-03-22, 8:34 pm

Using SQL Server AS 2000...I have read this document probably 1000 times
because I am constantly struggling with performance with counting distinct
customers in large data warehouses, especially when the result set is sliced
by multiple dimensions.

http://msdn.microsoft.com/library/d...l/distinct2.asp

What normally happens with a distinct cube (the normal solution to the
distinct count problem), is the two slices get evaluated separately, and then
added together at the end, which just about doubles the result of the
distinct count measure, and renders it invalid. In this document it explains
how to accomplish this distinct counting using a calculated measure which
cross-joins a measure with a set, but the performance hit is unbearable (5-10
minutes on a very beefy server) if you have more than a couple thousand
records in your fact table. It gets really bad if you slice by ranges of
dimensions (for example, service date range of 6 months and paid date range
of a different 6 months, to determine the number of unique customers that
were serviced and then paid in those two date ranges).

Anyway, at the bottom of the document I reference above, for performance
conderations it talks about how DISTINCT COUNT queries should always be
snapshot queries. But try to find any other reference to what that means
anywhere else in MSDN. What are they talking about? It also talks about
server vs. client side query execution. The only reference I can find for
this is a property on the provider string. However, when I set this to
server-side execution, it didn't make a difference in the response time, and
also it didn't affect which process was using the CPU (which indicated to me
that the client was still resolving the counts).

Does anyone have any more insight into these performance considerations
(snapshot, server-side execution) for DISTINCT COUNT calculations ? I would
welcome any input to help me resolve this issue which has been vexing me for
years.

Thanks!
Karl Kvool

2006-03-23, 8:37 pm

I did solve the remote execution problem by adding two additional values to
my connection string. However, this did nothing to improve the performance
problems. Any ideas or suggestions are greatly appreciated.

"Karl Kvool" wrote:

> Using SQL Server AS 2000...I have read this document probably 1000 times
> because I am constantly struggling with performance with counting distinct
> customers in large data warehouses, especially when the result set is sliced
> by multiple dimensions.
>
> http://msdn.microsoft.com/library/d...l/distinct2.asp
>
> What normally happens with a distinct cube (the normal solution to the
> distinct count problem), is the two slices get evaluated separately, and then
> added together at the end, which just about doubles the result of the
> distinct count measure, and renders it invalid. In this document it explains
> how to accomplish this distinct counting using a calculated measure which
> cross-joins a measure with a set, but the performance hit is unbearable (5-10
> minutes on a very beefy server) if you have more than a couple thousand
> records in your fact table. It gets really bad if you slice by ranges of
> dimensions (for example, service date range of 6 months and paid date range
> of a different 6 months, to determine the number of unique customers that
> were serviced and then paid in those two date ranges).
>
> Anyway, at the bottom of the document I reference above, for performance
> conderations it talks about how DISTINCT COUNT queries should always be
> snapshot queries. But try to find any other reference to what that means
> anywhere else in MSDN. What are they talking about? It also talks about
> server vs. client side query execution. The only reference I can find for
> this is a property on the provider string. However, when I set this to
> server-side execution, it didn't make a difference in the response time, and
> also it didn't affect which process was using the CPU (which indicated to me
> that the client was still resolving the counts).
>
> Does anyone have any more insight into these performance considerations
> (snapshot, server-side execution) for DISTINCT COUNT calculations ? I would
> welcome any input to help me resolve this issue which has been vexing me for
> years.
>
> Thanks!

Deepak Puri

2006-03-25, 3:29 am

Just curious why you wouldn't consider AS 2005, since it natively
supports Distinct Count across selections of multiple members (I assume
that's what you meant by "..the two slices get evaluated separately..",
since the AS 2000 Distinct Count measure can handle single selections on
each of multiple dimensions).


- Deepak

Deepak Puri
Microsoft MVP - SQL Server

*** Sent via Developersdex http://www.droptable.com ***
Karl Kvool

2006-03-28, 1:31 pm

Deepak,

Actually, that will be my ultimate solution if I cannot accomplish this with
AS 2000. However, the front-end that I use does not yet support 2005 and it
is not clear when it will, so I was still holding out hope that there was a
workable way to do this under 2000.

Thanks,

-Karl

"Deepak Puri" wrote:

> Just curious why you wouldn't consider AS 2005, since it natively
> supports Distinct Count across selections of multiple members (I assume
> that's what you meant by "..the two slices get evaluated separately..",
> since the AS 2000 Distinct Count measure can handle single selections on
> each of multiple dimensions).
>
>
> - Deepak
>
> Deepak Puri
> Microsoft MVP - SQL Server
>
> *** Sent via Developersdex http://www.droptable.com ***
>

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com