Home > Archive > MS SQL Server OLAP > March 2006 > 2000 vs. 2005 OLAP hierarchies question









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author 2000 vs. 2005 OLAP hierarchies question
Luke Schollmeyer

2006-03-29, 9:30 am

I'm a bit of an OLAP newbie (although I've been around the SQL Server
game for about 8 years), so if you answer makes sure you type slowly
since I can't read very fast...

Here are the basic facts: Brand new, not yet deployed data warehouse on
SQL Server 2005. Have leveraged Kimball methods and Project REAL
suggestions for best practices in dimensional model, ETL, cube and
physical deployments. I can provide more information if relevant.

Question: We will be using Proclarity for our BI tool. The consultant
helping to set up the Proclarity tools indicated that hierarchal data
sets can be problematic in the 2005 cube because AS doesn't create a
concatenated key. For example, with the Geography dimension, the
hierarchal set is Country --> State --> City. The problem, as explained
to me, is that when the cube is being processed, cities with the same
name are rolled up into the first matching state. Thus, take the city
name of "Paris." There are nine different cities named Paris in the
U.S. Ostensibly, all of the Parises might be aggregated under Arkansas.
The solution, again, explained to me, is to have a unique city key
to help when building the cube.

We're not opposed to augmenting our schema and ETL for this, but I
wanted to get some insight to this particular behavior. Your input is
appreciated.

Luke
yongli

2006-03-29, 1:30 pm

There are some big changes from AS2000 to AS2005 in dimension world. In
AS2000, dimension was more focus on hierarchy and level, while AS2005 is more
attribute based, then user hierarchy is built from the attributes.

In your case, you would need to build three attributes county, state and
city first, then assemble your hierarchy from those three attributes. Thus
each member of attributes (especially city ) must have unique key (name) in
order to make sense.

(Imagine if you browse city attribute, which paris is it?)


"Luke Schollmeyer" wrote:

> I'm a bit of an OLAP newbie (although I've been around the SQL Server
> game for about 8 years), so if you answer makes sure you type slowly
> since I can't read very fast...
>
> Here are the basic facts: Brand new, not yet deployed data warehouse on
> SQL Server 2005. Have leveraged Kimball methods and Project REAL
> suggestions for best practices in dimensional model, ETL, cube and
> physical deployments. I can provide more information if relevant.
>
> Question: We will be using Proclarity for our BI tool. The consultant
> helping to set up the Proclarity tools indicated that hierarchal data
> sets can be problematic in the 2005 cube because AS doesn't create a
> concatenated key. For example, with the Geography dimension, the
> hierarchal set is Country --> State --> City. The problem, as explained
> to me, is that when the cube is being processed, cities with the same
> name are rolled up into the first matching state. Thus, take the city
> name of "Paris." There are nine different cities named Paris in the
> U.S. Ostensibly, all of the Parises might be aggregated under Arkansas.
> The solution, again, explained to me, is to have a unique city key
> to help when building the cube.
>
> We're not opposed to augmenting our schema and ETL for this, but I
> wanted to get some insight to this particular behavior. Your input is
> appreciated.
>
> Luke
>

Akshai Mirchandani [MS]

2006-03-29, 8:27 pm

Its actually very simple but very important to do this right.

Consider that you have two attributes City and State. Think of them as two
entities. A semi-formal definition of an attribute relationship would be:
- An attribute relationship exists (i.e. can be defined) between City and
State if there is a functional dependency between the keys of City and
State.

What this implies is that if the key column of City is CityID and each
CityID has a unique StateID, then its very clear that a functional
dependency exists between the two attributes. Given a particular CityID, you
can uniquely identify a single StateID.

But if the key column of the City attribute is CityName, then you run into
the situation that there isn't a true functional dependency between the two
attributes -- i.e. given a CityName, you cannot uniquely identify its
StateName. Therefore, you are required in this case to use a composite key
for the City attribute:
(CityName, StateName)
to make sure that you can uniquely identify the State from a particular
instance of a City.

If you had the same uncertainty about Country then the key of City should
really be:
(CityName, StateName, CountryName)


At the same time, let me try to explain two more things:
1. Why define attribute relationships at all?
2. What would happen if you did not use a composite key for the non-unique
case?

1. If you define attribute relationship properly, then it has several
benefits:
- It improves performance while iterating over members in a hierarchy made
up of related attributes.
- If there is an aggregation (either on disk or an in-memory cache) that is
at the granularity of City, the server can automatically use that
aggregation to find out the values at the granularity of State. This makes a
*huge* difference when it comes to large cubes. Instead of going to disk
twice (or not hitting a processed aggregation), the server can re-use the
data at the City granularity and just aggregate the data up to the
granularity of States.
- Strong attribute relationships. I believe there is some information
already posted about this so I won't say much -- but basically: if your
current coordinate on the City attribute is [San Francisco], what do you
expect your current State to be? With an attribute relationship between City
and State, the current State will be [CA]. But without the attribute
relationship, it will be [All States].

2. If you defined the attribute relationships but the key was only CityName,
here is what would happen:

SELECT DISTINCT [CityName], [StateName] FROM [Customers]

CityName StateName
---------------------------
San Francisco California
Seattle Washington
Portland Maine
Portland Oregon <====

For each CityName, the server will insert the key into a hash table. So the
hash table will contain:
San Francisco
Seattle
Portland

Now when it sees the second instance of Portland, it will search the hash
table and find that Portland already exists. So it will either raise an
error if you have turned on the "raise errors on Duplicate keys" error
configuration, or it will simply ignore that row (this is the default).

Assuming it ignores that row, here is what the hierachy will look like:

USA
- California
- San Francisco
- Washington
- Seattle
- Maine
- Portland
- <all the customers in both Portlands will be put into
Portland, Maine!>

As you see, the Portland in Oregon is simply not included in the dimension.
Now the data that was split between customers in Portland, Maine and
Portland, Oregon will actually show up as belonging to Portland, Maine.

HTH,
Akshai
--
This posting is provided "AS IS" with no warranties, and confers no rights
Please do not send email directly to this alias. This alias is for newsgroup
purposes only.

"Luke Schollmeyer" <aegix@yahoo.com> wrote in message
news:%23TC4ZjzUGHA.1304@tk2msftngp13.phx.gbl...
> I'm a bit of an OLAP newbie (although I've been around the SQL Server game
> for about 8 years), so if you answer makes sure you type slowly since I
> can't read very fast...
>
> Here are the basic facts: Brand new, not yet deployed data warehouse on
> SQL Server 2005. Have leveraged Kimball methods and Project REAL
> suggestions for best practices in dimensional model, ETL, cube and
> physical deployments. I can provide more information if relevant.
>
> Question: We will be using Proclarity for our BI tool. The consultant
> helping to set up the Proclarity tools indicated that hierarchal data sets
> can be problematic in the 2005 cube because AS doesn't create a
> concatenated key. For example, with the Geography dimension, the
> hierarchal set is Country --> State --> City. The problem, as explained
> to me, is that when the cube is being processed, cities with the same name
> are rolled up into the first matching state. Thus, take the city name of
> "Paris." There are nine different cities named Paris in the U.S.
> Ostensibly, all of the Parises might be aggregated under Arkansas. The
> solution, again, explained to me, is to have a unique city key to help
> when building the cube.
>
> We're not opposed to augmenting our schema and ETL for this, but I wanted
> to get some insight to this particular behavior. Your input is
> appreciated.
>
> Luke



Dave Wickert [MSFT]

2006-03-29, 8:27 pm

What Akshai is talking about is also described in the Project REAL white
papers.
Go to http://www.microsoft.com/sql/bi/ProjectREAL -- look at the Analysis
Services technical drilldown white paper.
--
Dave Wickert [MSFT]
dwickert@online.microsoft.com
Program Manager
BI Systems Team
SQL BI Product Unit (Analysis Services)
--
This posting is provided "AS IS" with no warranties, and confers no rights.


"Luke Schollmeyer" <aegix@yahoo.com> wrote in message
news:%23TC4ZjzUGHA.1304@tk2msftngp13.phx.gbl...
> I'm a bit of an OLAP newbie (although I've been around the SQL Server game
> for about 8 years), so if you answer makes sure you type slowly since I
> can't read very fast...
>
> Here are the basic facts: Brand new, not yet deployed data warehouse on
> SQL Server 2005. Have leveraged Kimball methods and Project REAL
> suggestions for best practices in dimensional model, ETL, cube and
> physical deployments. I can provide more information if relevant.
>
> Question: We will be using Proclarity for our BI tool. The consultant
> helping to set up the Proclarity tools indicated that hierarchal data sets
> can be problematic in the 2005 cube because AS doesn't create a
> concatenated key. For example, with the Geography dimension, the
> hierarchal set is Country --> State --> City. The problem, as explained
> to me, is that when the cube is being processed, cities with the same name
> are rolled up into the first matching state. Thus, take the city name of
> "Paris." There are nine different cities named Paris in the U.S.
> Ostensibly, all of the Parises might be aggregated under Arkansas. The
> solution, again, explained to me, is to have a unique city key to help
> when building the cube.
>
> We're not opposed to augmenting our schema and ETL for this, but I wanted
> to get some insight to this particular behavior. Your input is
> appreciated.
>
> Luke



Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com