Home > Archive > MS SQL Data Warehousing > October 2005 > general design question - "multiple groupings"









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author general design question - "multiple groupings"
Ray

2005-10-17, 1:24 pm

Thank you for the suggestion - it makes a lot of sense, and I wish we could
use the approach of adding columns/attributes to describe the product.
Unfortunately, the groupings are defined by the customer -- and we simply
don't know how they may decide to group the products -- there are just too
many possibilities. The product dimension is the list of ALL products that
are sold -- it could be a food item, beverage, or even retail store items.
There are a number of natural hierarchy's we have built in, but we also want
to support arbitrary user defined groups.

In many ways, the groupings can be thought of as filters when performing
(e.g. Sales) analysis. Imagine the scenario where there are 10-100K of
products, and maybe 1000 different groups. A product can belong to more than
one group.

I can think of a few solutions to implement the above:
1) Duplicate the datamart / star schema so that a transaction is duplicated
to each group the product belongs to. This way, aggregation/rollups are
faster.

2) Create a bridge table (many-to-many) relationship -- and use it as a
filter to perform ad-hoc aggregation.

If anyone has more suggestions or thoughts, it would be very much
appreciated.

Thank you,
Ray

"Darren Gosbell" <xxx@xxx.com> wrote in message
news:MPG. 1db8583233d9ae819897
3b@news.microsoft.com...
> Sounds like what you have here are attributes of the dimension members.
> AS2k5 handles attributes REALLY well. In AS2k the story is not as good,
> but can still work, although performance may not be ideal.
>
> In both AS2k and AS2k5, what you would do is to set up your products
> table with a column for each attribute.
>
> eg
>
> ID, Product, Manufacturer, Sugar Content, Color
> 1, Pepsi Max, Pepsi, Sugar Free, Black
>
> Then in AS2k you would create member properties for each of these
> attributes. From these member properties you can then create virtual
> dimensions. AS2k cannot create aggregations of virtual dimension so it
> will have to aggregate at runtime, which is where you may take a
> performance hit.
>
> If you have a natural hierarchy you should set these up as levels as
> these will benefit from the native aggregation abilities of AS. eg. you
> might be able to setup a Manufacturer > Sugar Content > Product
> hierarchy that will also aid users in navigating around.
>
> Alternatively you could create multiple views off this products table
> and create multiple real dimensions off the one product table, but here
> again performance will suffer as the more dimensions there are in a
> cube, the more complicated the aggregation designs become, reducing
> their effectiveness.
>
> In AS2k5 attributes are first class properties of the dimension and
> appear natively as a single level hierarchy. You then have the option of
> combining them into other native hierarchies as you see fit.
>
> HTH
>
> --
> Regards
> Darren Gosbell [MCSD]
> < dgosbell_at_yahoo_do
t_com>
> Blog: http://www.geekswithblogs.net/darrengosbell



Darren Gosbell

2005-10-17, 1:24 pm

It sounds like you are heading towards building a bridge table with
attributes and values in it.

eg.

Product, Attribute, Value
====================
=======
Pepsi Max, Manufacturer, Pepsi
Pepsi Max, Category, Drinks
Pepsi Max, Sugar Content, None

In this case Deepak's suggestion of looking at the many-to-many
dimension feature in AS2k5 is probably the best way to go.


But, just to play Devil's advocate....

I may be wrong (it would not be the first time <grin> ) and it may just
be the simplistic examples we are using, but it still *feels* to me that
we are dealing with attributes here, not a true many-to-many dimension.

Below is a VERY simplistic example of what I would call a true many-to-
many dimension and it may well be that this is what you have. If so you
can ignore the rest of what I am about to say. If not, then I suspect
that it's not that your end user *needs* ad-hoc groupings, it just that
they are not yet sure what their requirements are. This is not uncommon
and I would suspect that over time the attributes would stabilize and at
that point you could probably re-visit the architecture and normalise
some or all of the attributes into the product table, which I suspect
would give a noticeable performance benefit.

This is all conjecture of course as AS2k5 has not been released yet, but
I would be very surprised if multiple attributes did not out perform a
single many to many relationship.

Product, Attribute, Value
====================
=======
Pepsi Max, Package Color, Red
Pepsi Max, Package Color, White
Pepsi Max, Package Color, Blue



--
Regards
Darren Gosbell [MCSD]
< dgosbell_at_yahoo_do
t_com>
Blog: http://www.geekswithblogs.net/darrengosbell
Ray

2005-10-17, 1:24 pm

We have a main fact table that contains, for example, individual retail
transactions (e.g. Coke, Sprite, 7up, Milk, Bottled Water... chocolate cake,
steak.... t-shirt, mug, cap).

Each item can belong to several groups (e.g. group: drinks, non-alcholic
drinks, pepsi products, coke products, sugar-free drink, etc.)

In the above scenario, we see two options for the design of the datamart:
1) create a star schema design where the transaction is "duplicated" into
the fact table -- effectively linking a transaction to a "group" -- e.g. a
sale of Diet-Coke would have 4 entries in the fact table (one for "drinks",
non-alcoholic, coke products, and sugar-free).
The draw-back is that this is ALOT of transactions and can really start to
bloat the size of the datamart.

2) create a "bridge" table that associates items to groups, effectively
implementing a many-to-many relationship.

Does anyone have other options or opinions on how to effectively design
this? The requirement is fairly open-ended -- the customer wants to be able
to view this data in as fast a time as possible (i.e. doing an ad-hoc query
might be too slow of a roll-up)

Thanks,
Ray


Darren Gosbell

2005-10-17, 1:24 pm

Sounds like what you have here are attributes of the dimension members.
AS2k5 handles attributes REALLY well. In AS2k the story is not as good,
but can still work, although performance may not be ideal.

In both AS2k and AS2k5, what you would do is to set up your products
table with a column for each attribute.

eg

ID, Product, Manufacturer, Sugar Content, Color
1, Pepsi Max, Pepsi, Sugar Free, Black

Then in AS2k you would create member properties for each of these
attributes. From these member properties you can then create virtual
dimensions. AS2k cannot create aggregations of virtual dimension so it
will have to aggregate at runtime, which is where you may take a
performance hit.

If you have a natural hierarchy you should set these up as levels as
these will benefit from the native aggregation abilities of AS. eg. you
might be able to setup a Manufacturer > Sugar Content > Product
hierarchy that will also aid users in navigating around.

Alternatively you could create multiple views off this products table
and create multiple real dimensions off the one product table, but here
again performance will suffer as the more dimensions there are in a
cube, the more complicated the aggregation designs become, reducing
their effectiveness.

In AS2k5 attributes are first class properties of the dimension and
appear natively as a single level hierarchy. You then have the option of
combining them into other native hierarchies as you see fit.

HTH

--
Regards
Darren Gosbell [MCSD]
< dgosbell_at_yahoo_do
t_com>
Blog: http://www.geekswithblogs.net/darrengosbell
Ray

2005-10-17, 1:24 pm

Thank you to all who have replied.

In response to Darren's comment:
> can ignore the rest of what I am about to say. If not, then I suspect
> that it's not that your end user *needs* ad-hoc groupings, it just that
> they are not yet sure what their requirements are. This is not uncommon


You've hit the nail on the head -- ideally, if we could know what all the
possible classifications, we could build it into the dimension table as
column attributes. In the world of all possible food and retail sales, it is
difficult for us (and the client!) to know upfront how they want to group
the products.

Also, we suspect there will always be the need for arbitrary groupings of
products. e.g. imagine a manager of a store wanting to know how sales of
products at the front of the store compare with sales at the back of the
store. (S/he could use this information for product placement in a store).

Ray



roman.domin@adastra.cz

2005-10-20, 8:24 pm

Hi Ray,

here is my experience with "multiple grouping" implementation:

We applied this for international retailer and we used the solution
which you described as 2nd method (the bridge table). These groups the
retailer called Assortments and they had thousands of them. We prepared
bridge fact table with 2 dimensions (Articles, Assortments) and we used
calculations that changed the measures approximatelly like this:

Iif(
[Assortments].CurrentMember.Level.Ordinal = 0,
([Measures Advanced].[Standard]),
Aggregate(
Extract(
Filter(
Descendants([Articles].CurrentMember,
[Articles].[Article]),
Not IsEmpty([Measures].[Articles Count])
),
[Articles]
),
ValidMeasure(([Measures Advanced].[Standard]))
),
)

the [Measures Advanced] is our special dimension that we used fro
solving many different things. In this case it allowed us creating
calculations such as Apply Assortments, Ignore Assortments, Assortments
Shares to articles structure, etc (e.g. Share of Private Labels on
Drinks, etc.).

As you expect the performance is an issue - when users use All level in
Articles (more than 300000 members) then the first calculation takes
about 1-2 minutes. Facts had hundreds of millions rows per year. When
the users were deeper in Articles dimension or father than inicial
calculation than the performance was much faster. The solution was
acceptable after all.

Looking back now I would choose maybe some combination of 2 methods you
had described. The first one I would use for selected very important
assortments (groups) - special fact table with fact only for members of
these groups (nevertheless the total of this table is not correct) and
then changing calculations with calculated cells technique (when
important assortment is selected than use measure from new cube ...).

Yukon brings easier implementation but we'll see what about performance
....

Good luck

Roman Domin
Senior Consultant

ADASTRA, s.r.o.
CUSTOMER INTELLIGENCE SOLUTIONS
Benesovska 10, 101 00 Praha 10
Tel: +420-271 733 303 Fax: +420-271 735 296 Mob: +420 724 066 111
roman.domin@adastracorp.com http://www.adastra.cz

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com