I have a modeling question: We have a data model i...
# gooddata-platform
t
I have a modeling question: We have a data model in which multiple fact tables have references to each other. The form a chain like this:
A -> B -> C
, where
A
,
B
, and
C
are Fact Tables. Additionally, we have
BE -> B
and
CE -> C
. So far, this works. Now the problem: All 5 of those fact tables should have a relation to the same dimension table
DIM
, but this will now create multiple paths to the same dimension table. Since the references are not complete (not every
A
points to a
B
, some don’t have a reference), I can’t simply have the
C -> DIM
and be done with it. My only current solution is to create a copy of
DIM
for every fact table, which makes the workspace very clunky to use. (Users now need to figure from which dimension they want to group / filter). Is there a more elegant way of doing this? Can I simply point all of them to the same
DIM
, even though this gives multiple paths? What’s the consequence of doing so?
1
m
Hi Thomas, I believe that depends if the
DIM
represents exactly the same thing for all the fact tables. It should be OK that not every
A
points to a
B
, as long as every
B
points to the very same value of
DIM
directly, via
A
and via
BE
And also if for any row of
C
you can get to the exactly same value of
DIM
while joining by any of the alternative paths etc. The thing with alternative paths is, that the system just assumes all are equally valid and will give you the same results, so it can choose ANY of them for your calculation. And issues can happen if this is not the case. You have little to no ways how to force the system to choose a specific path you want. If I try to put it into some business context for my better understanding, if for example:
A
=
customer
B
=
order
made by that customer
C
=
order line item
being part of that order
BE
=
sales person
handling the order
CE
=
product
ordered And if we have a scenario where your system is a web-shop platform which you run as a service and provide to different your clients to sell to their customers, Then if
DIM
represents a
client
(=someone running instance of your system, selling their products to their customers), you could have
DIM
it as a single dimension connected via alternative paths. In this case every single order, line item, product will always belongs to the same
client
. But if the
DIM
would be for example`CreatedBy` (=a user who created that particular object), that would not work. In this situation, the
order
can be created by someone else than the
product
in this order. Or the customer who ordered it. In this scenario the only correct way would be to duplicate the
DIM
and ideally also name the copies properly (i.e.
OrderCreatedBy
, `CustomerCreatedBy`…) so users know i.e by what are they filtering.
t
Hi Michal, this is very helpful. We indeed have the case that the DIM values are consistent along the different paths. That simplifies the model significantly!
m
Glad to hear that. In GoodData Platform, you can always use the workspace validation to check if you don’t have any inconsistent paths. If you do it would be complaining about “transitive reference” or something like that. Also please note that if you ever wanted to use the many-to-many relationships in your model (the two-way arrows), there are additional limitations and these can not be present in sections with the alternative paths (even if the alternative paths are otherwise valid).
👍 1
t
I’m still trying to avoid M:N wherever I can 😉
👍 1
@Michal Hauzírek one add on question: in large data models, does this cone at a performance cost or would this bring a speedup?
m
I would guess if you want to show only data from C and only filter it by something from DIM, it might be faster if you join them directly than via A and B (especially if the A,B,C are large tables).