I have a modeling question We have a data model in which mul GoodData #gooddata-platform

I have a modeling question: We have a data model i...

Thomas Karbe

02/24/2024, 9:57 AM

I have a modeling question: We have a data model in which multiple fact tables have references to each other. The form a chain like this:

A -> B -> C

, where

, and

are Fact Tables. Additionally, we have

BE -> B

and

CE -> C

. So far, this works. Now the problem: All 5 of those fact tables should have a relation to the same dimension table

DIM

, but this will now create multiple paths to the same dimension table. Since the references are not complete (not every

points to a

, some don’t have a reference), I can’t simply have the

C -> DIM

and be done with it. My only current solution is to create a copy of

DIM

for every fact table, which makes the workspace very clunky to use. (Users now need to figure from which dimension they want to group / filter). Is there a more elegant way of doing this? Can I simply point all of them to the same

DIM

, even though this gives multiple paths? What’s the consequence of doing so?

✅ 1

Michal Hauzírek

02/25/2024, 2:36 PM

Hi Thomas, I believe that depends if the

DIM

represents exactly the same thing for all the fact tables. It should be OK that not every

points to a

, as long as every

points to the very same value of

DIM

directly, via

and via

BE

And also if for any row of

you can get to the exactly same value of

DIM

while joining by any of the alternative paths etc. The thing with alternative paths is, that the system just assumes all are equally valid and will give you the same results, so it can choose ANY of them for your calculation. And issues can happen if this is not the case. You have little to no ways how to force the system to choose a specific path you want. If I try to put it into some business context for my better understanding, if for example:

customer

order

made by that customer

order line item

being part of that order

BE

sales person

handling the order

CE

product

ordered And if we have a scenario where your system is a web-shop platform which you run as a service and provide to different your clients to sell to their customers, Then if

DIM

represents a

client

(=someone running instance of your system, selling their products to their customers), you could have

DIM

it as a single dimension connected via alternative paths. In this case every single order, line item, product will always belongs to the same

client

. But if the

DIM

would be for example`CreatedBy` (=a user who created that particular object), that would not work. In this situation, the

order

can be created by someone else than the

product

in this order. Or the customer who ordered it. In this scenario the only correct way would be to duplicate the

DIM

and ideally also name the copies properly (i.e.

OrderCreatedBy

, `CustomerCreatedBy`…) so users know i.e by what are they filtering.

Thomas Karbe

02/25/2024, 2:58 PM

Hi Michal, this is very helpful. We indeed have the case that the DIM values are consistent along the different paths. That simplifies the model significantly!

Michal Hauzírek

02/25/2024, 3:07 PM

Glad to hear that. In GoodData Platform, you can always use the workspace validation to check if you don’t have any inconsistent paths. If you do it would be complaining about “transitive reference” or something like that. Also please note that if you ever wanted to use the many-to-many relationships in your model (the two-way arrows), there are additional limitations and these can not be present in sections with the alternative paths (even if the alternative paths are otherwise valid).

👍 1

Thomas Karbe

02/25/2024, 3:07 PM

I’m still trying to avoid M:N wherever I can 😉

👍 1

Thomas Karbe

02/26/2024, 7:14 AM

@Michal Hauzírek one add on question: in large data models, does this cone at a performance cost or would this bring a speedup?

Michal Hauzírek

02/26/2024, 3:07 PM

I would guess if you want to show only data from C and only filter it by something from DIM, it might be faster if you join them directly than via A and B (especially if the A,B,C are large tables).

Open in Slack

Previous Next