Good morning everyone! We have a question regardi...
# gooddata-cn
s
Good morning everyone! We have a question regarding the way to model a many-to-many relationship. We will try to explain our situation as well as possible; Our data model has devices that can exist within multiple zones within establishments. The way we model that is by using a join table (see provided screenshot). Importing this model and playing around with it in your model editor, we have been unable to get the warning it shows (
Alternative paths found in model
) to go away. It's obvious it shows that for a situation like in the screenshot, but no matter how we define the different relationships, the warning stays. We have looked at the following documentation on your website. We have also tried ignoring the warning, but then. Statistics, that are from a certain device, can not be filtered on the zone that device is in. So it clearly isn't working, like the warning would suggest. We hope you people can help us, thank you in advance! Best regards, Stijn Habekott*é*
t
Hello @Stijn Habekotté, based on your requirements and screenshot, please try the following changes. Please note that maybe it does not perfectly cover all your use cases, but I think it is worth to try. 1. Add logical M:N relationship between device and device zone. Drag the blue circle on Device zone back to devices. The connection line between the datasets should change to a dashed one with arrows pointing both directions. 2. Remove the connection between Devices and Establishment. This will remove the alternative paths found warning in the LDM, while allowing you to slice Devices by establishment name through the new path (Devices > Devices zone > Zones > Establishment). Please check if this fulfils your analytics needs and there are no validation issues in the LDM anymore.
s
Maybe I wasn't clear, sorry. But our PDM has a foreign key for the establishment id, filtering on establishment is the one of our most important requirements. A device does not necessarily have a zone, but it can have more zones. However a device always needs to have an establishment.
t
I see
s
Removing the relationship between establishment and device would remove that functionality
t
What does the relation between Establishment and Zones mean? Could device’s zone be different than establishment’s zone?
s
Yes, they are different, the zones are part of an establishment, we do not want a device to always have an zone, because they already have an establishment.
t
If they are different, then they should not lead to the same dataset. The analytics engine would not know which Zone to take in case there is one assigned through the Establishment and another through Devices zone. That is why the validation says “alternative paths found in the model”, meaning there are multiple ways how to join the underlying tables to get the result. If you want to keep the abstraction of Establishment zones and Devices zones, then you should probably create two separated Zones datasets and connect one to Establishment and other to the Devices. Note that GD currently does not support to map multiple logical datasets into the same DB table, so you would need to create a view on top of the original table, to have it twice in the PDM.
Anyway if a device can have multiple zones and zone can have multiple devices (there is M:N) relationship between them, then you should add the logical M:N connection between Devices and Devices zone dataset as described in my first comment. It will allow you to use M:N capabilities in the analytics.
s
Thank you for your message. I believe I have misunderstood the question. Maybe a description of the domain helps a bit here. An establishment is a library (that users go to borrow books). An example of a device is a security gate, counting people that pass through, A zone is purely a logical concept. In example, we could make a whole floor in the library (i.e. 1st floor, 2nd floor) a zone. Another example is that we create zones for departments. i.e. a 'Fiction' department, or a 'History' department. Zones overlap, therefore a device such as a people counter can be part of multiple zones. However, all these zones are part of a single establishment. The physical location where a device is present. The usecase is as follows. We want to see the collected statistics per establishment. However, optionally, we want to be able to further filter by appliing a set of zones, thereby removing devices that are not part of that zone. You can think of zones as a tag. Only, the tags are defined at establishment level. Therefore, a device is IN a estsablishment (and therefore has an establishment ID). It can also be in zero, or multiple zones. Imagine that we only have the need for a zone on the History section. All other devices in this establishment are not part of a zone. So, zones are a filtering concept. In our database, we have 3 tables. The devices, the establishments, and the zones. It is a postgresql database. Because we want devices to opt-in to multiple zones, we decided to use the classic SQL approach: a join table linking devices to zones. GoodData allows us to make a M:N relationship from Device to Zone. In the logical model, that makes sense. However, if we do that, we are struggling with the data mapping to the database where we are unaware of another approach next to the join table. If we remove the link between device and establishment, a device MUST be part of a zone. However, that is not mandatory. We could remove the link between Zone and Establishment. As, we can still find the zones trough the path Establishment>Device>Zone. However, we do not understand the many to many in the logical model. Our data model has the join table. However, logically, Devices have a M:N relationship to Zones, see the image in the attachment. So, in this case, what should have been the correct approach? Thank you a lot for your assistance!
j
cc: @Jakub Sterba @Ondrej Stumpf guys, can you help Stijn? @Stijn Habekotté thanks a lot for such a detail description of the user case!
t
Let me explain a little bit further into the M:N. In GD, you would create a M:N relationship the same way as you do in the SQL using a joint (bridge) table. But to allow M:N in analytics, you should also create a “logical” M:N relationship. This will allow to slice/filter Devices by zones. Notice that adding this kind of relationship between Devices and Zones did not influence the physical data model. It adds some metadata to the logical data model, so it knows it can slice the “opposite” direction than it usually does. So in GD, the M:N is a combination of a bridge table and M:N relationship line in the logical data model.
j
Hi, let me shed some light on the problem in the original model in the first message of this thread. The model contains a loop which introduces ambiguity. There are alternative paths from Establishment to Devices zone (one through Devices dataset and one through Zones). Warning is displayed because query engine may select any path of the two paths to join the tables of physical model . For example metric
select count({attribute/devices_zone_id}) where {attribute/establishment_name}="XY"
. may either count all devices_zone_ids for devices in given establishment or zones in given establishment. In hypothetical case zone in the bridge dataset (Devices zone) may relate to different establishment than device in the same dataset which may lead to inconsistent results. This is the reason why the warning is displayed. Zones in this model cannot be used for filtering of fact datasets connected to Devices dimension dataset because there is not oriented path from Zones to Devices. M:N edge from Devices zone bridge dataset to Devices dataset would enable it but such edge is not allowed due to some restrictions which are documented in https://help.gooddata.com/doc/enterprise/en/data-integration/data-modeling-in-gooddata/many-to-many-in-logical-data-models (see Example 2). This restriction is in place for domains ending with on.gooddata.com . For GoodData CN (self hosted) and new GoodData Cloud offering (domain ending with cloud.gooddata.com slightly different restrictions are in place which are documented here: https://www.gooddata.com/developers/cloud-native/doc/hosted/model-data/evolve-ldm/many-to-many-in-ldm/#limitations-to-mn . I think the GD.CN and GD.Cloud is more favourable for modeling of the data as you describe. Can you confirm which platform are you using?
I suggest to break the loop and keep only one way of holding information about membership of device to establishment. This will make the model unambiguous. My understanding is that zone is optional so I suggest to remove relation between the zone and establishment. The zone would act as optional multi-value attribute of device.
The model above will let you calculate metrics from facts in Fact dataset and filter them by Establishment, Device and Zones. If you need to count/list zones of given establishment you can add many-to-many edge also between Zones and Devices Zone dataset.
You may consider split Establishment into two dimensions (Establishment of device and Establishment of zone) if none of the proposed solutions works for you and use different attributes for filtering devices and zones.
In case you are using the domain ending with on.gooddata.com , you may try to apply for trial of GoodData.Cloud at https://www.gooddata.com/trial/ and try it there.
s
Hi everyone, thank you very much for your assistance. We tried to follow the recommendations written by Jakub, after reading the all the posts. The first image shows the model. However, when we add the Establishment entity, we get the following error when trying to publish. Without this dataset, we can save the datamodel.
Copy code
"content=ldm status=400 error={\"detail\":\"dataset=devices is referenced by anotherDataset=devices_zone with multivalue reference and has its own references\",\"status\":400,\"title\":\"Bad Request\",\"traceId\":\"34b2f555f937de49\"}"
At the moment we are using the community edition Docker conteiner v2.0.0
o
Hi @Stijn Habekotté, the model validator seeks patterns when a dataset is pointed to using an M:N edge (like
devices
) and at the same time, the same dataset is pointed to by any other dataset (in our case, the
establishment
). To make the error go away, I suggest to make the M:N edge point to
zones
instead of
devices
, see the attached model. If you combine this with the approach @Jakub Sterba suggested (i.e. split
establishment
into
establishment of device
and
establishment of zone
), hopefully it should cover your use cases.
s
Hi Ondrej, We don't fully understand how we should do that split. However, we have decided to simplify our model, so that we have no circular paths. However, we still can't anage to do a many to many relation. This feels like a basic m:n relationship. Any ideas?
image.png
o
The model ^ repeats the same problematic pattern:
devices
is referenced using M:N edge from
device_zone
and at the same time,
devices
references
devicetypes
. If you remove the edge between
devicetypes
and
devices
, it will work. By "splitting", @Jakub Sterba meant creating two distinct datasets, one linked with
devices
, the other with
zones
.
To safely get rid of
devicetypes
, you could inline it into
devices
, i.e. add a new attribute to
devices
with the name
type
. Anyway, you managed to start quite a big internal discussion here at GoodData 🙂. I'm now creating an internal ticket to investigate the issue more deeply and perhaps relax the validations.
s
Hi Ondrej, we decided to go with thad advice and have added a devicetypename and devicetypeid to the device table. For now it seems to work. We are curious to what the outcome of your internal discussion will be. 🙂
a
Thank you all for assisting us 🙂
j
This was amazing discussion even for internal GoodData people 😉
s
Hey everyone, I would like to draw your attention to my post from earlier today. Thank you in advance! https://gooddataconnect.slack.com/archives/C01USCF4S10/p1659452635006769