Curious when we are using multikey join We are using the sam GoodData #gooddata-cloud

Curious, when we are using multikey join. We are u...

Doug Creighton

12/11/2024, 3:45 PM

Curious, when we are using multikey join. We are using the same field as our second key for all dimension tables (our customer id called tenant). So we are ending up with a lot of duplicate columns with that tenant id Question? 1. Should we try to move to one key? 2. If we don't, are there any issues this would cause or is it just how messy it looks in gooddata.

Jakub Sterba

12/11/2024, 3:52 PM

Just an idea: do you display tenant id in any visualizations? Maybe you do not need to expose it to users as attribute in data model in case you use it only for application of workspace data filters for each dataset of data model.

Doug Creighton

12/11/2024, 4:30 PM

Not in client dashboards, but yes in internal dashboards. We have tenant_id, grr_tenant, wdf_filter_table (which are all copies of the same column) grr_tenant -> second join key wdf_filter_id -> We filter by tenant for each workspace We don't need to show all the duplicates of grr_tenant or wdf_filter to anyone. We should show the original tenant_id. What about the questions I asked?

Michael Ullock

12/11/2024, 6:04 PM

Hi Doug, as mentioned in our documentation: https://www.gooddata.com/docs/cloud/model-data/prepare-your-data/#PrepareYourData-RecommendedNamingConventions Duplicate label IDs may be generated for different columns in the same table in the database (and eventually for different labels in the corresponding dataset in the LDM). For example, both

<http://lineitem.ls|lineitem.ls>__customer_id__url

and

<http://lineitem.ls|lineitem.ls>__supplier_id__url

will be transformed into

lineitem.url

. While an LDM with duplicate label IDs can be generated, publishing this LDM will fail. To avoid this situation, make sure that the base label names distinguish from each other, for example: • Use

<http://lineitem.ls|lineitem.ls>__customer_id__customer_url

instead of

<http://lineitem.ls|lineitem.ls>__customer_id__url

so that the label ID is generated as

lineitem.customer_url

. • Use

<http://lineitem.ls|lineitem.ls>__supplier_id__supplier_url

instead of

<http://lineitem.ls|lineitem.ls>__supplier_id__url

so that the label ID is generated as

lineitem.supplier_url

Jakub Sterba

12/11/2024, 6:16 PM

In your example you have two distinct attributes “Grr tenant” defined in different datasets (they will appear as two also in data catalog in Analyze tab). You may consider to create dedicated dataset just with tenant id as primary source of such attribute and reference it from there. This way it will be the same attribute and it will appear in fact dataset only once.

Doug Creighton

12/11/2024, 7:04 PM

@Jakub Sterba we can't do your suggestion because the wdf_filter has to be on every dimension and fact table so that it filters to only the dimensions and facts for that workspace. Remember the wdf_filter is the same as grr_tenant. We just added the grr copt of wdf_filter as thats required.

Doug Creighton

12/11/2024, 7:09 PM

@Michael Ullock why would "`lineitem.ls__customer_id__url` and

<http://lineitem.ls|lineitem.ls>__supplier_id__url

will be transformed into

lineitem.url

.? Is it because you have __ before the url. We never do that. However, I do understand your point the better solution would be to create a duplicate grr__tenant_user, grr__tenant_sub_program_. Also I think we didn't put that second __ there for existing grr (not sure what the difference might be)

Doug Creighton

12/11/2024, 7:32 PM

@Michael Ullock what do you mean when you say While an LDM with duplicate label IDs can be generated, publishing this LDM will fail? What does publishing it mean. We currently have duplicates in our LDM and we have moved it around cloned it and push it from one workspace to another with out issue.

Jakub Sterba

12/11/2024, 7:40 PM

In my proposal Tenant Id is present in each and every dataset (as is apparent from the attached diagram) and the same column which is used for attribute mapping or different column in the same dataset can be used for WDF mapping rule.

Doug Creighton

12/11/2024, 7:49 PM

Got it, so your saying if we create a communal ta_ble of the grr__tenant_ids the system will know not to duplicate it?_ I worry about using the wdf_filter as the grr_filter, as one field would be two things.

Jakub Sterba

12/11/2024, 7:50 PM

Yes

✅ 1

Doug Creighton

12/12/2024, 5:43 PM

@Jakub Sterba I got it to work I do get this error. It is expected right?

Jakub Sterba

12/12/2024, 10:23 PM

Yes but I believe it will not cause problems in your case because the tenant_id will be placed in all the other datasets so there will be probably no need to join the dimension dataset at all and chose which path to use.

Jakub Sterba

12/12/2024, 10:23 PM

It is just warning

Doug Creighton

01/02/2025, 3:31 PM

Thanks @Jakub Sterba

3 Views

Open in Slack

Previous Next