Curious, when we are using multikey join. We are u...
# gooddata-cloud
d
Curious, when we are using multikey join. We are using the same field as our second key for all dimension tables (our customer id called tenant). So we are ending up with a lot of duplicate columns with that tenant id Question? 1. Should we try to move to one key? 2. If we don't, are there any issues this would cause or is it just how messy it looks in gooddata.
j
Just an idea: do you display tenant id in any visualizations? Maybe you do not need to expose it to users as attribute in data model in case you use it only for application of workspace data filters for each dataset of data model.
d
Not in client dashboards, but yes in internal dashboards. We have tenant_id, grr_tenant, wdf_filter_table (which are all copies of the same column) grr_tenant -> second join key wdf_filter_id -> We filter by tenant for each workspace We don't need to show all the duplicates of grr_tenant or wdf_filter to anyone. We should show the original tenant_id. What about the questions I asked?
m
Hi Doug, as mentioned in our documentation: https://www.gooddata.com/docs/cloud/model-data/prepare-your-data/#PrepareYourData-RecommendedNamingConventions Duplicate label IDs may be generated for different columns in the same table in the database (and eventually for different labels in the corresponding dataset in the LDM). For example, both
<http://lineitem.ls|lineitem.ls>__customer_id__url
and
<http://lineitem.ls|lineitem.ls>__supplier_id__url
will be transformed into
lineitem.url
. While an LDM with duplicate label IDs can be generated, publishing this LDM will fail. To avoid this situation, make sure that the base label names distinguish from each other, for example: • Use
<http://lineitem.ls|lineitem.ls>__customer_id__customer_url
instead of
<http://lineitem.ls|lineitem.ls>__customer_id__url
so that the label ID is generated as
lineitem.customer_url
. • Use
<http://lineitem.ls|lineitem.ls>__supplier_id__supplier_url
instead of
<http://lineitem.ls|lineitem.ls>__supplier_id__url
so that the label ID is generated as
lineitem.supplier_url
.
j
In your example you have two distinct attributes “Grr tenant” defined in different datasets (they will appear as two also in data catalog in Analyze tab). You may consider to create dedicated dataset just with tenant id as primary source of such attribute and reference it from there. This way it will be the same attribute and it will appear in fact dataset only once.
d
@Jakub Sterba we can't do your suggestion because the wdf_filter has to be on every dimension and fact table so that it filters to only the dimensions and facts for that workspace. Remember the wdf_filter is the same as grr_tenant. We just added the grr copt of wdf_filter as thats required.
@Michael Ullock why would "`lineitem.ls__customer_id__url` and
<http://lineitem.ls|lineitem.ls>__supplier_id__url
will be transformed into
lineitem.url
.? Is it because you have __ before the url. We never do that. However, I do understand your point the better solution would be to create a duplicate grr__tenant_user, grr__tenant_sub_program_. Also I think we didn't put that second __ there for existing grr (not sure what the difference might be)
@Michael Ullock what do you mean when you say While an LDM with duplicate label IDs can be generated, publishing this LDM will fail? What does publishing it mean. We currently have duplicates in our LDM and we have moved it around cloned it and push it from one workspace to another with out issue.
j
In my proposal Tenant Id is present in each and every dataset (as is apparent from the attached diagram) and the same column which is used for attribute mapping or different column in the same dataset can be used for WDF mapping rule.
d
Got it, so your saying if we create a communal ta_ble of the grr__tenant_ids the system will know not to duplicate it?_ I worry about using the wdf_filter as the grr_filter, as one field would be two things.
j
Yes
1
d
@Jakub Sterba I got it to work I do get this error. It is expected right?
j
Yes but I believe it will not cause problems in your case because the tenant_id will be placed in all the other datasets so there will be probably no need to join the dimension dataset at all and chose which path to use.
It is just warning
d
Thanks @Jakub Sterba