# gooddata-platform
t
And one more question: in our data model we have a few too many dimensions. We still want to use them all, but ideally merge them, so that our fact tables need to link to fewer dimensions. How should we approach this if we don’t want to destroy lots of insights building on the current set of dimensions? Is there a recommended way to do this kind of refactoring under the hood?
m
Hi Thomas, could you please specify what kind of dimensions you are referring to? Generally speaking, it is not in the scope of our solution to allow data manipulation, so any changes to your data need to be made at the data source.
t
The screenshot shows an example. Those two dimensions are both very small, in both the number of rows and the number of columns. I’d like to transform them into one dimension “Dim Employees” that has a new primary key and all combinations of AE + CSM. The fact tables then only need to reference one dimension and thus become slimmer. Now, I am fine with doing this at the data source level; that is actually quite easy to do. My question is just: if I introduce the new dimension and delete the two old dimensions, all the insights that currently use them will be lost. Is there a way to create a new dimension and then move those attributes over from the two old dimensions to the new joint dimension?
m
I see, thank you for the example! In this case, it will be necessary to make the adjustments in your source data; it is currently not possible to achieve this in GoodData Platform itself. In GoodData Cloud, however, you can use a so-called "SQL Dataset" to create a new dataset via an SQL statement that merges the columns.
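For illustration only, a minimal sketch of what such an SQL Dataset statement could look like; the table and column names (`dim_aes`, `dim_csms`, `ae_id`, and so on) are hypothetical and would need to match your actual source schema:

```sql
-- Hypothetical merge of two small dimensions into one "dim employees" dataset.
-- All names are illustrative; cast the keys to text first if they are numeric.
SELECT
    a.ae_id || '-' || c.csm_id AS employee_id,  -- new composite primary key
    a.ae_id,
    a.ae_name,
    c.csm_id,
    c.csm_name
FROM dim_aes AS a
CROSS JOIN dim_csms AS c  -- produces all AE x CSM combinations
```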
One note of caution: changing the LDM structure introduces breaking changes, and your insights/metrics may need to be recreated. I recommend trying any changes first on a cloned workspace.
With this said, I would like to highlight this use case to our developers as product feedback, if you agree.
t
Sure, please forward this!
p
🎉 New note created.
m
Hi Thomas, if I understand you correctly, you would like to re-map the existing attributes to columns in another physical table in your database without losing the existing insights and metrics, is that correct? You can actually move an existing attribute to another dataset (without recreating it): there is an option for this in the “three dots” menu next to the attribute in the modeller. Just note that you cannot move a current primary key this way. So I believe this might be the way to go (but please try it first on a clone of your workspace, not on production):
• add a new dataset “dim employees” to the model and connect it to your fact tables
• remove the references (arrows) from “dim aes” and “dim csms”
• move the attributes from “dim aes” and “dim csms” to “dim employees”
• map these attributes correctly to columns in your physical table under “dim employees” (see the sketch after this message)
Because the attributes keep their original identifiers, this should not drop your insights and other objects. If you map them correctly to the new table (and the new table is logically/structurally compatible), everything should work as before. If you rely on explicit mapping of columns, it should be fine; if you happen to rely on the naming convention for mapping (a__field…), you would need to adjust it a bit. This approach should work well, I believe, but as it is a change to the LDM, please test it first on a development/clone workspace.
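For reference, a minimal source-level sketch of the same merge plus the fact-table change; again, `dim_employees`, `fact_sales`, and all column names are hypothetical:

```sql
-- Source-level variant of the merged dimension (names illustrative):
CREATE TABLE dim_employees AS
SELECT
    a.ae_id || '-' || c.csm_id AS employee_id,  -- new stable primary key
    a.ae_id, a.ae_name,
    c.csm_id, c.csm_name
FROM dim_aes AS a
CROSS JOIN dim_csms AS c;

-- Each fact table then carries one foreign key instead of two:
ALTER TABLE fact_sales ADD COLUMN employee_id VARCHAR(64);
UPDATE fact_sales SET employee_id = ae_id || '-' || csm_id;

-- Only after the remapped model is validated, drop the old columns:
ALTER TABLE fact_sales DROP COLUMN ae_id;
ALTER TABLE fact_sales DROP COLUMN csm_id;
```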
t
Hi and thanks! That was exactly the kind of information I was looking for!
I’ll give that a try!
m
Also, if you are doing it for performance reasons: unless you have very many such dimensions, it might not bring much gain for computation, and maybe only a little for upload. Unless you have very many columns and tens of millions of records, I would not expect much change.
In GD Platform, where you physically load the data into GoodData, the storage underneath is slightly different: during calculations it aggregates by internal keys and only as a last step adds the lookup of the values. So in many cases, if you are aggregating/filtering by the dimension primary key, the join is not needed at all, because the same key (as a foreign key) is already in the fact table by which it aggregates.
What might help a bit for calculation performance: if you have key-name dimensions and filter/view by the name a lot, make the name a label of the key instead; then the join is eliminated. Unfortunately, there is no smooth way to convert an attribute to a label.
For some large workspaces, the best way to improve performance (unless there are some obvious model bottlenecks) might be to convert them to the high-performance back-end (a different underlying database). This might come at additional cost depending on the contract, and not every workspace is suitable.
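A conceptual illustration of that join elimination in plain SQL (not the actual GoodData internals; `fact_sales`, `amount`, and the dimension names are made up):

```sql
-- Aggregating by the dimension's primary key needs no join at all,
-- because the same key already sits in the fact table as a foreign key:
SELECT f.ae_id, SUM(f.amount) AS total
FROM fact_sales AS f
GROUP BY f.ae_id;

-- Viewing by a separate "name" attribute forces a lookup join; if the
-- name were a label of the key instead, this join could be skipped:
SELECT d.ae_name, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_aes AS d ON d.ae_id = f.ae_id
GROUP BY d.ae_name;
```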
t
The reason is actually not so much performance, but simply the size of the workspace. We’re over the limits with it already; merging 3 dimensions away into one might remove 3 columns from every huge fact table we have. Assuming that most of the size actually comes from the fact tables, this should bring some gains. Is that assumption correct?
m
I see. On the other hand, if instead of 5 dimensions with 3 values each you have one dimension with 3×3×3×3×3 values (all combinations), it might be more data in the end. But if it is referenced from a dataset with 50M rows, having 1×50M foreign key values instead of 5×50M might make some difference.
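To put rough numbers on it: five dimensions with 3 values each hold 5 × 3 = 15 rows in total, while one merged dimension of all combinations holds 3⁵ = 243 rows. On the fact side, however, a 50M-row dataset goes from 5 × 50M = 250M stored foreign key values down to 1 × 50M = 50M.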
t
Yes, that’s the thing: the merged dimension would for sure grow bigger, but in total it would still have relatively few values. The fact datasets each have millions of rows, so removing foreign key columns from them will do a lot.