LDM Series

Logical Data Model - Basic Rules of Data Modelling

  • 19 October 2020
  • 0 replies
  • 485 views

Userlevel 1

Basic Rules of Data Modeling in GoodData

 

We are continuing in our series of articles about Logical Data Model in GoodData, in this article you will learn basic rules for creating a viable Logical Data Model in GoodData.

If you have not done it yet, you should also read the previous articles in this series, starting with: Logical Data Model - Introduction, and continue to Logical Data Model - Objects of the Logical Data Model.

 

Modeling data in GoodData is easy, you just need to follow five simple rules:

  1. Model based on your reporting requirements

  2. You can filter and slice against the direction of the arrows

  3. If an Attribute in the same role belongs into multiple datasets, make it a shared dataset

  4. References are mandatory

  5. Do not create multiple paths between datasets

 

Let’s discuss each of these rules now:

  1. Model based on your reporting requirements

Before jumping into the modeling, you should know your data and you should have an idea about what kinds of reports you want to build on top of it.  We are not referring to whether you will use bar charts or pie charts, but rather what values (measures) you want to display, by what you want to display them and how you want to filter. There is no need to go to exact detail and describe every single insight, use the entities (i.e. I want to be able to see orders by customer, sales date and shipment date).

Sometimes this approach is called Dimensional Modeling.

 

  1. You can filter and slice against the direction of arrows

When you are looking at the LDM in GoodData the direction of the arrows (references) is very important.  When building the metrics and insights later on, you will be able to filter by anything which is either in the same dataset OR in a dataset which is connected against the direction of the arrow.

o2t2E_8QK9nRGvLGCq9JvdscoI3lITuEg2w3zNy7Bn5E4AKvgtxpMjROhoh2HUYe77YV23DMJAkXakGNAS_guyEaCTDHOacAp5n-0IPTXtUvYFoLFAfAU5yTJrAonVWg3IKAudg

So in our sample model above, you will be able to aggregate “Items Sold” by any of the attributes in the data model. That is because “Product” is in the dataset Product which is against the direction of the arrow from “Order Line” where “Items Sold” exists. 

And also, all the other attributes (including both date dimensions) are connected to the “Order Line” and we can get to them from there by moving against the direction of the arrow. Even the “Tag” is connected against the direction of the arrow (because the M:N arrow between “Product” and “Product Tag” is bi-directional).

On the other hand the “Shipping costs” which exist in the “Order” dataset will NOT be possible to filter by “Product” or “Product Type” or “Tag”. That is because while the dataset “Product” is somehow (through Order Line) connected to “Order”, we can not get there by going against the direction of the arrow.

 

It also does not make sense logically, because shipping cost in our example is associated with the whole order and an order can consist of multiple different products. (Note - there are ways in MAQL of making an advanced calculation which allows you to allocate part of the shipping costs to each of the Order Lines, but let’s not complicate things for now).

 

So create your Logical Data Models in a way that will allow you to filter and slice the values as you want in your insights and dashboards.

 

  1. If the same Attribute belongs into multiple datasets, make a shared dataset for it

This rule means that if you want to use some attribute (or group of attributes) to filter or slice more than one dataset, you need to create a shared dataset which will be referenced by them.

For example, if we want to add information about sales quota for each Sales Manager, we reuse the existing “Sales Manager” attribute. It is already in its own dataset so it is easy to connect it to the new “Sales Quota” dataset. And it works the same way with dates. We want to have monthly sales quota for each Sales Manager - the quota will be evaluated by sales date, so we connect the existing “Sales Date” to “Sales Quota”:

3FurDKIaqrAfcGmVIi6ER7i_b5ic_HVNB1MJRxoG46Cy0T6WD0F4IQJgMP7oOLx0r9cqfFq_t0j32LmAqD5nsFuee2Kx1aHNBRhxcXZ70kIYwdo9wkFJucGdUtRAuwF5zPKbTE4

Now because “Sales Quota” (directly) and “Order Line” (via Order) are both connected to “Sales Manager” and “Sales Date” we will be able to compare revenue each Sales Manager achieved in a given month with his quota for that month.

  1. References are mandatory

Keep in mind that in GoodData if there is a reference, it is mandatory. In our example, when we have reference to “Sales Manager” in the “Order” dataset (meaning the person who made the sale), each order needs to have exactly one value of Sales Manager and this value needs to exist in the Sales Manager dataset. This does not mean that each Sales Manager needs to have some order though.

If Sales Manager would not be mandatory for an Order, and there were some deals without a Sales Manager assigned, it would be necessary to add an artificial record i.e. “(no sales manager)” into the Sales Manager dataset. 

  1. Do not create multiple different paths between datasets

In GoodData each reference needs to be unambiguous. So avoid references which can bring ambiguity. For instance, in the following example, we wanted to capture a business decision to assign each “Product” a dedicated Sales Manager who “owns” it. 

But with this setup there are two possible ways the “Order Line” is connected to “Sales Manager”. And each means something different:

  • The first is the original one - who sold this particular order

  • The second is now the “owner” of the product that was sold (no matter who sold it)

 

ev2p9cWLjDDia0zIrOMZFg3qXluGBjDhdxsEn5iKQvx88inwgg5D4K71pSzjKYqR9pMZKy6-CiStaDUvdacV_o00_av_qn8UzXRHP6jZhkeYSxLownIouiOTZncTCzfwIkCMUYc

So this is NOT the correct way to model things in GoodData. If done in this way, you may get incorrect or unexpected results, because GoodData expects all the paths to have the same meaning. 

If we want to capture both roles the Sales Manager can have, we need to use a different approach. For example, the one where we will have a separate attribute for the Sales Manager who sold the order and the one who owns the Product.

HnjE20uRaP-utD-KeOmzLziaiuaZk9QQa7DRfOZM_j_Tz2TU3cYRdk6iDPH--4f36tk5SGTRI-XYhpEpsv3x0-wMljYTMPuMPoT1kEgPebf_v46fm8ZH2Zl6bcAHD0jLy6Q2BXQ

 

 

That’s it. If you follow these simple five rules, your data models will work as a charm and you will unlock a lot of possibilities for ad-hoc reporting both for you and your customers.

 

Next article in this series is: Logical Data Model - Five Pro-Tips for Data Modelling

Let us know how you like this series and feel free to post questions if something is not clear to you. For more information and practice about data modelling, please also attend our free courses at GoodData university.


0 replies

Be the first to reply!

Reply