Solved

Using "event" dataset

  • 9 February 2022
  • 9 replies
  • 138 views

  • New Participant
  • 5 replies

Hello,

I found an interesting article on your help site regarding analysis using dates: https://help.gooddata.com/doc/enterprise/en/data-integration/data-modeling-in-gooddata/if-too-many-date-dimensions-try-events
I think it will suite my needs as I need to be able to slice data using dates that have different meaning. The scenario described in the link only presents a case where the event is related to single dataset, in my case this is quite different as the event has to be connected with two datasets (which in the ends happens to create so called “alternative paths”). 

I wanna make sure the model above will be valid or maybe there’s a different approach to the problem that won’t result in alternative paths? Example report I would like to build using those datasets:

Report showing # of projects were in a given phase month by month

Do you think the approach above is OK? I’m pretty sure building right model is really important before I can proceed with data transformation on my side, so I would like to know your thoughts on that.

Thanks in advance!

icon

Best answer by michal 23 May 2022, 19:01

View original

9 replies

Userlevel 1

Greetings,

 

connection between datasets from the screenshot is valid but not ideal. It creates alternative paths and the analytical engine would then randomly selects between the available paths in it's computation. You could use explicit lifting for your analytics if you simplify your LDM.

-Jan

@Jan Kos hey Jan, thanks for answer

Could you maybe provide an example of how could I simplify it to be able to use it with explicit lifting?

Thanks!

Userlevel 1

Hello Niko,

my apologies for delay getting back to you. I was noted you discussed this on office hours. This could work if you remove arrow from Project dataset to Project Phase Event, change orientation of the arrow to be pointing from Project dataset to Project Phase dataset and add primary key to the Project Phase Event. Does this solution works for you?

-Jan

Userlevel 2

Hi Niko,

The problem is with the triangle that has been formed in the LDM. When making reports using the event date, you will only be able to compare things to the left of the dataset. For example, you could move the Project Phase dataset to the left of the Project dataset. Then you could use a connection point in the Project Phase Event dataset. This connection point would be used in a BY Clause to lock the aggregation and make the objects available for reporting. 

Hello Niko,

my apologies for delay getting back to you. I was noted you discussed this on office hours. This could work if you remove arrow from Project dataset to Project Phase Event, change orientation of the arrow to be pointing from Project dataset to Project Phase dataset and add primary key to the Project Phase Event. Does this solution works for you?

-Jan

Hey Jan,

I tried this approach but I’m not sure if it’s what I wanted. Basically Project Phase is a template which holds many projects, if I change the direction in the end it won’t give me the same amount of project phases and the results will be different (because I will upload only the phases that actually have any projects). Let me know your thoughts.

Thanks

Userlevel 1

Hi Niko,

Could you elaborate more on the reporting you'd like to achieve, describe relationships between datasets and its field and maybe provide some specific sample data?

-Jan

Hi Niko,

Could you elaborate more on the reporting you'd like to achieve, describe relationships between datasets and its field and maybe provide some specific sample data?

-Jan

Hey Jan, thanks for response. What I want to achieve is e.g showing # of projects that entered given phase at specific period. The Phase Event would have it’s type f.e “entered” and “left”. From the database perspective in our app it looks similar to this: 


Generally the phase events are historical data for given project and project phase.

The data could eventually look something like this

Event ID Project ID Project Phase ID Event Type Event Date
1 1 1 enter 01.01.2020
2 1 1 leave 01.01.2020
3 1 1 enter 01.01.2020
4 3 1 enter

01.01.2020

5 4 2 enter 02.01.2020


Expected results:

# of projects entering phase Phase name Month
2 Phase A 01.01.2020
1 Phase B 02.01.2020

 

Hope that clarifies the problem a bit, thanks!

Userlevel 2

Hi Niko, thanks for the detailed description.

From your database diagram, do I understand it correctly that the project_phase actually has two different relations:

  • current phase in which the project is right now (represented by the project_phase_id in project table)
  • historical phase which a particular project entered or left at a particular moment (represented by the project_phase_id in the project_phase_events table)

 

If you would model it the way you did initially (with the orange triangle), if you would let’s say want to count number of projects entering “phase A” in January, the system would not know how to filter by the “Phase”. It could mean two completely different things:

  • Projects which are currently in phase A (join project_phases to projects and filter there)
  • Projects that entered phase A historically in January (join project_phases to project_phase_events and filter there) no matter in which phase they are currently.

In GoodData in this kind of relationships, each relation (direct or indirect) between two datasets should always represent the same thing. Either current phase or historical phase. If you need to represent something different, you would need to represent it with a different object - attribute or dataset.

So in this case, your model could look like this:

  • remove the direct connection between projects and project phases (leaving the Project phase meaning as “historical”)
  • if you need to report on the current project phase as well, add a new dataset “Current Project Phase” and connect it to the Project

 

or alternatively, if all you really need is the current phase name,

  • you do not need to include the “Current Project Phases” dataset at all
  • and simply just include the current phase name directly to the Project dataset as a new attribute

 

Both models are equally “good” and will serve the same reporting needs. 

 

 

BTW I would recommend this series about Logical Data Model in GoodData, specifically this part about the basic rules - one of the rules is exactly about these “alternate paths” and how to get rid of them.

 

I hope this helps.

Hi @michal ,

This explanation is really great, that clears up a few things. 

I will try this in upcoming days and let you know, but looks really promising!

Thanks to everyone involved

 

Reply