Hi! I hope it’s ok to ask a MAQL question here. I ...
# gd-beginners
a
Hi! I hope it’s ok to ask a MAQL question here. I am still learning to write the queries and I’m stuck with something. I am attaching a snippet of my LDM covering the relevant part for what I am working on. The JIRA dataset is collapsed because it has so many facts and attributes, which would add tons of irrelevant details. It is relevant to know that JIRA dataset holds information about employees: • Username - attribute which is an identifier for each employee • Company they work for (reference to Companies dataset) • Day of employment (reference to Worked Date) • Availability - A fact that the person was employed on a day (represented as value of 1) Working Days dataset holds a workday/holiday pattern for each company dictated by local employment regulations (1 if the day is workday, 0 if it’s not). What I am trying to do is to build a metric which eventually can be aggregated by company or even the whole group of companies, showing the number of full time employees (FTE) in a selected time period. My plan is to achieve that by finding: 1. Number of workdays for each employee in a chosen time period (SUM of Workdays in JIRA dataset for that person in the selected time period) 2. Number of workdays in the company of that person in a chosen time period (SUM of Workdays in Working Days dataset for the company that is associated with the person in the JIRA dataset) And then dividing the first number by the second. For a person that was employed throughout the chosen time period, the result is 1 because the amount of workdays for the individual person matches the number of workdays for their company. However, if the person only worked part of the period, then their FTE number is less than 1. I am struggling with handling the calculation for persons who have joined in the middle of the period. Check the second screenshot, representing one person per line. 14 and 15 of August on the screenshot represent a weekend, correctly showing that it was not a working day for the employee, nor the company. If you look at the 5th line, then you start seeing the problem. The line shows a person whose employment started on 16 August and this is correctly shown by the Employee WD metric, but the problem is that Company WD is not even shown for rest of the chosen time period. Meaning that if I add up the numbers for the person over the time period, I still get Employee WD matching the Company WD and the resulting FTE result is 1. I am thinking that this behaviour happens because the connection point between the person and the company only exists for the days for which I have records in the JIRA dataset. Is there a way to build a metric which would find the number of Company WD for the full chosen time period, not only for the days the person worked? I guess one way to achieve that is to populate the JIRA dataset with 0 values for the days when the person wasn’t employed, but is there perhaps a more elegant solution?
I’ll post here screenshots of the metrics that I have build based on this LDM.
Employee WD (you can ignore the Unpaid Absence part, which is not really relevant for this discussion).
Company WD
FTE
By the way, this topic is essentially a follow-up to what was discussed here: https://community.gooddata.com/metrics-and-maql-57/struggling-with-maql-aggregation-and-filtering-224 After the Community post I eventually still decided going forward with importing the company workday/holiday patterns into GoodData. At first I tried to avoid doing that.
I would be very grateful for any help/ideas/insights. I have been stuck with this for some time now. 🙂 Thanks!
j
Hi @Allan Talver, Let me give you some hints which may help. There can be problem with “NULL arithmetics” in the expression:
Employee WD / Company WD
If there is no record for employee in Jira dataset the metric Employee for that date has probably NULL value. The result of division is: • 1/1=1 • 0/1=0 • null/1=null I suggest to wrap Employee WD by IFNULL function:
IFNULL(Employee WD,0) / Company WD
Query engine will interpret null value (missing data) as 0 and result of IFNULL(null,0)/1 should be 0. I suggest not to use Records of Jira in metric expression of Company WD. So that it returns value even for days for which none record exists. I suggest something like:
select sum(select ifnull(sum(Availability -Unpaid (Absence)/8),0)/sum(Workday) by Username, Date(Work Date))
a
Hi @Jakub Sterba, Thanks for the quick response. I will delve deeper into your suggestions in a moment, but I can comment on the “Records of Jira”. I used that as a link to match a Company for an employee (Username attribute in the Jira dataset). Basically I was trying to follow the logic explained in this article: https://help.gooddata.com/doc/enterprise/en/dashboards-and-insights/maql-analytical-query-language/maql-use-cases-and-tutorials/explicit-lifting
I believe that I can’t SUM workdays (in Working Days) dataset by Username (in Jira) dataset directly. Or can I?
j
I see, Username is just in Jira dataset. Let me think about it again 🙂
What if you do this?
select count(Username) where
(select
(select sum(Availability - Unpaid (Absence)/8 ))
/
(select max(select (select sum(Workday) by Company) by Records of Jira))
by Username)=1
The idea is to compute number of work days per company and repeat the result in each record of Jira dataset using the lifting.
a
Hmm, let me try that.
j
so in a context of a month, total number of workdays of a company can be 20 and this value will be used for each record in Jira dataset of all the company employees
employee days will be calculated as SUM and MAX function is used to pass the company days to Username level
I am not sure how complex is the use case and if you need to care of User being in relation with multiple companies. This would add additional complexity.
a
Yes, the User can exist in various companies over time (but not at the same time). Not sure if that will invalidate your approach? Unfortunately I still feel quite “green” with the MAQL things, so I have to first try it out to be able to evaluate what will happen. 😄
But I am currently creating that metric in GoodData, so I will test it anyway.
j
In that case it may be the way to count users by company and then sum it up
Copy code
select sum(select FTE_defined_above by Company)
User would be not counted as FTE if the user worked few days for one company and rest of the month for another even if he worked in all the days.
a
Thanks for your help with this @Jakub Sterba! I have to stop for today, but I definitely got some ideas that I can continue working with on Monday (if not sooner). But a general question, how can one build up their knowledge about more advanced uses of MAQL? I have gone through GoodData documentation as well as the University courses, but I still feel like something is missing and on my own I often get stuck, at some points more or less just trying to combine different statements together to see if anything will take me closer to what I need. I am sure that this is not the most efficient way to build up the experience. Is there another course or other resources that could help me in this? 🙂
Hi, @Jakub Sterba and others! I am still stuck with this one. Perhaps someone has some suggestions. When trying out the metric suggested in the thread, I ran into a problem. Namely, we have companies popping up regularly (often 3rd party contractors). When a new company with their first employees appears in the middle of a filtered period, the FTE calculation results in an incorrect number. The reason being that lifting with “Records of Jira” only takes into account the company workdays for which there was an employee in the company, not the full period. Say, for example, I select one month period for my Worked Date filter. That month has 20 working days according to the local regulations of the country where the company is located. Those 20 working days are reflected in the Working Days dataset. A new person joined in the middle of the month and only works for the last 10 days of the month. The result of FTE for that person still is 1 because they were available for 10 days (in Jira dataset) and it gets divided by the matching 10 days that are found from the Working Days dataset. I am guessing that the MAX function isn’t helping if the person happens to be the first one in that company. Is there a way how to have all the days covered by the filtered period taken into scope from the Working Days dataset, regardless if there was anyone working on some of these days or not?
j
Hi did you try this?
Copy code
select count(Username) where 
(select
  (select sum(Availability - Unpaid (Absence)/8 ))
  /
  (select max(select (select sum(Workday) by Company) by Records of Jira))
by Username)=1
the submetric:
Copy code
select (select sum(Workday) by Company) by Records of Jira))
shall lift number 20 from your example to all the records of the Jira dataset of the company regardless if a person worked 20 days or 10 days Metric
Copy code
select max(select (select sum(Workday) by Company) by Records of Jira))
should display value 20 for each Username from the company if you slice it by username
j
Just out of my curiosity - would not it be more efficient to extend the model by a dataset, which would contain number of working days (based on the regulations) per country? This dataset would be connected by country attribute with the rest of the model. You would not have to calculate number of working days from Jira issues dynamically. Btw. this dynamic calculation seems also to be a potential performance issue, imho.
a
Hi @Jakub Sterba I checked this once again and actually you were correct, it does seem to work. I had a problem with underlying data which led me to believe that the metric is showing an incorrect result for some employees. But actually the data was broken. However, I have two follow up problems/questions. Currently the metric looks like this:
I didn’t do the counting part because we can have fractional employees who should also be included in the aggregated number.
It appears to work like this. I have 3 persons selected in 1 one company and 1 in another (they have just been selected in the Username filter). Worked Date filter is set for This Year. The first employee started in mid august and 0.34 makes sense. 3rd one started in the beginning of March and 0.8 also makes sense. However, the 4th person worked in another Company until they moved to the new one (which is being show on the screenshot as 0.27). For some reason the previous employment for that person isn’t shown. In total, that person has worked for 5 years, including all of the ongoing year. And also, the person on the second line has worked for full year without interruptions. So I would expect to see 1.0, instead of 0.95. But I will have a look into the underlying data to see if I can make sense of this difference.
And the other question is about silicing. I would like to be able to slice that metric by smaller time periods, for instance quarters, months or even weeks. If I slice it, then I get the result shown in the screenshot. I would like the metric to result in 1.0 if a person has worked all days of the month. I am guessing that I would have to somehow use averages? Or is it because the divisor of my calculation is not sliced by month for some reason, and therefore the number of workdays in a month (for example 20) gets divided by number of workdays in the full year (for instance 250)?
@Jan Soubusta, actually the dataset “Working Days” is supposed to do almost that. In that dataset I have the number of working days for each Company. I am aware that for efficiency I could actually have another level in between and have that data per country (as we can have more than one company in a country). When I initially began working on that metric, I posted a Community post: https://community.gooddata.com/metrics-and-maql-57/struggling-with-maql-aggregation-and-filtering-224 Back then I was trying to do the calculation by calculating the number of Workdays per company by finding the max number of workdays that any single user had worked per company. But as you suggested, it’s not really the best way to do it dynamically and it also caused the metric to result in a distorted value if there were months where no one had worked all the days (for example a new company appearing in the middle of a selected time period).
Ok, out of the problems/questions that I posted above, I think I have now figured out two issues. The 0.95 showing for a person who worked for full year is due to an issue with underlying data. Not about the metric. And also, the issue with the person who switched company mid year is very likely due to problem with data as well (working days not imported for the company where the person worked before the switch). So it looks like the only thing missing at this point is how to make the metric sliceable by shorter time periods. 🙂
j
I think approach with lifting is not so flexible if you need to slice the metric by different date attributes, but I think I have found a better solution. I suggest to use following metrics: Sum of Availability:
Copy code
select sum(Availability)
Work days:
Copy code
select sum(Workday) by all Username
FTE:
Copy code
select Sum of Availability / Work days
The trick with “by all Username” will compute the sum(Workday) in context of insight (Company, Month/Year, Username) without the Username (Company,Month/Year) and result will be repeated for each value of Username. Without the “by all Username” the sum(Workday) cannot be broken down by Username. The value will be available in insight only for the months where exists relation between Username, Month, and Company. See example with metrics below. Company A has working days Mon-Fri and Company B Sun-Sat.
Screenshot 2021-10-02 at 02.56.27.png
You may need one more metric to find for example the highest FTE of user if you want to display the ratio without breakdown per company and some users work for multiple companies. FTE2
Copy code
select max(select FTE by Company)
Finally number of FTEs:
Copy code
select count(Username) where (select FTE2 by Username)=1
Screenshot 2021-10-02 at 09.33.02.png,Screenshot 2021-10-02 at 09.33.16.png
Or as one metric:
Copy code
select count(Username)
where (select max(select sum(Availability)/(select sum(Workday) by all Username) by Company) by Username)=1
You can slice the metric by any combination of attributes Company, Username, Year, Quarter/Year, Month/Year, Week/Year, Day of Week, or any other date attribute.
or even compare it to previous periods
Screenshot 2021-10-02 at 10.00.26.png,Screenshot 2021-10-02 at 10.01.21.png
a
Oh, wow! Thanks @Jakub Sterba! I’m recreating and testing it in our environment now. But I have one question about one pattern I see in your metrics. Could you explain why you decided to use the count function? Is the intention to only count the employees who have their FTE number equal to 1 (and ignore others, like part timers, etc.)? In my case I would have to include all part timers as well and the end result should essentially be a number of “full time equivalents”. For instance, a company has only two employees who only work on half of the days, giving 0.5 load for both of them, should be added, resulting in 1 FTE for that company. So I believe that in my case I should be using SUM, instead of COUNT.
In general, it looks good and slicing by company, username, date is ok. However, I have a follow-up issue here. Perhaps I should have mentioned beforehand that Jira dataset has some additional attributes that I would like to use for slicing or filtering (for instance employee status: “employed”, “external contractor”, etc., also the team membership of the employee on each day). At the moment, these attributes are not available if I use the new metric in Analytical Designer.
These attributes are there to describe the “Availability” fact. Working Days would still remain the same for the company.
j
try to list them in the metric definition:
Copy code
select sum(Workday) by all Username,Employee Status, ...
a
Hmm, then I get the “Sorry, we can’t display this insight” message
j
sorry, my bad
Copy code
select sum(Workday) by all Username,all Employee Status, ...
you have to repeat “all” 🙂
a
Oh, right. Let me try that 🙂
And there is no shorthand equivalent to something like “take them all”? 😄 BY ALL Records of Jira probably has some different meaning? Tried that, but then got weird results from the metric.
j
there is by all in all other dimensions
but you need to slice it by attributes like Company, Year, Month/Year …
in theory you can do by all in all other dimensions except Company, Year, Month/Year …
you may try by all Records of Jira
I am not sure about outcome
a
It does appear to work now when I listed the attributes one by one. I’ll do some checks to verify if the numbers make sense.
j
I think Records of Jira won’t work because it will ignore also slicing by Company and date attributes
It is good to hear it is working now
But I have one question about one pattern I see in your metrics. Could you explain why you decided to use the count function?
I probably misunderstood requirement in the initial question:
What I am trying to do is to build a metric which eventually can be aggregated by company or even the whole group of companies, showing the number of full time employees (FTE) in a selected time period
So I interpreted it that you need to count number of employees who work full time.
a
Aha, ok. Thanks! Probably I wasn’t clear enough in my explanation. The intention is to get a sum of the contribution of all employees (even if they are fractional)