Simon Lejeune
12/07/2023, 11:02 AMfeedbacks
table where one line is a survey and one column is a question.
The metric we are using at the moment is the following:
SELECT SUM(SELECT {fact/products.price} BY {dataset/feedbacks_pivot_en})
(Note that in our LDM, we have a relation between the products
and the feedbacks
datasets thanks to the Looking_for_which_product_
column in feedbacks
.
This is working well until we have question with multiple answers. To handle that, we duplicate the feedback row and only change the value of the question with multiple answers with the different answers. This was recommended here as the first solution.
When there is a duplicated row, the product value is counted twice in our current metric.
If I translate the current metric in SQL, this gives:
SELECT SUM(p.price)
FROM products p
JOIN feedbacks f ON p.productId = f.Looking_for_which_product_
If we don't want to duplicate the values, we'd like the join to be only on lines distinct by the feedbackId
and Looking_for_which_product_
columns.
This would give the following SQL:
SELECT SUM(p.price)
FROM products p
JOIN (
SELECT DISTINCT feedbackId, Looking_for_which_product_
FROM feedbacks
) f ON p.productId = f.Looking_for_which_product_;
Any help to adapt our metric to only work on the lines of feedbacks distinct by some columns?
Note that having a distinct only on the column feedbackId would already be a step forward and work until we can select multiple product in that question.
Thanks in advance,
SimonJoseph Heun
12/07/2023, 11:47 AMSimon Lejeune
12/07/2023, 2:41 PMSELECT SUM(SELECT {fact/products.price} BY {dataset/feedbacks_pivot_en})
But i want to run the sum only on the records from the feedbacks table having a distinct feedbackId,Looking_for_which_product_.
You can check the two sql queries of my first message to get the idea, if it's not clear tell me, I will try to clarify more. Thanks againMichal Hauzírek
12/07/2023, 11:31 PMSELECT SUM({fact/products.price})
Just to explain - the metric with the BY you have now - specifically the inner SELECT {fact/products.price} BY {dataset/feedbacks_pivot_en}
part actually instructs GoodData to “push” the fact from products table to feedbacks table. This might be needed if for some reason you want to filter or slice rows in products table by columns in feedbacks table.
Unless instructed like that, GoodData automatically works with the granularity of the facts and even if there is a join that would in standard SQL lead to multiplication of the rows, GoodData by default preserves the original fact dimensionality.
This however might prevent you from some operations that would require this multiplication (like filtering or slicing by anything from the feedbacks table). If you don’t need that, it is great and this simple metric SELECT SUM({fact/products.price})
should work out of the box.
But if need for this filter/slice is the reason why you have the BY in your metric, you will need to “normalize” the multiplied rows. One option is something like:
SELECT SUM( SELECT MAX( (SELECT {fact/products.price} BY {dataset/feedbacks_pivot_en}) ) BY {label/products.productId})
Here you keep the “expanded join”. but before performing the SUM on it, the values are first MAXed by product ID. So form all the multiplied product prices for each feedback, it becomes again a single value per product.
Note that this metric (as well as your original one) will ignore any products with no feedback rows).Simon Lejeune
12/08/2023, 2:11 PMsurveyId | surveyDate | question1 | question2 | question3
surveyId1 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1
surveyid2 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1
But, some of our questions accept multiple answers. In order to handle that into GoodData, we duplicate the rows and change only the answer column. This would look as such:
surveyId | surveyDate | question1 | question2 | question3 | question4 (multiple answers)
surveyId1 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null
surveyid2 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null
surveyid3 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | q4answer1
surveyid3 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | q4answer2
Note that i have now two rows for surveyId3
with the only difference in the value of for column question4
.
This allows us to write metrics to properly analyse those "multi select" values. All of this is fine and serving us well.
Now the last business requirements is the one related to products. Some of our question's answers are product ids. This would look as such:
surveyId | surveyDate | question1 | question2 | question3 | question4 (multiple answers) | productQuestion
surveyId1 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null | productId1
surveyid2 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null | productId2
What we want to do with that is summing the the price of product referenced by surveys. This is where we use the initial metric I posted:
SELECT SUM(SELECT {fact/products.price} BY {dataset/surveys})
Thanks to the link in the LDM between the products and surveys datasets, all of this is working perfectly fine.
The issue we're facing now is really when we mix the "multi-select" question and the product question. If i complete the "multi-select" examples with the product question, this would look as such:
surveyId | surveyDate | question1 | question2 | question3 | question4 (multiple answers) | productQuestion
surveyId1 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null | productId1
surveyid2 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null | productId2
surveyid3 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | q4answer1 | productId1
surveyid3 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | q4answer2 | productId1
Note that productId1 is appearing in three survey rows now (1st, 3rd and 4th) but it should only be counted twice by our value metric, since it is actually representing two surveys: surveyId1 and surveyId3.
I hope it clarifies the changes I would like to apply in the metric that sum the prices: it should not sum the price of all products referenced by surveys rows, but sum the price of all products references by unique survey rows. I can do that easily in SQL by using distinct
but it seems MAQL is missing distinct or unique functions?Michal Hauzírek
12/08/2023, 8:49 PMSELECT SUM( ( SELECT MAX( (SELECT {fact/products.price} BY {dataset/surveys}) ) *BY {label/feedbacks.surveyID}, {label/feedbacks.questionID}* ) )
Or this for just once per each survey:
SELECT SUM( ( SELECT MAX( (SELECT {fact/products.price} BY {dataset/surveys}) ) *BY {label/feedbacks.surveyID}* ) )
Does this help?
Each of these should deduplicate the multiplicated values to either “once per survey” or “once per survey+question combination” depending on your needs.
The joins in MAQL are only simple equi-joins and you unfortunately can not modify the join condition at the moment. And result of any metric is a number so the only distinct is baked into COUNT() function which is equivalent to SQL’s COUNT DISTINCT (for a single attribute). So other tricks with distinct of multiple columns currently need to be done by some MAX+aggregation magic like this one I am afraid.Simon Lejeune
12/11/2023, 10:15 AM