Hello! We are collecting survey data, some of the ...
# gooddata-cloud
s
Hello! We are collecting survey data, some of the questions are linked to a product and we use a metric allowing us to see the sum of products referenced by surveys. Surveys are collected in a
feedbacks
table where one line is a survey and one column is a question. The metric we are using at the moment is the following:
SELECT SUM(SELECT {fact/products.price} BY {dataset/feedbacks_pivot_en})
(Note that in our LDM, we have a relation between the
products
and the
feedbacks
datasets thanks to the
Looking_for_which_product_
column in
feedbacks
. This is working well until we have question with multiple answers. To handle that, we duplicate the feedback row and only change the value of the question with multiple answers with the different answers. This was recommended here as the first solution. When there is a duplicated row, the product value is counted twice in our current metric. If I translate the current metric in SQL, this gives:
Copy code
SELECT SUM(p.price)
FROM products p
JOIN feedbacks f ON p.productId = f.Looking_for_which_product_
If we don't want to duplicate the values, we'd like the join to be only on lines distinct by the
feedbackId
and
Looking_for_which_product_
columns. This would give the following SQL:
Copy code
SELECT SUM(p.price)
FROM products p
JOIN (
    SELECT DISTINCT feedbackId, Looking_for_which_product_
    FROM feedbacks
) f ON p.productId = f.Looking_for_which_product_;
Any help to adapt our metric to only work on the lines of feedbacks distinct by some columns? Note that having a distinct only on the column feedbackId would already be a step forward and work until we can select multiple product in that question. Thanks in advance, Simon
1
j
Hi Simon, We would really need to see how the LDM has been constructed. Would you be able to share a screenshot of your LDM for viewing purposes? Basically, the attribute on which the join happens should always be the primary key, so you would either need to change the LDM, or do it somehow with SQL datasets
s
Hi Joseph, thanks for your answer. The attribute on which the join happen is indeed the primary key (productId on the table products) and it is working fine. The issue is that i don't want to run the SUM on all the records from the feedbacks table, like the current metric is doing:
SELECT SUM(SELECT {fact/products.price} BY {dataset/feedbacks_pivot_en})
But i want to run the sum only on the records from the feedbacks table having a distinct feedbackId,Looking_for_which_product_. You can check the two sql queries of my first message to get the idea, if it's not clear tell me, I will try to clarify more. Thanks again
m
Hi Simon, If I understand it correctly, you want to SUM the price from Products no matter how many feedback rows you have for it? If yes, then it should be really simple, just modify your metric to:
SELECT SUM({fact/products.price})
Just to explain - the metric with the BY you have now - specifically the inner
SELECT {fact/products.price} BY {dataset/feedbacks_pivot_en}
part actually instructs GoodData to “push” the fact from products table to feedbacks table. This might be needed if for some reason you want to filter or slice rows in products table by columns in feedbacks table. Unless instructed like that, GoodData automatically works with the granularity of the facts and even if there is a join that would in standard SQL lead to multiplication of the rows, GoodData by default preserves the original fact dimensionality. This however might prevent you from some operations that would require this multiplication (like filtering or slicing by anything from the feedbacks table). If you don’t need that, it is great and this simple metric
SELECT SUM({fact/products.price})
should work out of the box. But if need for this filter/slice is the reason why you have the BY in your metric, you will need to “normalize” the multiplied rows. One option is something like:
SELECT SUM( SELECT MAX( (SELECT {fact/products.price} BY {dataset/feedbacks_pivot_en}) ) BY {label/products.productId})
Here you keep the “expanded join”. but before performing the SUM on it, the values are first MAXed by product ID. So form all the multiplied product prices for each feedback, it becomes again a single value per product. Note that this metric (as well as your original one) will ignore any products with no feedback rows).
s
Hi Michal, thanks for having a look. I think you also helped me on the first iteration of this metric, some times ago. It is very appreciated 👍 I don't think your suggestion matches my needs, but maybe they were unclear. Let me try to rephrase (please bear with me🙏) We have a survey system, basically question and answers. When we put our collected surveys in GoodData, we pivot them and it would look as such:
Copy code
surveyId  | surveyDate | question1 | question2 | question3
surveyId1 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1
surveyid2 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1
But, some of our questions accept multiple answers. In order to handle that into GoodData, we duplicate the rows and change only the answer column. This would look as such:
Copy code
surveyId  | surveyDate | question1 | question2 | question3 | question4 (multiple answers)
surveyId1 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null
surveyid2 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null
surveyid3 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | q4answer1
surveyid3 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | q4answer2
Note that i have now two rows for
surveyId3
with the only difference in the value of for column
question4
. This allows us to write metrics to properly analyse those "multi select" values. All of this is fine and serving us well. Now the last business requirements is the one related to products. Some of our question's answers are product ids. This would look as such:
Copy code
surveyId  | surveyDate | question1 | question2 | question3 | question4 (multiple answers) | productQuestion
surveyId1 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null                         | productId1
surveyid2 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null                         | productId2
What we want to do with that is summing the the price of product referenced by surveys. This is where we use the initial metric I posted:
SELECT SUM(SELECT {fact/products.price} BY {dataset/surveys})
Thanks to the link in the LDM between the products and surveys datasets, all of this is working perfectly fine. The issue we're facing now is really when we mix the "multi-select" question and the product question. If i complete the "multi-select" examples with the product question, this would look as such:
Copy code
surveyId  | surveyDate | question1 | question2 | question3 | question4 (multiple answers) | productQuestion
surveyId1 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null                         | productId1
surveyid2 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | null                         | productId2
surveyid3 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | q4answer1                    | productId1
surveyid3 | 2023-12-08 | q1answer1 | q2answer1 | q3answer1 | q4answer2                    | productId1
Note that productId1 is appearing in three survey rows now (1st, 3rd and 4th) but it should only be counted twice by our value metric, since it is actually representing two surveys: surveyId1 and surveyId3. I hope it clarifies the changes I would like to apply in the metric that sum the prices: it should not sum the price of all products referenced by surveys rows, but sum the price of all products references by unique survey rows. I can do that easily in SQL by using
distinct
but it seems MAQL is missing distinct or unique functions?
m
Thank you for the detailed explanation, Simon. I am still not sure if I fully follow (i.e. how your source data look like), but this for sure helps me understand your need better. So do I understand correctly that you want to count the product price for each survey+question combination? Or just once for each survey? In that case I think might work (for survey+question):
SELECT SUM( ( SELECT MAX( (SELECT {fact/products.price} BY {dataset/surveys}) ) *BY {label/feedbacks.surveyID}, {label/feedbacks.questionID}* ) )
Or this for just once per each survey:
SELECT SUM( ( SELECT MAX( (SELECT {fact/products.price} BY {dataset/surveys}) ) *BY {label/feedbacks.surveyID}* ) )
Does this help? Each of these should deduplicate the multiplicated values to either “once per survey” or “once per survey+question combination” depending on your needs. The joins in MAQL are only simple equi-joins and you unfortunately can not modify the join condition at the moment. And result of any metric is a number so the only distinct is baked into COUNT() function which is equivalent to SQL’s COUNT DISTINCT (for a single attribute). So other tricks with distinct of multiple columns currently need to be done by some MAX+aggregation magic like this one I am afraid.
s
Hi @Michal Hauzírek, Thank you a lot, I just tested it and the "once per survey" version is working :) I'm starting to grasp how to write those advanced GoodData metrics thanks to your detailed explanations. Have a nice week!
👍 1