Hello, I'm working on a survey system and I have s...
# gooddata-cloud
s
Hello, I'm working on a survey system and I have some troubles writing a metric that should be simple. Some answers to my surveys are categorical (string values) while others are product IDs. The survey data are pivoted before going into GoodData. (We pivot instead of having a questions and answers table because we like to "view by" and "stack by" the surveys by other specific questions, which isn't possible otherwise) This is what I have in GoodData: survey_pivot • surveyId • date • what_is_your_age • what_is_your_gender • what_is_your_nationality • did_the_person_buy • product_bought products • productId (string) • name (string) • collection (string) • price (numeric) And I have set the relationship 1:N between "products" and "product_bought" through the productId field. This is working well if i want to, for example, view the surveys stacked by the collection of product bought. Now my issue: I want to show the sum of the price of all product bought. Sounds easy but I can't get it to work. Note that in my products table I have a lot of products that are not referenced yet by the survey_pivot and in my survey pivot I have a lot of null values in the product_bought column (where people have not bought -> did_the_person_buy set to "No"). I tried to directly use the products.price fact (the implicit metric is the sum), but then it sums the price of all products and i can't filter it through the interface. I also tried to write the following metric:
Copy code
SELECT SUM({fact/products.price})
  BY {dataset/surveys_pivot}
  WHERE {label/customer_bought} IN (
    "yes")
But then again it sums me all the product prices, not only the ones that have been bought. This is trivial to do with an SQL statement (join the pivot with the products table, then sum) but seems a bit complicated here. It looks like GoodData doesn't understand that there is a relationship between my products and pivot table in that case (but it is working fine with the view by/stacked by feature) or that the "join" doesn't behave as I think it should behave (maybe it is summing all prices because it includes empty values when trying to join?) Any help? I'm at a loss. Thanks in advance
1
m
Hi Simon, just to make sure I understand your model correctly: • the product_bought is a reference to the products table? • the table of products contains each product only once, no matter if it was bought and how many times? • but I assume you want the total revenue (number of sold products times the price of the product)? Not just sum of unit prices of products that were sold at least once? • The product price can not change in time. ◦ Since the price is in the product table and change there would change also the historical sales revenue. Are these assumptions correct? If yes, then let me try to answer how. you can achieve what you want and why it works this way in GoodData: First, in GoodData the system assumes, that every reference is done through a value which exists in the other table. So in this case if
product_bought
contains NULL/empty value, you should also have a record in your
Products
table with NULL/empty value in
productId
- representing those surveys where no product was bought. Without this, you might be getting inconsistent results since GoodData is performing inner joins internally. Second - the filters in GoodData are applied in the direction of the arrow in the LDM (that is in your case from Product to Survey) but not the other way. And also GoodData tries not to allow multiple-counting by exploding the tables unless explicitly instructed to in a metric. So if you do
SELECT SUM({fact/products.price})
it will perform the SUM only within the products table (not exploded to surveys) and it will not get filtered by Surveys. So here are some options how to achieve what you want: • Probably the simplest way how to do it would be something like
SELECT SUM( ({fact/products.price} * COUNT({survey_pivot.surveyId}) ) WHERE {label/customer_bought} IN ("yes")
◦ Here it counts the number of surveys for each product (and this part is for sure filtered) and multiplies it by the product price and then it adds this for all the products. • Another option could be this:
SELECT SUM( (SELECT {fact/products.price} BY {survey_pivot.surveyId}) ) WHERE {label/customer_bought} IN ("yes")
◦ here in the inner metric (with BY) you perform the “explicit lifting” of the price fact to the surveys table - technically this is the way to tell GoodData to do the “expanding” join that multiplies one price value to multiple rows in surveys. Where it becomes filterable by everything in the surveys table. And then you just SUM these values. I hope this helps.
👍 1
👍🏼 1
s
Thanks a lot really @Michal Hauzírek for taking the time to help me. I know my message was pretty long :D Your explanations make sense and I was able to get it working! FTR, I needed to use a variation of your second metric:
Copy code
SELECT SUM( (SELECT {fact/products.price} BY {dataset/survey_pivot}))
Notice the part inside the BY that points to the whole survey dataset and not the just surveyId column. With that change, it is behaving as expected. I explored the idea of adding a "null" product with a 0 price like you suggested but, as far as i am with my tests, its presence or not do not impact the computation of the metric. Still i will keep that in mind if an issue arise. I'm very happy, thanks a lot for your prompt response and help! Have a nice day!
1