I'm having trouble getting MAQL right in a formula...
# gooddata-platform
I'm having trouble getting MAQL right in a formula for survey data. The data is structured so that there is one row per response per question (i.e. if 100 people responded to a question, there would be 100 rows in the table). In this table one column would have the question, which would be the same value for all 100 rows in my hypothetical example. The other column would have their responses. It's fairly easy to calculate the total people who gave a specific answer (just a count of rows), but I'm having trouble showing the total number of people who were asked the question. I want a field that shows % of total respondents. I've been using this guide and have tried every combination of "BY ALL IN ALL" or "BY ATTRIBUTES" or "WITHOUT PARENT FILTER", but none of them seem to get it right. Either I get the total number people who responded to any question (not everyone is asked every question) or I get some other weird subset that doesn't make sense. What's the best way to think about this? It also needs to be dynamic so that if I filter the down (i.e. only show me respondents in the USA), the total number of respondents adjusts to the new base.
For further context, here is how I can achieve it when writing regular SQL. I'm trying to dynamically populate "TotalRespondents" in a custom field. Pre-calculating it in my raw table is too rigid. SELECT QUESTION, SUM(WEIGHT) as TotalRespondents FROM ALLRESPONSES GROUP BY 1
I've tried SELECT SUM(WEIGHT) BY QUESTION, but that doesn't give me the expected result. I then tried to get it to apply and ignore certain filters, but that also didn't work correctly.
Hi Mitchel, I am not sure I fully understand what exactly you are trying to achieve, and also I am not sure what is the “weight” in your SQL examples but let me try to help: If in your table/dataset you have all the responses - one record being response to one question by one person, then you will get total number of people who responded to a particular question like this:
SELECT COUNT(*person*) BY ALL *response*
the “BY ALL response” here means “do not split the result by response even if the report is split by it and for each value display the total number of responses”. If you also want this number to show the total even if you filter out a specific response (i.e. if you filter for “Yes” and want to include also “No” and “Maybe” in the total number) you would use this metric:
SELECT COUNT(*person*) BY ALL *response* WITH PF EXCEPT *response*
the WITH PF EXCEPT means - apply all the filters but ignore filter for response. So to calculate % of total responses, you can do it like this:
SELECT COUNT(*person*)/(SELECT COUNT(*person*) BY ALL *response* WITH PF EXCEPT *response*)
Yeah, I gave a hypothetical example where each row is worth the same amount. However, in my survey data, respondents are weighted. Some are worth slightly more than 1 and some slightly less. Instead of counting rows, I have to sum their weight. Unfortunately the formula doesn't seem to work. I have calculated the correct value and pushed it in. Here is a screenshot comparing the correct value vs. the formula you suggested.
@Michal Hauzírek
I added a few other exceptions to see if it would help but no luck
And what exactly are you trying to achieve? Can you maybe provide some simple example with a few lines of data?
Let's use your screenshot above as a good example. Let's say I want to filter that screenshot to only the "Yes" values for each question, but I want the ratio numbers to stay the same.
The formula as I have it written keeps showing the equivalent of # of responses. I can't get it to show BY ALL RESPONSE whenever I apply filters
Did you try adding the "BY ALL response WITH PF EXCEPT response" to your metric? In your screenshot you have "BY short question".
SELECT SUM(weight) BY ALL *response* WITH PF EXCEPT *response*
It is by short question because that is the value I'm pulling into the chart. I tried by question and by response as well.
and do you have it BY ALL short question? The "ALL" is inportant there.
Yeah, that expands it too far.
Each short question is an independent question. I need to show everyone who responded 8-10 out of all those asked
Adding the ALL shows all respondents on the entire survey, not just those who took the individual question
So in the first row, 49.83 people answered the question (because of weighting you get an odd number)
Of those 49.83 people, 37.84 responded "yes" (which is what the previous iteration of the formula without the ALL was calculating)
514 answered ALL the questions but not the specific question in that row
How about this? I essentially need a way to calculate % of respondents using a single field for all the tables in this example (both the summary data tables and the desired chart). This file has formulas, so you can see how it is being calculated.
Thanks, that is really helpful. In this particular case, this metric will give you what you need (the % of respondents who answered the answer which you will select from the filter):
SELECT SUM(*Weight*) / (SELECT SUM(*Weight*) BY *Question*,*Country* WITH PF EXCEPT *Answer*)
In your sample data the Question and Country are aligned, I am not sure if it would be the case also in more general case, but for this example this metric seems to do exactly what you need and is dynamic based on what you filter.
Hi @Mitchel Roling, it would be good to understand your use case and model better. It makes difference for BY rules for example if the attributes Question and Response are in hierarchy in the LDM or if they don’t. Where is the Weight fact in the model? Is it fact of the response or is it a fact of a “Person” dimension dataset? The result shall display responses of multiple questions and sum of results to the same question should be 100% if none filter for response is specified, right? Or is it possible that the subtotal per question will be smaller than 100% because some people did not responded (e.g. only 50% responded and they responded ‘yes’ then there will be only one row for that question with yes and result 50%)?
If data is organized in a single dataset with attributes Question, Country, Answer, Weight where none of them is primary key you can use MAQL:
Copy code
And for example view such metric by Question as rows and Answer as columns. Or you can use filter Question in (‘Do you live in US’, ‘Do you live in France’) and filter Answer=‘Yes’ and view it by ‘Country’. In case you select by mistake multiple questions with the same country, it will show the highest percentage of a question for given country.
Thanks @Michal Hauzírek! Unfortunately the formula still isn't quite working for me. However, @Jakub Sobotka's solution does seem work! @Jakub Sobotka - You are correct that the weight is a person attribute, not a question attribute. % of respondents should never be higher than 100%. I put the person's weight in the responses table because I thought it would be simpler. However, it is also in the respondents table in my data model. In reality the data is structured as two tables in my DM: responses & respondents.... with the respondent's weight as a column in both. If I were to summarize the weight from the respondents table vs. the responses table, how would that affect the BY part of the formula? What if I added a separate questions table? Attached is an updated example of the current data structure (see two tables tab). Also, the data structure can easily change.
I am glad that it is working!
SELECT SUM(SELECT Weight BY Records of Responses)
lets you lift fact from the dataset Respondents to dataset Responses supposing you have connected them in the model (arrow points from Respondents to Responses where RespondentId is used as foreign key to Respondent dataset and there is none primary key set in the dataset Responses). If you have a primary key defined in Responses dataset such as Response Id you have to use it instead of Records of Responses which is automatically created attribute only in case primary key does not exist (
SELECT SUM(SELECT Weight BY Response Id)
), Such statement will not sum weight of each respondent but it will repeat weight in each response of that person and then sum up all the responses. The total may be higher than sum of weights of all resondents. You can use such metric instead of SUM(Weight) in the metric with BY rules which calculates the percentage.
Very helpful @Jakub Sterba! Thank you! As of now, responses has no primary key, only the respondents table has a primary key. But it is helpful to know both scenarios. Appreciate the help!