Hi, I am struggling to create a MAQL query. We hav...
# gd-beginners
Hi, I am struggling to create a MAQL query. We have data regarding quiz attempts submitted by users ({label/quiz_attempts.id}) and whether those attempts have been marked as passed or failed ({label/quiz_attempts.passed}, values true or false). I need to create a metric to return the average number of attempts needed to pass a quiz. Do you have any suggestion on how I could achieve this?
Hello Raffaella, you can find information on building a metric with AVG function here: https://www.gooddata.com/docs/cloud/create-metrics/maql/aggregation/avg/ You would essentially need to count the attempts the users have taken with a passing grade, and then divide by total attempts.
If you have built the report, and are still facing issues, could you provide screenshots and details as to what you are seeing vs. what you expect to see?
Hi @Joseph Heun, thanks, I am trying
Hi Raffaella, depending on your data (and the business logic behind), I can see several ways how you can achieve this. If there already is an attempt order in your data (i.e. this record was this user’s fifth attempt on this quiz), and if we do not assume users take any more tests after they pass, I would make this “attempt order” a fact in GoodData model and then used a metric like:
Copy code
SELECT AVG({fact/quiz_attempts.attempt_number}) WHERE {label/quiz_attempts.passed} = true
This should give you average attempt # across all passed tests. If you don’t have the attempt number in your data, you might try to construct it within MAQL (assuming you have some reliable sequential ID or a date of the attempt). Basically we want for each user and each quiz (if there are more quizzes) mark each of their attempts to pass this quiz. Assuming we have a reliable sequential ID as a fact in the data
(bigger value of ID means the test was taken later). Then something like this can work:
Copy code
    SELECT ROW_NUMBER( {fact/idfact} ) WITHIN {label/quiz_attempts.user_name}, {label/quiz_attempts.quiz}, ALL OTHER
  ) HAVING {label/quiz_attempts.passed} = "true"
the very inner metric uses the idfact to calculate the order of attempt within quiz and user. The metric above filters it to just the passed attempts with HAVING instead of WHERE to prevent passing the filter also to the inner metric. And the level above is just an AVG of this. If instead of sequential ID you have a date, you can do a similar thing with it. Just I believe the ranking function does not work with dates currently, so you might need to use DATE_DIFF and deduct your date from some fixed date in the past to get some numerical value to work with. If users can take quizzes after they pass one, there would need to be some modifications to the metric. (i.e. do MIN(attempt_number) BY user,quiz before doing the AVG).
Hi @Michal Hauzírek and @Joseph Heun Thanks for the pointers. The main use case is that the user would submit multiple attempts until they passed the quiz However, we have quiz_attempts.id (sequential) as an attribute (not a fact), as we need to be able to include that attempt ID in pivot tables. I believe the above query fails because of the attribute
You can actually have the same physical DB field mapped to both a fact and an attribute in the model. I am not sure if attribute can be used in the ranking functions.
👀 1
Hi Michael, interesting, I didn't see this option in the documentation - can you share any instruction on how to do this?
You can go to the modeller edit mode and click “Add field” on a particular dataset. Then select what physical column of your table you want to use and then define whether it should be added as an attribute or fact and name the field in GoodData. You get a warning on the dataset level (yellow exclamation mark) notifying you that you’ve mapped the same column multiple times, but you can safely ignore it and publish your changes.
👀 2
I have followed the steps, unfortunately the metric I have set up returns this error: 5fdb7557e7ca6b6c84a5d4487763f35a
Copy code
  "title": "Bad Request",
  "status": 400,
  "detail": "A result cache error has occurred during the calculation of the result",
  "resultId": "18f646d021c32c670408d480a9a4098468a9eec0",
  "reason": "Window function for ROW_NUMBER with a RANGE from UNBOUNDED PRECEDING is not implemented in Redshift",
  "traceId": "5fdb7557e7ca6b6c84a5d4487763f35a"
The metric is SELECT AVG( SELECT ( SELECT ROW_NUMBER( {fact/quizattemptid} ) WITHIN {label/fullname}, {label/quiz_attempts.id}, ALL OTHER ) HAVING {label/quiz_attempts.passed} = "true" )
Hmm, I see. It might be some limitation of MAQL on top of redshift which I am not familiar with. The above example worked for me on Postgres. Just to confirm a few things: • do you have a primary key defined in the quiz_attempts dataset? • is the {label/quiz_attempts.id} which you used in the WITHIN this primary key of the whole attempts dataset? ◦ in my example it the
was meant as ID of the quiz, not an attempt (i.e. if there can be multiple different quizzes each user can take and pass). If you do not have multiple quizzes in your data and each user only takes one common quiz and has multiple attempts, you can just keep
WITHIN {label/fullname}, ALL OTHER
I am not sure if any of these potential changes will lead to different (and working) underlying query, but I wanted to check it just in case.
👀 1
BTW in the documentation, there is mentioned a limitation about Window functions on top of Redshift. It explicitly mentions running aggregation functions like RUNSUM. I am not sure if this actually also affects the ranking functions somehow. I will try to check this with the GoodData engineers. If there is such limitation (and you can not switch to different database), you might try to overcome it either by pre-calculating the row number directly in your data or using a SQL dataset.
Hi @Raffaella Gozzini, I spoke with GoodData engineers about this and they are interested in more information regarding this error to debug it. Could you please help us by exporting a debug package? Debug package is a zip file containing several files with metadata information regarding the insight, relevant structures of the data model, and possibly the generated SQL queries. It does not contain any data from your database, any credentials or any user information. Feel free to do it in a private message if you do not want to post it publicly. To obtain the debug package, please: 1. open the failing insight in Analyze (make sure its latest version is saved) 2. check the URL in your browser. it will be something like https://{HOSTNAME}/analyze/#/{WORKSPACE_ID}/{INSIHT_ID}/*edit* 3. replace the “*/edit*” at the end of the URL with “*/debug*” and confirm 4. this should start download of a file (explainAfm.zip) - that will be the debug package. Please share it with us (i.e. in a private slack message to me). Thank you.
Hi @Michal Hauzírek thanks so much for this, I will send you over the debug file today.
Just a quick update here - based on the debug file, our engineers found a way how to allow ROW_NUMBER and some other functions on top of redshift. As far as I know, this change should go out in one of the next releases, probably early March.
That's fantastic news! Thanks Michael!