Hey. I'm looking to create a cohort analysis table...
# gooddata-cloud
f
Hey. I'm looking to create a cohort analysis table. I found this article, but it doesn't mention how to do a RUNSUM. Basically I need to do a running count (current idea is to do
RUNSUM(COUNT())
) based on 2 dates. When I do this runsum MAQL statement, it breaks because RUNSUM can only accept 1 date.
j
Hi Filip, The runsum function indeed only works with a singular date dimension. Could you provide some more details on your use case? How vital is it for you to use two date dimensions within the metric? Could you filter through a dashboard filter?
f
Ideally I want something like this - we need to track 2 dates because 1 date is the received date and the 2nd date is the updated date. The % values in the pic show a cumulative sum of updated claims / total claims submitted that day (based on submission date)
I am open to something else - I just don't have any other ideas. We are trying to track how long it takes for medical claims to get updated once they're submitted.
j
Interesting use case. I tried to solve it in my Github analytics project - calculate it on top of "commits" using created/updated date:
Copy code
SELECT (
  SELECT COUNT({label/commit_id}) BY {label/created_at.day} ALL OTHER
) / (
  SELECT COUNT({label/commit_id}) BY {label/updated_at.day} ALL OTHER
)
If I slice it by created date, it "works", see screenshot. 1. It does not use RUNSUM, but it should be possible to extend it accordingly 2. It slices by created date. Not 100% sure if it is what you need, but generally I don't see other way how to calculate something meaningful
OK, I just realized that it does not do anything meaningful. 1. created : updated is the opposite of what you need - you need count of updated : count of created 2. When I switch it, result numbers (still slicing by created date) are nonsense (thousands of percents). Slicing by update date fails with error. @Jakub Sterba have we considered this use case? I think it's valid use case, my example proofs it as well.
In the insight, I replaced "edit" with "debug" in URL and downloaded ZIP containing SQL file - what we execute against DB. My case results in SQL which calculates COUNT of all divided by COUNT by CREATED_DATE. Explains why result contains huge percentage values. My guess: the reason is that the context in the insight (slice by, "ROWS") is created date, so updated_date is removed from the metric.
When I add updated date as second date to the insight, if fails with error. Browser console contains details:
Multiple contexts for count, attribute='attribute/commit_id/cicd_demo_production'
j
The RUNSUM has currently limitation that it works only over a single date dimension. You may consider workarounds such as use of regular text attribute for definition of cohort or use multiple metrics broken by one dimension where each metric would apply different filter using the other date dimensions. You can put up to 20 metrics to insight as columns and use date for rows or transpose the table (20 rows as metrics and columns for breakdown by date)
For multiple context issue there may be ambiguity how to apply count. Two parameter count may help to specify it unambiguously.
f
I don't mind creating many metrics @Jakub Sterba - but updated date should be greater than submitted date in increments of 1. Any idea how to achieve this? I dont think you can do updated_date = submitted_date + 1 can you?
j
I am not sure if I understand. You may want to create special dimension in the data model for this like "Days since submission" and then build the cohort analysis using it, but such attribute cannot be used for cummulative counts. We are considering to offer cumulative count at insight level on top of MAQL metrics which may probably help here in future.
f
Ok, it seems doing days since submissions helped. Now I am having trouble doing a runsum. I tried using BY, WITH, FOR EACH, WITHIN - but everything gives me the 2nd option in pic attached. This is my latest trial using for each
Copy code
SELECT RUNSUM(SELECT COUNT({label/gooddata_main_data.claim_id})
where {label/open_claim_status_map.open_claim_status} = "Missing or Incorrect Info" 
OR {label/open_claim_status_map.open_claim_status} = "No Claim on File" 
OR {label/open_claim_status_map.open_claim_status} = "Processing")
FOR EACH {label/open_claim_status_days_since_submission}
j
This is because RUNSUM works only over dates. You can create the days since submission as date attribute (e.g. values like 2024-01-01, 2024-01-02, 2024-01-03 which would be displayed as Day of Year = 1,2,3) but you would have to use on rows text attribute (table column would have to be mapped to attribute in LDM in addition to mapping to date dataset) and this attribute would have to be used for rows of table. I admit this is not nice workaround. Alternative workaround for 20 columns can be to define separate metric for each column. E.g. column1:
select {metric/count_metric} where {label/days_since_submission} in ("1")
column2:
select {metric/m} where {label/days_since_submission} in ("1","2")
column3:
select {metric/m} where {label/days_since_submission} in ("1","2","3")
...
The titles of columns for metric or attribute title in rows can be renamed on level of insight so you can display just numbers in table header on top (1,2,3…) instead of names of metrics.
🙌 1