Hello I have a dataset that is something like ` start Date e GoodData #gooddata-cloud

Hello, I have a dataset that is something like `{ ...

Michael DeWulf

11/09/2023, 10:49 PM

Hello, I have a dataset that is something like

{ start: Date, end: Date, ... }

. I'd like to be able to return records from this dataset where either

start

end

is in the selected interval of some Date dataset. Is this possible? If so, how?

Steve Fox

11/09/2023, 10:54 PM

I think you could use a metric,

Steve Fox

11/09/2023, 10:54 PM

but not sure.

Steve Fox

11/09/2023, 10:58 PM

There are date diff/offset formulas and depending on how you're searching for the date range, could use that to help include records: • recordid where <input> date <= end or input date >= start

Michael DeWulf

11/09/2023, 11:09 PM

But how do you get the input date? I only see the DATE macro for today and offsets from today.

Steve Fox

11/09/2023, 11:15 PM

Pretty sure the "THIS(DAY)" part is the piece that comes from the dates on the insights where you use it.

Steve Fox

11/09/2023, 11:16 PM

Haven't really tested much, but I'm working on an insight for "things due within 30 days of date" kind of scenario. At which point 30 days of THIS(DAY) would be the center

Steve Fox

11/09/2023, 11:16 PM

not sure if works for your scenario, but could work.

Michael DeWulf

11/09/2023, 11:17 PM

The docs are pretty clear that THIS is based on the current calendar date.

Michael DeWulf

11/09/2023, 11:17 PM

Not the date range interval that is selected.

Steve Fox

11/09/2023, 11:18 PM

Hmm, ok. I'm not sure what the interval is that you're pulling into the insight is coming from.

Steve Fox

11/09/2023, 11:18 PM

you have a date.

Steve Fox

11/09/2023, 11:18 PM

not a date range as far as I know. Maybe your setup/situation is different.

Michael DeWulf

11/09/2023, 11:19 PM

Users always filter by some date interval.

Michael DeWulf

11/09/2023, 11:20 PM

E.g. this month

Steve Fox

11/09/2023, 11:21 PM

fair point. Something to ask with GD folks because if you do use like "Last year" what is the date used...

Michael DeWulf

11/09/2023, 11:23 PM

They filter the context for you, but this is off topic. This is all documented in MAQL docs.

Steve Fox

11/09/2023, 11:24 PM

ok fair enough

Michael DeWulf

11/09/2023, 11:25 PM

Thank you though for your help, Steve!

✅ 1

Michael DeWulf

11/09/2023, 11:28 PM

Anyone from GoodData that can help?

Steve Fox

11/09/2023, 11:29 PM

(probably in the am as they're mostly based in Europe)

Jakub Sterba

11/10/2023, 12:37 AM

This problem probably does not have easy solution. You may try to add bridge dataset of many-to-many relationship and create two records for each row of original dataset (one containing start_date and the other end_date). For example SQL dataset with SQL query:

Copy code

select "id","start_date" as "date",'start' as role from interval_dataset
union all
select "id","end_date" as "date",'end' as role from interval_dataset

Filter for “Date” dataset will be then applied effectively as OR filter for start and end date

Jakub Sterba

11/10/2023, 1:14 AM

Other option is to use a materialized disconnected date dimension containing at least all dates between minimum start_date and maximum end_date. You can generate such series using generate_series in case of PostgreSQL (syntax may differ for other databases):

Copy code

select "disconnected_date"::date from generate_series('2023-01-01'::date,'2023-12-31'::date,'1day'::interval) as "disconnected_date"

This date dimension should have no connection to any dataset in the model and it will be used only to determine minimum and maximum date matching the filter selection. BY ALL OTHER keyword lets you use the result as “constant” in expressions of disconnected datasets. You can define filter using this disconnected date dimension in insight first and then apply following metric:

Copy code

select count( {dataset/interval_dataset})
where {label/start_date.day} 
  between 
  (select min( {label/disconnected_date.day},{dataset/materialized_date_dim}) by all other) 
  and
  (select max( {label/disconnected_date.day},{dataset/materialized_date_dim}) by all other) 
or {label/end_date.day} 
  between 
  (select min( {label/disconnected_date.day},{dataset/materialized_date_dim}) by all other) 
  and
  (select max( {label/disconnected_date.day},{dataset/materialized_date_dim}) by all other)

This metric will return count of rows which match the filter condition. It can be used in breakdown of interval_dataset to filter only matching records.

Productboard

11/10/2023, 1:17 AM

🎉 New note created.

Jakub Sterba

11/10/2023, 1:45 AM

What is the actual use case? Should it filter intervals defined by start and end date in data for which one of the borders fall into the range given by date filter on dashboard? Or do you rather want to filter intervals fully contained/overlapping with the filter date range?

Michael DeWulf

11/10/2023, 2:05 AM

Thank you @Jakub Sterba! I'll review those options in the morning. The actual use case is that I have items that are "owned" for periods of time. I'd like to be able to show who owned those items for a given interval (along w/ other info).

Michael DeWulf

11/10/2023, 2:11 AM

I was reviewing your first solution, but I think that would end up not working for my use case. Specifically, in the case that the start & end date that are in the record are fully outside the selected interval. E.g. start=2023-01-01, end = 2023-12-31 and the selected interval is October 2023.

Michael DeWulf

11/10/2023, 2:12 AM

I think the solution where you enumerate all dates in range is possibly the only way to go, unfortunately.

Michael DeWulf

11/10/2023, 2:12 AM

Obviously, that doesn't scale well w/ large time windows.

Jakub Sterba

11/10/2023, 7:47 AM

Other solution can also be to create dataset with snapshots for each day when item was owned and use it to apply filter via many to many relationship between “snapshot date” to the original dataset. I have recorded this thread as feedback for possible product improvement in future to better support your use case.

❤️ 1

John Munson

11/13/2023, 10:54 PM

In my opinion, the issue illustrated in the diagram below is one of the biggest problems with how the GoodData date picker works. It works great if you have point-in-time data. For example, if you wanted to find all of the purchases within a given date range, you can easily do that. But, it doesn’t seem to work very well at all with data that has a start and an end date. If you wanted to find all marketing campaigns that were active during a given time frame… not possible. If you wanted to find all campaigns that started and ended within the given time frame… not possible. You can only ask questions about the start date or the end date, which doesn’t seem very useful.

Productboard

11/14/2023, 8:26 AM

🎉 New note created.

John Munson

11/14/2023, 4:25 PM

After posting the example in my previous message, I think something “clicked” with my understanding of how GoodData works and what you are “supposed” to do when it comes to modeling data that has a start and an end date. In the example above, where you have marketing campaigns that begin and end, modeling them like this is probably not what you want to do because of the problems discussed above…

Copy code

campaign_name     start     end     ad_spend
new shoes         1/1       3/31    $1,000

This doesn’t really work with how GoodData wants you to model the data. GoodData wants either discreet standalone datapoints, like the purchase data above, or it wants continuous fluid lines that are comprised of many data points. The campaign data model above is kinda like how you would model the purchase data. Each campaign is just a single row in a table. Meaning, each campaign is just a single data point. Meaning that, even though you have a start and end date, you have to choose one or the other that is going to represent the temporal space that the campaign occupies… not ideal. So, what are the alternative options? First, you could recognize that if you start zooming out far enough, lines start to look like individual points. That’s right, we’re talking about grain and granularity. Perhaps we know that our marketing campaigns are sometimes short, maybe just a few days long, but that they always fall within a business quarter. They don’t run longer than a business quarter and they don’t span multiple business quarters. With this in mind, we know that the highest “resolution” that we can look at our campaign data and have it still appear to look like dots instead of lines, is “business quarter”…

Copy code

campaign_name     business_quarter     ad_spend
new shoes         Q1                   $1,000

But there’s also a second option. Instead of making our campaigns look like dots, we could make them look like continuous fluid lines. For each campaign, maybe you have the data collected at a weekly level. Or maybe you don’t, in which case you can simply “atomize” your data by smoothly dividing it. The key with this approach though is to understand that the concept of grain/granularity is still important. Since we’ve modeled the data at a weekly level, that is the highest resolution that we can look at our campaign before we start to see gaps and holes in the line, which is a very similar problem as when we modeled our campaigns with a start and end date… two definitive points in time with one big hole in the middle.

Copy code

campaign_name       week        ad_spend
new shoes           1           $85
new shoes           2           $85
new shoes           3           $85
...etc

With this approach, our campaigns start at the appearance of the first data point, they end at the last data point, and you answer questions about them by aggregating the facts from every data point that belong to them. In addition, modeling the data this way allows you to answer questions about partial date ranges, e.g “How much ad spend, across all campaigns, was there just in week 3?”

John Munson

11/14/2023, 4:42 PM

With that said… • I don’t think the GoodData docs have done a very good job of explaining any of this • There’s no reason the date range picker couldn’t be improved to allow for modeling your campaign data with a start date and end date. In many cases, this is the most natural way to model the data.

John Munson

11/14/2023, 5:03 PM

@Devin Clarke please pay attention to this as you start work on the modeling for our retention dashboards

✅ 1

Ivana Gasparekova

11/16/2023, 10:46 AM

Hey @John Munson, Thanks a lot for sharing your findings with us, really appreciated. We’ve gone ahead and shared your feedback with out Product team as well. I believe, this is a great topic for the upcoming call scheduled for Monday. You should be invited alongside with Michael already.

7 Views

Open in Slack

Previous Next