Hello, I have a dataset that is something like `{ ...
# gooddata-cloud
m
Hello, I have a dataset that is something like
{ start: Date, end: Date, ... }
. I'd like to be able to return records from this dataset where either
start
or
end
is in the selected interval of some Date dataset. Is this possible? If so, how?
s
I think you could use a metric,
but not sure.
There are date diff/offset formulas and depending on how you're searching for the date range, could use that to help include records: • recordid where <input> date <= end or input date >= start
m
But how do you get the input date? I only see the DATE macro for today and offsets from today.
s
Pretty sure the "THIS(DAY)" part is the piece that comes from the dates on the insights where you use it.
Haven't really tested much, but I'm working on an insight for "things due within 30 days of date" kind of scenario. At which point 30 days of THIS(DAY) would be the center
not sure if works for your scenario, but could work.
m
The docs are pretty clear that THIS is based on the current calendar date.
Not the date range interval that is selected.
s
Hmm, ok. I'm not sure what the interval is that you're pulling into the insight is coming from.
you have a date.
not a date range as far as I know. Maybe your setup/situation is different.
m
Users always filter by some date interval.
E.g. this month
s
fair point. Something to ask with GD folks because if you do use like "Last year" what is the date used...
m
They filter the context for you, but this is off topic. This is all documented in MAQL docs.
s
ok fair enough
m
Thank you though for your help, Steve!
1
Anyone from GoodData that can help?
s
(probably in the am as they're mostly based in Europe)
j
This problem probably does not have easy solution. You may try to add bridge dataset of many-to-many relationship and create two records for each row of original dataset (one containing start_date and the other end_date). For example SQL dataset with SQL query:
Copy code
select "id","start_date" as "date",'start' as role from interval_dataset
union all
select "id","end_date" as "date",'end' as role from interval_dataset
Filter for “Date” dataset will be then applied effectively as OR filter for start and end date
Other option is to use a materialized disconnected date dimension containing at least all dates between minimum start_date and maximum end_date. You can generate such series using generate_series in case of PostgreSQL (syntax may differ for other databases):
Copy code
select "disconnected_date"::date from generate_series('2023-01-01'::date,'2023-12-31'::date,'1day'::interval) as "disconnected_date"
This date dimension should have no connection to any dataset in the model and it will be used only to determine minimum and maximum date matching the filter selection. BY ALL OTHER keyword lets you use the result as “constant” in expressions of disconnected datasets. You can define filter using this disconnected date dimension in insight first and then apply following metric:
Copy code
select count( {dataset/interval_dataset})
where {label/start_date.day} 
  between 
  (select min( {label/disconnected_date.day},{dataset/materialized_date_dim}) by all other) 
  and
  (select max( {label/disconnected_date.day},{dataset/materialized_date_dim}) by all other) 
or {label/end_date.day} 
  between 
  (select min( {label/disconnected_date.day},{dataset/materialized_date_dim}) by all other) 
  and
  (select max( {label/disconnected_date.day},{dataset/materialized_date_dim}) by all other)
This metric will return count of rows which match the filter condition. It can be used in breakdown of interval_dataset to filter only matching records.
p
🎉 New note created.
j
What is the actual use case? Should it filter intervals defined by start and end date in data for which one of the borders fall into the range given by date filter on dashboard? Or do you rather want to filter intervals fully contained/overlapping with the filter date range?
m
Thank you @Jakub Sterba! I'll review those options in the morning. The actual use case is that I have items that are "owned" for periods of time. I'd like to be able to show who owned those items for a given interval (along w/ other info).
I was reviewing your first solution, but I think that would end up not working for my use case. Specifically, in the case that the start & end date that are in the record are fully outside the selected interval. E.g. start=2023-01-01, end = 2023-12-31 and the selected interval is October 2023.
I think the solution where you enumerate all dates in range is possibly the only way to go, unfortunately.
Obviously, that doesn't scale well w/ large time windows.
j
Other solution can also be to create dataset with snapshots for each day when item was owned and use it to apply filter via many to many relationship between “snapshot date” to the original dataset. I have recorded this thread as feedback for possible product improvement in future to better support your use case.
❤️ 1
j
In my opinion, the issue illustrated in the diagram below is one of the biggest problems with how the GoodData date picker works. It works great if you have point-in-time data. For example, if you wanted to find all of the purchases within a given date range, you can easily do that. But, it doesn’t seem to work very well at all with data that has a start and an end date. If you wanted to find all marketing campaigns that were active during a given time frame… not possible. If you wanted to find all campaigns that started and ended within the given time frame… not possible. You can only ask questions about the start date or the end date, which doesn’t seem very useful.
p
🎉 New note created.
j
After posting the example in my previous message, I think something “clicked” with my understanding of how GoodData works and what you are “supposed” to do when it comes to modeling data that has a start and an end date. In the example above, where you have marketing campaigns that begin and end, modeling them like this is probably not what you want to do because of the problems discussed above…
Copy code
campaign_name     start     end     ad_spend
new shoes         1/1       3/31    $1,000
This doesn’t really work with how GoodData wants you to model the data. GoodData wants either discreet standalone datapoints, like the purchase data above, or it wants continuous fluid lines that are comprised of many data points. The campaign data model above is kinda like how you would model the purchase data. Each campaign is just a single row in a table. Meaning, each campaign is just a single data point. Meaning that, even though you have a start and end date, you have to choose one or the other that is going to represent the temporal space that the campaign occupies… not ideal. So, what are the alternative options? First, you could recognize that if you start zooming out far enough, lines start to look like individual points. That’s right, we’re talking about grain and granularity. Perhaps we know that our marketing campaigns are sometimes short, maybe just a few days long, but that they always fall within a business quarter. They don’t run longer than a business quarter and they don’t span multiple business quarters. With this in mind, we know that the highest “resolution” that we can look at our campaign data and have it still appear to look like dots instead of lines, is “business quarter”…
Copy code
campaign_name     business_quarter     ad_spend
new shoes         Q1                   $1,000
But there’s also a second option. Instead of making our campaigns look like dots, we could make them look like continuous fluid lines. For each campaign, maybe you have the data collected at a weekly level. Or maybe you don’t, in which case you can simply “atomize” your data by smoothly dividing it. The key with this approach though is to understand that the concept of grain/granularity is still important. Since we’ve modeled the data at a weekly level, that is the highest resolution that we can look at our campaign before we start to see gaps and holes in the line, which is a very similar problem as when we modeled our campaigns with a start and end date… two definitive points in time with one big hole in the middle.
Copy code
campaign_name       week        ad_spend
new shoes           1           $85
new shoes           2           $85
new shoes           3           $85
...etc
With this approach, our campaigns start at the appearance of the first data point, they end at the last data point, and you answer questions about them by aggregating the facts from every data point that belong to them. In addition, modeling the data this way allows you to answer questions about partial date ranges, e.g “How much ad spend, across all campaigns, was there just in week 3?”
With that said… • I don’t think the GoodData docs have done a very good job of explaining any of this • There’s no reason the date range picker couldn’t be improved to allow for modeling your campaign data with a start date and end date. In many cases, this is the most natural way to model the data.
@Devin Clarke please pay attention to this as you start work on the modeling for our retention dashboards
1
i
Hey @John Munson, Thanks a lot for sharing your findings with us, really appreciated. We’ve gone ahead and shared your feedback with out Product team as well. I believe, this is a great topic for the upcoming call scheduled for Monday. You should be invited alongside with Michael already.