Hey channel, I am trying to figure out, is there ...
# gd-beginners
a
Hey channel, I am trying to figure out, is there a simpler way to achieve my desired result: • I have several products (2-3) • I have a table with
views
(one record per view, like
view_id
+
event_date
+
product
) • I have a table with
conversions
(one record per conversion action, like
conversion_id
+
event_date
+
product
) • I have a logical model in GD
event_date
+
views
+
conversions
• I want to build a chart / table with following result:
event_date
+
product
+
conversion_ratio (count of conversions for current date and product divide by count of views for current date and product
If I will use SQL query, it will look like this (with a bit of simplifications):
Copy code
with daily_views as (
  select event_date, product, count(1) as views_count
  from views
  group by 1, 2
),
daily_conversions as (
  select event_date, product, count(1) as conv_count 
  from conversions
  group by 1, 2
),
joined as (
  select *
  from daily_views v
  left join daily_conversions c on c.event_date = v.event_date AND c.product = v.product
)
select
  event_date,
  product,
  conv_count / views_count
from joined
The crucial part here is the final join on 2 column key (date + product). As far as I understand, GD could not do such joins:
Tip 4: If you need to join/reference two datasets using a multi-column key (which is not possible in GoodData), generate a separate single attribute which will contain concatenated values of all the columns used in the key. For example, if you plan to join by companyID + orderID, create an attribute which combines these two.
I could follow this tip and prepare my data in SQL in my ETL process, and ship to GD final result. But I am wondering, is there a way to achieve the same result in GD, using models over
views
and
clicks
not aggregated tables (or at least using pre-aggregated
daily_views/conversions
tables)?
j
Hi @Alex Sher, GoodData implements drill across feature so you do not need to care about complexity of joining data coming from different fact datasets if they use the same dimensions. For example full outer join is performed if you place metric counting number of views and metric showing number of conversions into the same table with dimensions date and product. For arithmetic operation such as calculation of conversion ratio (
select count(conversion_id)/count(view_id) )
will perform inner join between count results coming from conversions dataset and count results coming from views dataset. I think inner join is enough for your use case, but in case you want to include also 0 for products which had a views on given date but no conversions you can achieve right join by using ifnull function:
select ifnull(count(conversion_id),0) / count(view_id)
this will result in using right join. Use of ifnull for both operands would result in full outer join between the operands but that probably does not make sense due to division by zero in the metric. You do not need to do anything special to achieve this in the data model. Just use 1:N relations between product dataset and date dimension dataset to both tables (fact datasets shall be connected to the shared dimensions with arrow pointing towards fact datasets)..
looking again at your question you probably need to refactor your model a bit. You need separate dimension dataset
products
which will contain primary key “`product`” and you have to connect it to both fact datasets. Columns
product
in these fact datasets will be foreign keys. The query engine will then understand that
product
is a conformed dimension.
Model like this should work
But in following model
Product
in
Conversions
dataset is different attribute from
Product
in
Views
dataset. You will see two Product attributes in Analyze but you will be able to use always only one of them in insight.
a
Hey @Jakub Sterba, thanks a lot for such detailed explanation! I am a little shocked of detail level of help provided by you, this is a huge sale point for GD 🙂 After your explanation, I got that
You need separate dimension dataset
products
which will contain primary key “`product`” and you have to connect it to both fact datasets. Columns
product
in these fact datasets will be foreign keys. The query engine will then understand that
product
is a conformed dimension. (edited)
this is exactly missing piece in my model. Thanks again for pointing this out, I will try to build model this way and ask more questions, if I have some troubles with that approach. Looks pretty straightforward for me 🙂