i have a MAQL question involving an outer join tha...
# gd-beginners
e
i have a MAQL question involving an outer join that i'd appreciate some help with. i have 2 tables (
fct_lines
and
dim_returns
) with a foreign key relation and 2 metrics (
q1/q2
and
fit q1/q2
). when i compute
q1/q2
by
Attr
i get the expected result:
Copy code
Attr  q1/q2
-----------
attr    0.8
but when i add the second metric the result of the first changes:
Copy code
Attr  q1/q2  fit q1/q2
----------------------
attr    1.0       0.75
• is there a way of defining
q1/q2
in a way the prevents this interaction? how do i predict the possibility of this interaction? • the
q1/q2
metric is sensitive to whether `fit q1/q2`` contains a
FOR EACH
clause. is this expected?
Copy code
fct_lines
Key	Attr	Return_Key	Q1	Q2
------------------------------
101	attr	       1	1	1
102	attr	 	   NA   0	1
103	attr	       2	2	2
104	attr	       3	1	1

dim_returns
Return_Key  Fit
---------------
1          True
2          True
3         False

dim_returns.Return_key <- fct_lines.Return_Key

# fit q1/q2
SELECT (SELECT SUM({fact/q1}) WHERE {label/fit} = "true") / SUM({fact/q2} )

# q1/q2
SELECT SUM({fact/q1}) / SUM({fact/q2})
thanks!
j
Hello Emile, There are variations of the BY clause that can help you lock aggregation, but these depend on the the connection points used in your LDM.
e
thanks for the suggestion Joseph, in searching for a solution i've tried
BY
every available attribute in the data model but none prevent the effect of the second metric on the first. can you suggest a concrete clause to use, in light of the details provided?
j
Sorry I missed your inquiry on the FOR EACH clause... this is expected behavior. This allows you to display all attribute values in your visualization, even if they are missing.
e
thanks Joseph, i haven't spotted anything in the documentation you reference explaining why the presence of
FOR EACH
in one metric would have an effect on another metric. i'm searching for a way to predict the possibility of this happening and to guard against it. can you help me understand why this happens and how to prevent it?
m
Hi Emile, you are correct that one metric should not affect how another metric is calculated. That is true as long as the requirements for data consistency is met. If I understand your data correctly the line with Key=102 does not reference any row of the dim_returns and has NA/null value there? If yes then this might be the reason of this behavior. GoodData expects any reference to be to a valid key in the dimension table. I believe what is happening is that the
q1/q2
metric does not need to join the dim_returns so it calculates on the fact_lines table only, but the moment you add the
fit q1/q2
metric, this table has to be joined (to evaluate the value of
fit
). To prevent too heavy queries i.e. scanning the same table multiple times for multiple metrics, there are some optimizations and these assume that all the references are valid. But here the inner join filters out the line 102 and therefore it is not calculated even in the
q1/q2
metric. To fix this, you can add the row with NA return_key value to your dim_returns. Then everything should work as expected. I am not sure if these requirements for data consistency are stressed enough in the documentation so let me check with the documentation team if we can make them clear there.
e
hi Michal, thanks for the suggestion to include NA/NULL in the dim_returns primary key. this doesn't work because primary keys can't contain NULLs. you mention "valid key" but the foreign key relationship seems valid. the tables and the relation are realisable in the logical data model without any errors or warnings. can you suggest a diagnostic for validity? i'm assuming NULLs and outer joins are permissible, based on the existence of FOR EACH.
m
I believe the FOR EACH is meant for the opposite case where you would have Return_Keys that do exist in dim_returns but do not have any corresponding rows in fct_lines. In that case you could still display them for a metric concerning data from fct_lines. But internally the system assumes that any referencing row targets a value that does exist in the target table. (the references/joins are directional). In other words - in your case, while not every dim_return needs to have a fct_line, each and every fct_line has to have exactly one dim_return. The data model validation is not checking the contents of the data in the tables, just the structure and mapping. That is why it is not complaining about it. I see that if you have actual null value there, it does not allow you to insert null key to your actual table. You might overcome this specific i.e. with a view (or SQL dataset stored in the data model) that would convert the null references to some other reasonable value (0, -1, ‘NA’) in fct_lines, and a second one which would add this value to dim_returns with a UNION ALL for example. Or alternatively pre-join these two tables and make them a single table/view from the GoodData model point of view.
e
thanks Michal,
each and every fct_line has to have exactly one dim_return
can you point me to the reference for this? the example tables and metrics i quoted earlier was distilled from a larger schema where pre-joining won't be feasible. i accept the suggestion you made for making sure outer joins don't occur. i can confirm this prevents the interaction between metrics but i find the need to do that questionable. my understanding from this is that the MAQL semantics depend on the presence of NULLs, and there are no warnings for getting it wrong?