I suppose the answer has two parts:
1. how to connect everything in the data model
• In general, you just add a new dataset for the benchmark and connect it to the datasets by which you want to be able to compare it (i.e. age, gender…).
• Basically, any dimension/attribute by which you want to compare the actual data with the benchmark needs to be connected to both the actual data and the benchmark. The best way to do that is to put those dimensions into shared datasets which are connected to both.
• you most probably want to have the benchmark data heavily aggregated before loading it to GoodData - first to prevent clients from recognizing details of other clients' data, and second to save some data volume/costs
• unlike the other datasets, you probably want to load the benchmark dataset to all the clients as is (so without the x__client_id column)
2. how to calculate the average from aggregated data
• because you mentioned you want to see the average score from the benchmark, and because you will most probably be computing the average from pre-aggregated data, be careful how you calculate it
• you cannot just compute an average per row in the aggregated dataset and then average those rows - you would need a weighted average, because each row in the aggregated dataset can represent a different number of original rows
• so ideally the aggregated dataset will also store the number of rows from which each record was aggregated (i.e. the number of responses - in SQL basically a COUNT(*)) and then use it for weighting the average
◦ in SQL it would be something like:
SELECT
    age,
    gender,
    income,
    education,
    SUM(score) AS total_benchmark_score,
    COUNT(*) AS total_benchmark_cases
FROM detailed_client_data_table
GROUP BY
    age,
    gender,
    income,
    education
• the simplest way can be to just store the total (SUM) of score and the COUNT of rows which each benchmarking record represents, and then let the average be calculated dynamically in GoodData MAQL, e.g.:
SELECT SUM(total benchmark score) / SUM(total benchmark cases)
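To see why the SUM/COUNT approach matters, here is a small Python sketch (with made-up scores and group sizes, not real benchmark data) contrasting the naive "average of per-group averages" with the weighted SUM(total)/SUM(count) that the SQL and MAQL above compute:

```python
# Hypothetical per-response benchmark data: (age_group, score).
rows = [
    ("18-24", 80), ("18-24", 90),                                 # 2 responses, avg 85
    ("25-34", 60), ("25-34", 70), ("25-34", 50), ("25-34", 40),   # 4 responses, avg 55
]

# True average over all original responses: 390 / 6 = 65.0
true_avg = sum(s for _, s in rows) / len(rows)

# Pre-aggregated form: group -> (total_score, case_count),
# i.e. what the SQL above stores per benchmark record.
agg = {}
for g, s in rows:
    total, count = agg.get(g, (0, 0))
    agg[g] = (total + s, count + 1)

# Wrong: averaging the per-group averages ignores group sizes -> 70.0
avg_of_avgs = sum(t / c for t, c in agg.values()) / len(agg)

# Right: SUM(total) / SUM(count), same shape as the MAQL metric -> 65.0
weighted = sum(t for t, _ in agg.values()) / sum(c for _, c in agg.values())
```

Because the second group has twice as many responses, only the weighted form reproduces the true average from the aggregated rows.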