Hi team, I’m using the ADD to distribute the data ...
# gooddata-platform
j
Hi team, I’m using the ADD to distribute the data to the client workspaces, yet I still want to be able to benchmark the client’s score against the total average score of all the clients. The total average score should also be possible to filter by a couple of background variables with several options, such as age interval, gender, income and education level. So for example, perhaps I only want to see total average score for the women over 50. As I’ve understood it, I would in this case need to add a new table to the database, containing aggregated values, but still wondering how to best structure the table and the data, for it to work as described above? Thanks!
m
I suppose the answer has two parts: 1. how to connect everything in the data model • In general, you just add a new dataset and connect it to the datasets which you want to be able to compare it by (i.e. age, gender…). • Basically any dimension/attribute by which you want to be able to compare the actual data with the benchmark needs to be connected to both - the actual data and the benchmark. And the best way to do it is to put them to shared datasets which are connected to both • you most probably want to have the benchmark data very heavily aggregated before loading to GoodData - first to prevent clients recognizing details of other client’s data - and second to also save some data volume/costs • unlike the other datasets, you probably want to load the benchmark dataset to all the clients as is (so without the x__client_id column) 2. how to calculate the average from aggregated data • because you mentioned you want to see the average score from the benchmark and because you will be most probably counting the average from pre-aggregated data, be careful how you calculate the average • you can not just calculate the average in the aggregated dataset and then calculate the average out of these rows (you would need a weighted average, because each row in the aggregated dataset can represent different number of original rows) • so ideally in the aggregated dataset you will also represent the number of rows from which it was aggregated (i.e. number of responses - in SQL basically a COUNT(*) ) and then use it for weighting the average ◦ in SQL it would be something like:
Copy code
SELECT
  age,
  gender,
  income,
  education,
  SUM(score) as total_benchmark_score,
  COUNT(*) as total_benchmark_cases 
FROM detailed_client_data_table
GRPOUP BY 
  age,
  gender,
  income,
  education
• the simplest way can be to just store total (SUM) of score and COUNT of rows which each benchmarking record represents and then let the Average be calculated dynamically in GoodData MAQL like:
SELECT SUM(total benchmark score) / SUM(total benchmark cases)
oh and ideally the key of the benchmark dataset would be compound from all the foreign keys (plus any potential other attributes) - I missed that in my example screenshot
j
Hi, many thanks for your reply, I'll try and see if I can get this into place as you described, 🌟