Solved

Analyzing "multi-select" values

  • 8 October 2021
  • 3 replies
  • 282 views

  • New Participant
  • 5 replies

Hello,

I think I need a guidance on the data modeling issue.

In our app we have multi-select fields that we want to analyze in GoodData. From the user perspective it's possible to fill out some forms with fields of different types, e.g. text fields/numeric fields and select fields. The select fields are basically single select and multi select ones. All the fields are custom (means they are a separate entity in the database and can be created/deleted/modified, they are NOT a static attribute).

From the database perspective we have a model called “Scoop” which can have many “Fields”, and those “Fields” can have multiple “Choices” if the field type is multi-select. And now we want to analyze such fields and their values. Example:

Field: Panel Types
Field Choices: Panel 1, Panel 2

Report: Show % of scoops with Panel1 and Panel2 vs. those with just Panel1. (pie chart)

Is that possible to achieve?
If you need any more details just let me know.

Thanks in advance!

icon

Best answer by Boris 11 October 2021, 12:34

View original

3 replies

Userlevel 2

Hi Niko,

It should be possible to do this, although there is no single direct solution. I would come down to the format of how the data are exported from your application.

Without further knowledge of your solution I can propose two ways of how to model this:

1: Duplicated lines for each user for each selection:

User Choice
User_1 Panel 1
User_1 Panel 2
User_2 Panel 1
User_3 Panel 1
User_3 Panel 2

 

Then you can solve your case with following metrics:

First you count number of choices per user (named Choice by User):

SELECT COUNT(Choice) BY User

Then you count number of users with only one choice and with both choices:

SELECT COUNT(User) WHERE Choice by User=1 (named i.e 1Choice)

SELECT COUNT(User) WHERE Choice by User=2 (named i.e. 2Choice)

 

Then you can visualize this in pie chart by putting these two metrics.

To have percentages you can work further with these numbers, i.e.:

Percentage of 1 = 1Choice / (1Choice + 2Choice)

 

Of course your use case might be more complicated and you might have more than two choices, but that can be handled by filters directly in the metric:

I.e.: 

SELECT COUNT(Choice) BY User WHERE Choice IN (Panel 1,Panel 2

 

Also, with this approach, you might need to generate some unique ID to identify each line (or use compound key User + Choice).

2: Second way would be to have a separate attribute for every choice:

User Panel 1 Panel 2
User1 1 1
User2 1 0
User3 1 1

 

And the metrics would be similar - you would be creating separate metrics to count the users that selected the values.

 

Third approach would be to have all the selected choices in one string, i.e. “Panel 1,Panel 2” and use LIKE clause to extract the individual values, but I don’t think this is optimal, it would definitely require the most work at the end.

 

If would say the first option is more flexible - you don’t have to change the model if you would add a new field, but the metrics would be more intuitive in the second option. 

But as mentioned, it would come down to how you are able to extract the data.

 

I hope this helps.

 

Best regards,

Boris from GoodData Support team

@Boris thanks a lot Boris for that explanation. I just wanna ask about option 2, because in my app I can have a lot of custom choices which I believe would not be suitable if I choose this option? Would that require creating separate dataset and putting there placeholders for that? At the moment it’s not a problem to make data transformation on my side, since I’m uploading custom CSV to work with GoodData.

Thank you

Userlevel 2

Hi Niko,

You are correct. Option 2 is less scalable and would require separate dataset for each new field and separate attribute for each choice. If you would add the choice, you would need to add a new attribute.

For your use case, I believe option 1, might be better.

Reply