# gooddata-cn
s
How should we model multi-valued attributes, e.g. values stored as an array in Postgres?
j
Hi @Sheldon Nathan, you can transform it into an M:N relationship in the logical data model.
j
M:N is definitely a way to go in this case, but if I understand @Sheldon Nathan's situation correctly, he has these attribute values stored in PostgreSQL ARRAY data types, which are not supported by GD.CN/Cloud, right? So he would have to implement an ELT process to extract these arrays, right?
j
Exactly. The ARRAY data type is not supported when mapping data in the DB to the GD logical model, so it is necessary to transform the data by an ETL process, or by a database view (defined by an SQL query), into data mart structures which do not contain complex data types such as arrays. There are also other alternatives for building the data mart structures in some simple cases. For an array with a fixed small size you can split it into separate columns (e.g. `attribute_1`, `attribute_2`, etc.). A multi-value attribute which contains for example values like `{'tag1'}` or `{'tag1','tag2'}` can be transformed by ETL into attributes `Is tag1` = Y/N and `Is tag2` = Y/N. Modeling the data mart using an M:N relation can cope with any size of array. You can select the approach depending on the use case. Each of the mentioned modeling approaches will offer a slightly different user experience for the users who will be creating insights in GoodData.
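The three transformation options above can be sketched as a minimal ETL step in Python. All names here are illustrative, not part of any GoodData API; the sketch only shows the shape of the data each approach produces.

```python
# Hypothetical ETL helpers flattening a PostgreSQL ARRAY column three ways
# before loading it into a data mart without complex data types.

def split_fixed_size(tags, size=2):
    """Fixed small array -> separate columns attribute_1, attribute_2, ..."""
    return {f"attribute_{i + 1}": (tags[i] if i < len(tags) else None)
            for i in range(size)}

def to_flag_columns(tags, known_tags=("tag1", "tag2")):
    """Small known value set -> one Y/N attribute per possible value."""
    return {f"is_{t}": ("Y" if t in tags else "N") for t in known_tags}

def to_bridge_rows(row_id, tags):
    """Any array size -> bridge rows backing an M:N relation in the LDM."""
    return [(row_id, t) for t in tags]
```

For example, `to_flag_columns(["tag1"])` yields `{"is_tag1": "Y", "is_tag2": "N"}`, while `to_bridge_rows(7, ["tag1", "tag2"])` yields the two bridge rows `(7, "tag1")` and `(7, "tag2")`.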
j
@Jakub Sterba AFAIK there is an internal Jira in our backlog focused on complex data types. Could you extend it with this use case (if it is not already there)?
s
The ask/need here is twofold:
1. Allow displaying multiple values in charts, e.g. multi-choice responses on a form. What did a person eat when their sugar levels exceeded a certain threshold? User ate fish, beans and soup.
2. Allow analysis on array-type attributes: show me sugar levels when the user's meal included fish and eggs.
@Jan Soubusta what is the requirement/design/outcome of the ticket to support array/complex data types?
j
I just extended it, now it looks like this:
- JSON data type
  - Create an attribute/fact from a JSON element (key is the attribute name, value is its value)
  - Document the performance impact and recommendations (e.g. some databases can index JSON fields, …)
  - The first step could be easier - what if I specified a column name as “person.name”? Would it generate `SELECT "person.name"` SQL? If yes, it could work out of the box.
- Array data type
  - Fact representing COUNT of elements
  - Attribute
    - Search use case - e.g. `"COMPLETED" in Status`, `Status like "Compl%"`
    - Slice by the attribute (ARRAY), display the values concatenated by e.g. a space or comma
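To make the proposals concrete, here is a small Python sketch of the semantics being proposed (JSON element lookup, element count as a fact, containment search). This is only an illustration of the intended behavior, not existing GD.CN functionality.

```python
import json

def json_attribute(json_doc: str, key: str):
    """Attribute value = JSON element addressed by a dotted key, e.g. 'person.name'."""
    value = json.loads(json_doc)
    for part in key.split("."):
        value = value[part]
    return value

def array_count_fact(arr):
    """Fact representing the COUNT of elements in an ARRAY column."""
    return len(arr)

def array_contains(arr, needle):
    """Search use case, e.g. "COMPLETED" in Status."""
    return needle in arr
```

Here `json_attribute('{"person": {"name": "Ada"}}', "person.name")` would return `"Ada"`.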
Btw there is one more solution coming soon - so-called SQL datasets. You will be able to create a new type of dataset in our LDM - an SQL dataset, defined by any SQL query. In this case, you could create a STRING attribute by applying the `array_to_string()` function. Would that be an acceptable temporary (or even permanent) solution for you?
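The effect of PostgreSQL's `array_to_string()` can be simulated in Python: however long the array, the SQL dataset exposes a single STRING attribute, so the LDM only ever sees one column. The row layout below is illustrative.

```python
# Simulating PostgreSQL's array_to_string(meal_items, ', '):
# a variable-length array collapses into one STRING attribute.

def array_to_string(items, delimiter=", "):
    return delimiter.join(items)

row = {"user_id": 1, "meal_items": ["fish", "beans", "soup"]}
meal_attribute = array_to_string(row["meal_items"])
# meal_attribute == "fish, beans, soup"
```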
s
On the JSON support, I believe databases that support JSON have slightly different SQL syntax for referring to the attribute and nested fields, e.g. Postgres uses `->`, while Oracle requires a function wrapper - will this be customizable based on the DB, or will it be baked in?
SQL datasets sound interesting ... when are they targeted for release?
j
Good point! If we support complex data types as proposed above, we will generate the right dialect based on the type of your data source, just like we already do, e.g. in the case of date arithmetic (really tricky). In the case of SQL datasets, you have to use the corresponding dialect, and once you decide to migrate to a different DB, you will have to update the SQL datasets accordingly.
SQL datasets should be released in March, maybe under a feature flag first, but very soon for all.
Watch the release notes 😉
s
And will this be cloud only or on CN too?
j
No limitation
s
Also, with SQL datasets, how will variable-length arrays be handled in the LDM? We do not know the number of elements (LDM attributes) at compile time.
Consider an SQL dataset for 'Meal items' which could have one or more of {fish, eggs, beans, beef, rice, salad}.
j
I assume it will be just a single attribute if you transform the array into a string using the `array_to_string()` function.
j
Exactly. But it comes with obvious limitations. When you concatenate the values, you cannot apply an exact match when creating e.g. filters, like `attribute in ('fish', 'eggs')`; instead you can apply only LIKE. So it is definitely just a partial solution.
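The limitation is easy to demonstrate in Python with a made-up row: an IN-style exact match compares against the whole concatenated string and never hits, while a LIKE-style substring test can even over-match.

```python
# Why exact-match filters stop working on a concatenated attribute.
concatenated = "catfish, beans"

# IN-style exact match fails: the attribute value is the whole string,
# never one of the individual meals.
assert concatenated not in ("fish", "eggs")

# LIKE '%fish%' matches, even though the meal was catfish, not fish.
assert "fish" in concatenated
```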
j
Another option is to use the `unnest()` function to transform the data into multiple rows and use it with an M:N relation in the LDM. The attribute "Meal" will then let you select data which contains at least one of the selected meals in the filter dropdown.
With a more complex metric you can probably select data which contains all of the selected meals.
You can probably combine the approaches: add to your model one concatenated attribute for displaying multi-value data in a single table cell, plus an M:N relation to a Meal dimension dataset for the filtering use cases.
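The `unnest()` approach can be sketched as follows (sample rows are made up): each array row explodes into one bridge row per element, and the "at least one selected meal" filter becomes a membership test over the bridge.

```python
# Simulating unnest(): explode each array row into (fact_id, meal) bridge
# rows, which back the M:N relation between the fact dataset and a Meal dataset.
fact_rows = [
    (1, ["fish", "eggs"]),
    (2, ["beans"]),
]

bridge = [(fid, meal) for fid, meals in fact_rows for meal in meals]
# bridge == [(1, "fish"), (1, "eggs"), (2, "beans")]

# "Contains at least one of the selected meals" via the bridge:
selection = {"fish", "beans"}
matching_fact_ids = {fid for fid, meal in bridge if meal in selection}
# matching_fact_ids == {1, 2}
```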
s
@Jakub Sterba I suppose that could be one approach
j
You can also create a dimension with all combinations and use an M:N relation between these combinations and the Meal dimension. It may reduce the size of the bridge dataset and help query performance.
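A rough sketch of that combination-dimension idea, with made-up rows: fact rows point at one row per distinct meal combination, and the bridge connects combinations (not individual facts) to meals, so it scales with the number of unique combinations.

```python
# Sketch: combination dimension + bridge to the Meal dimension.
fact_rows = [
    (1, ("eggs", "fish")),
    (2, ("eggs", "fish")),   # same combination as fact 1
    (3, ("beans",)),
]

# Combination dimension: one surrogate id per distinct combination.
combos = {meals: cid
          for cid, meals in enumerate(sorted({m for _, m in fact_rows}), start=1)}

# Facts reference the combination id instead of carrying the array.
facts = [(fid, combos[meals]) for fid, meals in fact_rows]

# Bridge between the combination dimension and the Meal dimension:
# 3 rows here, versus 5 rows for a per-fact bridge.
bridge = [(cid, meal) for meals, cid in combos.items() for meal in meals]
```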
s
It would be great if the interpretation of the usage could be handled internally by GD for array types; that would reduce the learning curve for insight creators 🙂
@Jakub Sterba could you point me to any documentation around creating the dimension?
j
By "dimension" I meant a table/dataset containing all the unique meal names.
You can learn about the dimensional approach for example in this course: https://university.gooddata.com/designing-data-models