
GoodData.CN: Data Filters

  • 1 July 2021

Introduction

One of the core strengths of GoodData.CN (GD.CN) workspace hierarchies is the ability to create a network of connected workspaces, each outfitted with a uniquely limited view of the same datasets. Data is limited by applying Data Filters to each child workspace. The benefit of this approach is that your analytics solution can be quickly and easily scaled across a variety of data privilege levels. It also avoids costly, partial duplication of your tables in the underlying database.

 

Data Filters can be created using any of the attributes that exist within a given hierarchy's LDM. A data filter compares the filter value assigned to a child workspace against the values in a specified column of the data source tables: records whose column value matches the filter value are made available in that child workspace, and records that do not match are excluded. For example, a filter on the column "wdf__region" with the filter value "West" exposes only the records where "wdf__region" equals "West".

 

Creating and Updating Data Filters

There are a few different ways to submit new Data Filters or update existing ones. The first is to submit a JSON file with the full batch of data filters you want to apply across all of your child workspaces. This method refreshes the data filter metadata to match the latest posted JSON file, so you can update your data filters by editing the JSON file and reposting it.

 

The full refresh JSON can be submitted in either of two ways:

  1. JSON file of only Data Filters

    1. curl $ENDPOINT/api/layout/workspaceDataFilters \
      -H "Authorization: Bearer $GDC_API_TOKEN" \
      -H "Content-Type:application/json" \
      -X PUT -d @/path/to/your-filter-definition.json

       

    2. {
         "workspaceDataFilters": [
           {
             "id": "region",
             "title": "Customer Region",
             "columnName": "wdf__region",
             "dataSourceId": "demo-ds",
             "workspace": {
               "id": "headoffice",
               "type": "workspace"
             },
             "workspaceDataFilterSettings": [
               {
                 "id": "region_west",
                 "title": "Western States",
                 "filterValues": ["West"],
                 "workspace": {
                   "id": "western",
                   "type": "workspace"
                 }
               }
             ]
           },
           {
             "id": "state",
             "title": "Customer State",
             "columnName": "wdf__state",
             "dataSourceId": "demo-ds",
             "workspace": {
               "id": "western",
               "type": "workspace"
             },
             "workspaceDataFilterSettings": [
               {
                 "id": "state_california",
                 "title": "California",
                 "filterValues": ["California"],
                 "workspace": {
                   "id": "california",
                   "type": "workspace"
                 }
               }
             ]
           }
         ]
       }

       

  2. JSON file with overarching workspace layout

    1. curl $ENDPOINT/api/layout/workspaces \
      -H "Authorization: Bearer $GDC_API_TOKEN" \
      -H "Content-Type:application/json" \
      -X PUT -d @/path/to/ws-layout.json

       

    2. ws-layout.txt (attached example layout; resave with a .json extension before posting)

For more in-depth instructions on implementing full refresh data filters through JSON files, see the workspace data filters section of the GoodData.CN documentation.

 

While reposting the full set of data filters is generally a quick process, performance may begin to deteriorate once you reach a large number of filters. At that point, if you'd like to avoid fully refreshing all of the data filters, you can instead create and update individual data filters through the following commands:

 

CREATE:

curl $ENDPOINT/api/entities/workspaces/<workspace_id>/workspaceDataFilters \
-H "Authorization: Bearer $GDC_API_TOKEN" \
-H "Content-Type:application/json" \
-X POST \
-d @/path/to/your-filter-definition.json

UPDATE:

curl $ENDPOINT/api/entities/workspaces/<workspace_id>/workspaceDataFilters/<object_id> \
-H "Authorization: Bearer $GDC_API_TOKEN" \
-H "Content-Type:application/json" \
-X PUT \
-d @/path/to/your-filter-definition.json
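For reference, the your-filter-definition.json passed to these per-object endpoints uses the JSON:API envelope of the /api/entities interface rather than the flat layout format shown earlier. The sketch below is an assumption based on that envelope, with attribute names mirroring the layout fields; verify the exact payload shape against the API documentation.

```json
{
  "data": {
    "type": "workspaceDataFilter",
    "id": "region",
    "attributes": {
      "title": "Customer Region",
      "columnName": "wdf__region",
      "dataSourceId": "demo-ds"
    }
  }
}
```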

More detailed information on how to use these API calls can be found in our API documentation.

 

Data Filter Limitations - as of GD.CN v1.1.1

  1. No Range Filters:
    There is currently no option to set a filter based on a range of values (e.g. >, <, >=, <=). Only specific value(s) can be specified for filtering.
     

  2. Filtering Attributes with the same name:
    As of GoodData.CN v1.1.1, there isn't a way to specify which dataset's instance of an attribute name to filter. So, if an attribute name is shared between multiple datasets (e.g. "ID"), every dataset with that attribute name will be filtered. As a best practice, name each attribute uniquely when possible.
     

  3. Data Filters only apply to the dataset containing the filtered attribute:
    Take a look at the LDM below:
    [Image: LDM showing the Products and Transactions datasets]
    If a data filter is applied to Products so that the workspace only analyzes rows where "Product Category" denotes Electronic products, the Transactions dataset will still display non-Electronic product transactions. As a best practice, a data filter should also be applied to the Transactions dataset. The following sections show how to configure data filters for fact datasets to address this.

 

Hidden Attribute Data Filters

GoodData.CN also allows you to add "hidden" attributes to a dataset that can then be used to filter it. To do so, add the prefix "wdf__" to the column name of the table in your underlying database. This prevents the Logical Data Model (LDM) builder from including that column in the LDM, which in turn prevents downstream users from seeing the attribute in the semantic model and data catalog.

 

A common reason to use this feature is a multi-tenant solution. Let's say you have a Sales Transactions dataset, where each transaction is associated with a unique client. In order to split the Sales Transactions records into the appropriate workspaces, you need to include, and filter on, a field such as "client_id" in the Sales Transactions dataset. However, because this "client_id" field serves no analytical purpose and is only present for filtering, hiding it by naming it "wdf__client_id" will simplify the LDM.

 

Another reason to use hidden attributes is to add data filters to fact datasets, as mentioned in the previous section. Let’s look again at the LDM from before:

[Image: LDM showing the Products and Transactions datasets (same as above)]

In the original example, we talked about adding a data filter on "Product Category" in the Products dataset. To make this filter also apply to Transactions, add a column "wdf__product_category" to your underlying Transactions table in the database and populate it with the corresponding product category value. When submitting your data filter metadata, make sure to apply the data filter to both the Products "Product Category" field and the Transactions "wdf__product_category" field. This is illustrated in the following example.

 

Data Filtering Examples

Take a look at the following Logical Data Model (LDM):

[Image: LDM with the Order lines, Products, Campaigns, Campaign Channels, and Customers datasets]

Note: Though not shown here, the Order lines dataset has hidden fields "wdf__client_id" and "wdf__department_id".

 

This LDM is used to create a parent workspace Premium Master and two child workspaces: Carson’s Home Goods Department and Mercy Clothing Department.

[Image: workspace hierarchy with parent Premium Master and child workspaces Carson's Home Goods Department and Mercy Clothing Department]

In this case, we will need to split up the underlying data both by the Client (Carson’s and Mercy), as well as the Department. Let’s look at the filtering required on the Order lines dataset. In the below example, we are applying data filters using the overarching JSON file post method.

{
  "workspaceDataFilters": [
    {
      "columnName": "wdf__client_id",
      "dataSourceId": "client-ds",
      "id": "premium_client_workspaces",
      "title": "Premium Client Filter",
      "workspace": {
        "id": "premium_master",
        "type": "workspace"
      },
      "workspaceDataFilterSettings": [
        {
          "filterValues": ["carsons"],
          "id": "client_premium_carsons",
          "title": "Premium Carson's",
          "workspace": {
            "id": "carsons_home",
            "type": "workspace"
          }
        },
        {
          "filterValues": ["mercy"],
          "id": "client_premium_mercy",
          "title": "Premium Mercy",
          "workspace": {
            "id": "mercy_clothing",
            "type": "workspace"
          }
        }
      ]
    },
    {
      "columnName": "wdf__department_id",
      "dataSourceId": "client-ds",
      "id": "premium_department_workspaces",
      "title": "Premium Department Filter",
      "workspace": {
        "id": "premium_master",
        "type": "workspace"
      },
      "workspaceDataFilterSettings": [
        {
          "filterValues": ["Home"],
          "id": "client_premium_carsons_home",
          "title": "Carson's Home Department",
          "workspace": {
            "id": "carsons_home",
            "type": "workspace"
          }
        },
        {
          "filterValues": ["Clothing"],
          "id": "client_premium_mercy_clothing",
          "title": "Mercy Clothing Department",
          "workspace": {
            "id": "mercy_clothing",
            "type": "workspace"
          }
        }
      ]
    }
  ]
}

As you can see, we are adding two distinct data filters, on "wdf__client_id" and "wdf__department_id", and each is then split between the two child workspaces. Note, however, that "wdf__department_id" is actually split based on the "Product Category" field from the Products dataset. So, in order for the Products dataset and the Order lines dataset to stay consistent, another data filter should be added to Products.
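Following the batch format used above, the missing Products filter could look like the sketch below. The column name "product_category" and the filter and setting IDs are assumptions for illustration; match them to your actual Products table and naming conventions.

```json
{
  "columnName": "product_category",
  "dataSourceId": "client-ds",
  "id": "premium_product_category_workspaces",
  "title": "Premium Product Category Filter",
  "workspace": {
    "id": "premium_master",
    "type": "workspace"
  },
  "workspaceDataFilterSettings": [
    {
      "filterValues": ["Home"],
      "id": "product_premium_carsons_home",
      "title": "Carson's Home Products",
      "workspace": {
        "id": "carsons_home",
        "type": "workspace"
      }
    },
    {
      "filterValues": ["Clothing"],
      "id": "product_premium_mercy_clothing",
      "title": "Mercy Clothing Products",
      "workspace": {
        "id": "mercy_clothing",
        "type": "workspace"
      }
    }
  ]
}
```

This entry would be added alongside the two existing filters in the "workspaceDataFilters" array before reposting the full batch.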

 

While this same filtering could also be applied to Campaigns, Campaign Channels, and Customers, leaving them unfiltered means the full range of data from each dataset is inherited by both child workspaces. These datasets can be referred to as shared datasets.

 

If you are interested in GoodData.CN, please contact us.

Alternatively, test a trial version of GoodData Cloud:

Start Your GoodData Analytics Trial — Commitment-Free

 

 

