Hi I have a question about connection points and references GoodData #gd-beginners

Hi! I have a question about connection points and ...

Allan Talver

11/10/2022, 2:36 PM

Hi! I have a question about connection points and references in LDM. The scenario is that I have 2 datasets: invoices and clients. Invoices are imported from ERPs (which there are many) and clients from a single CRM. It is possible that a client appears in the ERP and gets an invoice before they exist in the CRM. Client ID is the connection point in the clients dataset and invoices dataset has a column that references the clients. Question: Is it possible to have a report which would show client IDs that get imported from ERPs (exist in the incoming invoices data), but don’t have a match in the CRM (no matching record in the clients dataset)? Technically, what happens to the invoice lines that can’t be mapped to a client? Do they exist in the dataset or do they get discarded? If they do exist then is it possible to see them? Thanks! 🙂

Joseph Heun

11/10/2022, 4:44 PM

Hi Allan, you can use the connection point in a BY clause to help display the data connected in the LDM in an insight. There are some great free courses on our GoodData University page on Modeling and MAQL metrics

Jakub Sterba

11/10/2022, 9:23 PM

GoodData Platform assumes 1:N relationship between dimension dataset (clients) and fact dataset (invoices) if foreign key in fact dataset references primary key in dimension dataset. (0,1):N relationship may lead to unexpected results. In scenarios where you aggregate fact from the fact dataset and slice it by the client ID, such query gets data from the fact dataset unless there is some object in the insight which requires to join also the dimension dataset. The dimension dataset may be however joined when you add another metric to such insight which needs to obtain data from the dimension dataset because query optimizer may combine such queries into one assuming it will return the same results because of 1:N relationship. You may compare number of client IDs in both datasets using following metrics •

SELECT COUNT(Client ID,Client ID)

- returns number of unique primary keys in Clients dataset •

SELECT COUNT(Client ID,Invoice ID)

- returns number of unique foreign keys in Invoices dataset with Invoice ID set as connection point/primary key Difference of these two metrics is number of client_ids which do not exist in Clients dataset, but it is probably not feasible to list them. I suggest to add records to Clients dataset even for clients which do not exist in CRM yet in data warehouse so that referential integrity is not broken and update such records with information from CRM as soon as it is available. You can also add some attribute like “CRM Status” to be able to filter records which exist in CRM or do not exist in CRM easily by end users using GoodData. Other option is to load into Invoices dataset the Client ID which cannot be mapped to any Client ID in Client Dataset as some additional attribute (e.g. Unmapped Client ID). Such field would have to be created in the data warehouse via ETL/ELT.

Jakub Sterba

11/10/2022, 9:36 PM

Btw. there is a way to check if there are any referential integrity issues in workspace on GoodData Platform.

Open in Slack

Previous Next