# gooddata-platform
w
I have a strange question. I am trying to build some automation to check that the data loaded into GoodData matches the data from our source system. The data from our source system in Postgres moves to another system (GBQ) for staging, and GoodData pulls it from there. There are several transformations on the data along the way too, so I want to make sure the final numbers generated in GoodData match the numbers generated from our source system. I am able to do it manually by building analysis reports in GoodData, writing an aggregate query against our source system, and then manually verifying the two (they matched ... phew). But I want to write a nightly job that does this checking automatically. Is there a way to get the CSV results output from an analysis report through an API (like the gray pages)? That way I could write a script that runs queries against our source system, calls a GD API, and compares the results to the CSV output from GoodData. Is this possible? Or are there other ways to accomplish what I am trying to do?
t
Hi, I can also give a high-level view on how we tackle this problem. When loading to GoodData, only a few things could actually go wrong that come to my mind:
• the load fails completely -> you'd get an error message
• data type conversion gets messed up somehow between the source system and GD
• the dataset uses a field as primary key that is not unique, and therefore you lose some data

Those are the things you'd want to check against in GoodData directly. All other things you want to have checked BEFORE loading to GoodData. We basically split our warehouse into 3 sections:
a) the section where raw data comes in, not cleaned, not processed in any way
b) the section that has the transformed/processed data that we later want to report on
c) a section that exactly reflects the data as it is supposed to land in GoodData

You want to do all your data consistency checks in those three sections: in a) check for proper ingestion, in b) check for proper transformation, in c) check for the correct aggregation and representation.
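To make the cross-section idea concrete, here is a minimal sketch of the kind of consistency check described above, comparing one aggregate between the Postgres source and a GBQ staging table. The table names, column names, and connection details are placeholders, and the libraries (psycopg2, google-cloud-bigquery) are just one possible choice:

```python
import psycopg2
from google.cloud import bigquery

# Placeholder connection details and table names -- adjust to your environment.
PG_DSN = "host=localhost dbname=source user=etl"
BQ_STAGING_TABLE = "my_project.staging.orders"

# Aggregate in the source section (Postgres).
with psycopg2.connect(PG_DSN) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT count(*), coalesce(sum(amount), 0) FROM public.orders")
        pg_count, pg_sum = cur.fetchone()

# The same aggregate in the staging section (BigQuery).
bq = bigquery.Client()
row = next(iter(bq.query(
    f"SELECT count(*) AS cnt, coalesce(sum(amount), 0) AS total FROM `{BQ_STAGING_TABLE}`"
).result()))
bq_count, bq_sum = row.cnt, row.total

# Fail loudly (e.g. in a nightly cron job) if the two sections disagree.
if (pg_count, float(pg_sum)) != (bq_count, float(bq_sum)):
    raise SystemExit(
        f"Mismatch: Postgres {pg_count}/{pg_sum} vs BigQuery {bq_count}/{bq_sum}"
    )
print("Source and staging aggregates match.")
```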
Our tool of choice for doing data validation is “great expectations”, which you can get as a free python package and use in your pipelines
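For example, with the legacy pandas-style great_expectations API (newer releases use a context-based API instead), a pre-load check on a staging extract might look like the sketch below; the file name and column names are placeholders:

```python
import great_expectations as ge

# Load a staging extract; the same expect_* methods also work on a pandas DataFrame
# wrapped via the legacy pandas-style API.
df = ge.read_csv("staging_extract.csv")

# Checks to run before the data ever reaches GoodData.
df.expect_column_values_to_not_be_null("order_id")
df.expect_column_values_to_be_unique("order_id")            # guards against duplicate keys
df.expect_column_values_to_be_between("amount", min_value=0)

results = df.validate()
if not results["success"]:
    raise SystemExit("Data quality checks failed")
```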
m
Let me add one more tip to what Thomas mentioned - if you want to make sure you did not make any mistake in the keys, you can switch how GoodData behaves during the load. Normally, if there is a duplicate key in the data load, GoodData silently deduplicates the data (the last row wins) and the load does not fail. That can be tricky to debug. If you prefer the load to fail completely in this case, you can switch the parameter `etl.lastRecordDeduplication` to `false`. Then you receive an error whenever there is a duplicate key during the load. In the case of an incremental load it only fails if there is a duplicate within the data being loaded; you can still overwrite existing (already loaded) data with a new value for the same key.
And to your original question Willie - yes, it is possible to execute/export data from a GoodData Platform insight via API, however I am not sure it is currently documented end to end. Here it is mentioned in the API reference page: https://help.gooddata.com/doc/enterprise/en/expand-your-gooddata-platform/api-reference/#section/Use-Cases/Export-an-insight. The second step is calling the `exportResult` API with the path from the previous call of the `executeAfm` API, but I am not sure the first step - the `executeAfm` API - is documented anywhere, and especially whether it is documented how to use it to call a specific existing insight 😕. On the other hand, if you do not want to build the queries dynamically and only want to repeatedly call the same query, you might try building the insight in the UI and then capturing the proper `executeAfm` API call in the browser console. It will be something along the lines of:
`POST /gdc/app/projects/{PROJECT_ID}/executeAfm`
with a payload like the following (depending on how exactly your insight is defined):
```json
{
  "execution": {
    "afm": {
      "attributes": [
        {
          "displayForm": {
            "uri": "/gdc/md/k9camwhstjud8i7di51o8llcl1by2byz/obj/979"
          },
          "localIdentifier": "b76d894ec377447d94f96a63591a88dc"
        }
      ],
      "filters": [
        {
          "relativeDateFilter": {
            "dataSet": {
              "uri": "/gdc/md/k9camwhstjud8i7di51o8llcl1by2byz/obj/684"
            },
            "granularity": "GDC.time.month",
            "from": 0,
            "to": 0
          }
        }
      ]
    },
    "resultSpec": {
      "sorts": [
        {
          "attributeSortItem": {
            "attributeIdentifier": "b76d894ec377447d94f96a63591a88dc",
            "direction": "asc"
          }
        }
      ],
      "dimensions": [
        {
          "itemIdentifiers": [
            "b76d894ec377447d94f96a63591a88dc"
          ]
        },
        {
          "itemIdentifiers": []
        }
      ]
    }
  }
}
```
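To connect this back to the nightly-job idea, here is a rough Python sketch of the two-step flow (`executeAfm`, then `exportResult`). It is only an outline under assumptions: authentication is omitted, and the exact `exportResult` path and payload shape should be verified against the API reference linked above.

```python
import requests

GD_HOST = "https://secure.gooddata.com"   # your GoodData Platform domain
PROJECT_ID = "k9camwhstjud8i7di51o8llcl1by2byz"

session = requests.Session()
session.headers.update({"Accept": "application/json",
                        "Content-Type": "application/json"})
# ... authenticate the session here (e.g. token/cookie login) before the calls below.

# Step 1: run the AFM execution captured from the browser console.
afm_payload: dict = {}  # paste the executeAfm JSON payload shown above here
resp = session.post(f"{GD_HOST}/gdc/app/projects/{PROJECT_ID}/executeAfm",
                    json=afm_payload)
resp.raise_for_status()
result_uri = resp.json()["executionResponse"]["links"]["executionResult"]

# Step 2: hand the execution result over to the exportResult API to get a CSV.
# The path and payload shape below are assumptions -- confirm them against the
# "Export an insight" use case in the API reference linked above.
export_payload = {"resultExport": {"executionResult": result_uri,
                                   "exportConfig": {"format": "csv"}}}
export = session.post(f"{GD_HOST}/gdc/internal/projects/{PROJECT_ID}/exportResult",
                      json=export_payload)
export.raise_for_status()
# The response points at a download URI that may need polling until the CSV is ready.
print(export.json())
```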
Another approach, instead of calling the APIs directly, might be to use the GoodData.UI framework to get the data.
w
Thank you so much, I am going to try this approach