# gooddata-platform
p
--- Also, secondary question: If I truncate my tables every day, refill them with an x__timestamp = NOW() and I only use one incremental load per day in GD, would that be equivalent to a daily full load? 🤔
✅ 1
m
This would not be equivalent to a full load, because in a full load all the data currently in the dataset is replaced with what is newly loaded. If you truncate the whole table and insert rows with x__timestamp=NOW(), the load will perform a MERGE (insert + update) based on the keys in each dataset. If some keys were in your tables before and are now removed, they will still remain in the workspace. Also, MERGING a lot of data into a lot of data (as would happen if you re-uploaded everything in incremental mode) can be quite slow. Increments work best when the increment is small relative to what is already loaded.
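A minimal sketch of the difference (not GoodData code, just simulating the semantics on a keyed dataset with Python dicts): a full load replaces the dataset, while an incremental MERGE inserts and updates by key but never deletes.

```python
def full_load(dataset: dict, incoming: dict) -> dict:
    """Full load: the workspace dataset is replaced entirely."""
    return dict(incoming)

def incremental_load(dataset: dict, incoming: dict) -> dict:
    """Incremental load: MERGE (insert + update) by key; no deletes."""
    merged = dict(dataset)
    merged.update(incoming)  # existing keys updated, new keys inserted
    return merged            # keys absent from `incoming` survive

workspace = {"a": 1, "b": 2, "c": 3}
source    = {"a": 1, "b": 20}  # "c" was deleted upstream

print(full_load(workspace, source))         # {'a': 1, 'b': 20}
print(incremental_load(workspace, source))  # {'a': 1, 'b': 20, 'c': 3}
```

Note how the deleted key `"c"` lingers after the incremental load even though the source table was truncated and fully re-inserted.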
p
OK so due to incremental load not executing delete on records which are no longer in source data, incremental load differs from full load. So... that means that in the case where a client extraction failed, I cannot provide them with the previous day's data until I run a subsequent incremental load job because there is a risk that some entries have been deleted between the two runs and that those will stay in the workspace after the incremental (refresh) load job. 🤔 Is this a correct interpretation? If so, would that mean that the only way to provide some fallback (stale) data until the next load job is to do a full load job once again after the data has been extracted?
(an alternate way would be to track deletion in our ELT and have a filter pretty much everywhere downstream, but that's a can of worms I don't want to open)
--- So my initial question is even more important then: Can I trigger full load via an API call rather than via a schedule?
Especially if we end up with an issue internally and for instance our ELT throttles, I want to avoid having a full load done on potentially inconsistent data
@Michal Hauzírek Is that what I am looking for? https://help.gooddata.com/doc/enterprise/en/expand-your-gooddata-platform/api-reference#operation/executeSchedule If so, in the description it says:
Depending on whether you are executing an Automated Data Distribution (ADD) schedule
Does that mean I need to have a schedule (recurring) load job to be able to call this endpoint? Or can I just have a connection/data source configured and from there I can call the API Endpoint whenever I want to tell GD to schedule a load?
m
Yes, this is the API you can use to trigger a load into a GoodData workspace. And yes, it also supports forcing a full load based on the parameters you send.
A “schedule” in GoodData is basically a (data loading) process associated with some parameters. It does not need to be recurring (you can have a schedule which is set to run “manually”). Schedules also hold the history of their executions and, for some time, the logs from their runs. And yes, a schedule is the recommended way to interact with loads, both manually in the Data Integration Console and via the API. If you have already loaded data into your workspace, you probably already have such a schedule.
Each (data loading) schedule needs to exist in some workspace. A connection/data source is one level above: it is really just a definition of a connection string and credentials. It does not say which workspace, which datasets, etc. to load the data into.
There are two ways to work with schedules: 1. a schedule can exist within the workspace into which you are loading the data (that is the “current workspace” option), 2. or, if you are using Lifecycle Management to handle many workspaces of the same structure (with different client_ids), there can be one schedule in one workspace which handles loads to all the workspaces within the Lifecycle Management segment. In such a setup we usually recommend having one special “service” workspace for this purpose, which does not have any client_id or even any data model but serves just as an envelope for the data loading (and other) processes.
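A hedged sketch of calling that endpoint from code. The URL path follows the executeSchedule operation linked above; the parameter name `GDC_DATALOAD_SINGLE_RUN_LOAD_MODE` is an assumption here for forcing a one-off full load on an ADD schedule — verify the exact parameter against the ADD documentation before relying on it.

```python
import json

def build_execution_request(force_full_load: bool = False) -> dict:
    """Build the JSON body for a schedule execution request.

    The GDC_DATALOAD_SINGLE_RUN_LOAD_MODE parameter name is assumed,
    not confirmed -- check the GoodData ADD docs for the exact key.
    """
    params = {}
    if force_full_load:
        params["GDC_DATALOAD_SINGLE_RUN_LOAD_MODE"] = "FULL"  # assumed name
    return {"execution": {"params": params}}

# POST this body (with your session/auth headers) to:
#   /gdc/projects/{project_id}/schedules/{schedule_id}/executions
body = json.dumps(build_execution_request(force_full_load=True))
print(body)
```

This way the recurring schedule can stay on incremental mode, and your orchestrator only forces a full load on demand (e.g. after a failed extraction has been repaired).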
✅ 1
p
Lovely. That's a quality answer @Michal Hauzírek, thanks a lot for your thoroughness. I'd suggest your answer be recycled and added to the docs of the API ref 🙂
🙏 1