# gooddata-platform
Shyam Ramani:
Hi everyone, I'm loading data to a model via the REST API using these directions: https://help.gooddata.com/doc/enterprise/en/data-integration/data-preparation-and-dis[…]ion/additional-data-load-reference/loading-data-via-rest-api . Everything is working fine, except: how do I load data incrementally? It seems every time I do a data push, it replaces the previous data automatically. From reading this document: https://help.gooddata.com/doc/free/en/data-integration/data-preparation-and-distribut[…]/delete-old-data-while-loading-new-data-to-a-dataset-via-api it looks like I can specify that old data should be deleted, but that's not what I want to do, and those params aren't anywhere in my manifest. How do I get it to stop replacing data and instead append to it?
Michael Ullock:
Hi Shyam, as mentioned in this doc: https://help.gooddata.com/pages/viewpage.action?pageId=86796461 to upload data incrementally, use "mode": "INCREMENTAL".
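The mode is set per column part in the upload_info.json manifest. A trimmed sketch of what that could look like (dataset and column names here are made up for illustration, not taken from your model, so adjust them to your manifest):
```json
{
  "dataSetSLIManifest": {
    "dataSet": "dataset.userinfo",
    "file": "dataset.userinfo.csv",
    "parts": [
      {
        "columnName": "user_id",
        "populates": ["label.userinfo.user_id"],
        "mode": "INCREMENTAL",
        "referenceKey": 1
      },
      {
        "columnName": "signup_count",
        "populates": ["fact.userinfo.signup_count"],
        "mode": "INCREMENTAL"
      }
    ]
  }
}
```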
Shyam Ramani:
@Michael Ullock thanks, I did that and now GoodData shows "partial upload", but my dataset is still the same as the last time I uploaded in full, i.e. new data hasn't been added, it's just the old data that's still there
I made sure all my models are set to incremental and sent x__timestamp in the CSV
and named the file with a date timestamp, e.g. dataset.userinfo-2022-08-19_23-54-00.csv
or do I need to load all the data in full first before I use incremental to update?
Michael Ullock:
If you do a FULL load, it will load all rows in the source data. If you then do an INCREMENTAL load, it will upload only new and updated records.
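For example (these rows are made up): the FULL file contains every record, while each later INCREMENTAL file should contain only the new or changed rows, matched against existing data by the dataset key:
```
dataset.userinfo.csv  (FULL load: all rows)
user_id,name,city
1,Alice,Prague
2,Bob,Brno
...
1000,Zoe,Ostrava

dataset.userinfo.csv  (later INCREMENTAL load: only the delta)
user_id,name,city
2,Bob,Pilsen
1001,Dan,Olomouc
```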
Shyam Ramani:
@Michael Ullock I tried this, and it doesn't seem to pick up any of the new data afterwards, i.e. it still has the original 1000 records and no others, even when I add new records. Do I name the file the same, or are there any other differences?
so it does seem to say it's doing it, how long does parsing take?
I even tried updating one existing record via incremental... nothing
Paul Muharsky:
Is this a cache thing, i.e. do you need to flush the cache after data uploads?
Shyam Ramani:
@Paul Muharsky possibly... how would I go about flushing it?
Paul Muharsky:
Sorry, I have no idea! I'm also new to the platform, I just came across several mentions of needing to flush things after data uploads while browsing Slack channels...
Shyam Ramani:
All I'm trying to do is basically feed it bits of data at a time without removing the other data. I'm just using imported models, not a manually uploaded dataset, but now I'm thinking maybe I need to upload a CSV manually to start?
actually I'm checking https://community.gooddata.com/dashboards-and-reports-56/how-does-gooddata-cache-reports-128 and it says the cache gets invalidated when I do an etl/pull request, so no, the cache shouldn't be a factor here; plus, if I load data in FULL mode it replaces the old data right away without any cache issue
nvmd figured it out, had to use the same original filename
Michael Ullock:
Hi @Shyam Ramani, let me try to help here. I believe several different concepts got mixed up, so let me try to explain; I understand it can be a bit confusing.

First, the GoodData edition. There are two major editions which differ (among other things) in how they handle data loads:
• GoodData Platform, where you physically load the data into its internal database via the API (or other tools)
• GoodData.CN (Cloud Native), where the data is not loaded anywhere and GoodData works directly on top of the connected supported data source

Based on the screenshots and the API you mentioned, you seem to be using the GoodData Platform, so the "Data Source Notification" article for Cloud Native does not apply to you. In fact, no article in the Cloud Native section applies to you.

So let's talk about data loads to the GoodData Platform. The API you mentioned (https://help.gooddata.com/doc/enterprise/en/data-integration/data-preparation-and-dis[…]ion/additional-data-load-reference/loading-data-via-rest-api) is a low-level API that loads data to a specific workspace. It loads the data either fully or incrementally for you, but it does not determine the increment, and it does not use the x__timestamp column. If you want to incrementally load only the data which has changed, you must send only the new/modified records in the CSV file. From the screenshot you shared, you now really are loading incrementally (= upsert), but the uploaded data size stayed the same at 371.7 kB. So I assume you told the system to upload incrementally but provided ALL the data, not just the increment.

The x__timestamp column (or, for flat files, the naming convention with a timestamp in the file name) is a feature of a different tool: Automated Data Distribution (ADD). ADD is higher-level and internally uses the same API, but on top of it provides services for automatically handling the increments, loading data to multiple workspaces, etc.

So, depending on what you want to achieve and what works best for you, I would recommend one of these (there is a sketch of the API flow below):
• keep using the API, but for an incremental load make sure your CSV file contains only the new and modified records (feel free to remove the x__timestamp column, as this API does not use it), and make sure your dataset has a key defined so the incremental load does not create duplicates
• OR explore the possibilities of Automated Data Distribution and use the x__timestamp column or the timestamp in the file name to handle the increments
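For reference, here is a minimal Python sketch of the low-level API flow as I understand it from the linked doc: stage a zip of the manifest plus CSV, trigger etl/pull2, then poll the task. The project ID, staging directory name, and file names are placeholders, and real platform authentication goes through /gdc/account/login tokens; plain basic auth is used here only to keep the sketch short, so double-check the details against the doc:
```python
import io
import time
import zipfile
import requests

HOST = "https://secure.gooddata.com"      # your GoodData domain
PROJECT = "your_project_id"               # placeholder workspace/project ID
AUTH = ("user@example.com", "password")   # illustrative only; see /gdc/account/login

# 1) Zip the CSV together with the upload_info.json manifest.
#    For an incremental load, the manifest parts say "mode": "INCREMENTAL"
#    and the CSV contains ONLY the new/changed rows.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.write("upload_info.json")
    zf.write("dataset.userinfo.csv")
buf.seek(0)

# 2) Upload the zip to the user staging area (WebDAV).
staging_dir = "userinfo-increment-1"
requests.put(f"{HOST}/gdc/uploads/{staging_dir}/upload.zip",
             data=buf.read(), auth=AUTH).raise_for_status()

# 3) Trigger the load for the staged directory.
resp = requests.post(f"{HOST}/gdc/md/{PROJECT}/etl/pull2",
                     json={"pullIntegration": staging_dir}, auth=AUTH)
resp.raise_for_status()
poll_url = resp.json()["pullTask2"]["links"]["poll"]

# 4) Poll until the ETL task finishes.
while True:
    status = requests.get(HOST + poll_url, auth=AUTH).json()
    state = status["wTaskStatus"]["status"]
    if state in ("OK", "WARNING", "ERROR"):
        print("load finished:", state)
        break
    time.sleep(5)
```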
Shyam Ramani:
got it, figured it out in the end