# gooddata-platform
t
hello, I’m currently struggling with a load where the source table became too big over time, so the load fails with this:
"message":"Feature flag etl.lastRecordDeduplication must be disabled for upload file size larger than %s
. Is there a way to make this work temporarily, before we can work on reducing the dataset size?
m
Hello Thomas, in general the error is accurate in its description: the total size of your upload is too large to go through with this feature flag enabled. From there, you have two options:
1. Make sure your upload stays under 60 GB of data.
2. Disable the feature flag. This can be done at /gdc/projects/workspace_id/config via the gray pages or an API call; a sketch of such a call follows the example entry below. The entry looks like this:
{
  "settingItem": {
    "key": "etl.lastRecordDeduplication",
    "value": "true",
    "source": "catalog",
    "links": {
      "self": "/gdc/projects/workspace_id/config/etl.lastRecordDeduplication"
    }
  }
}
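For reference, here is a minimal sketch of how the change could be made through the API. This is only an illustration, assuming the setting can be updated with a PUT against the self link shown in the entry above; GOODDATA_HOST and WORKSPACE_ID are placeholders, and authentication (the GoodData session cookies) is assumed to be handled elsewhere.
# Sketch only: disable etl.lastRecordDeduplication via the project config resource.
# GOODDATA_HOST and WORKSPACE_ID are placeholders; the session is assumed to
# already carry valid GoodData authentication cookies.
import requests

GOODDATA_HOST = "https://secure.gooddata.com"  # placeholder: your domain
WORKSPACE_ID = "workspace_id"                  # placeholder: your workspace/project ID

url = f"{GOODDATA_HOST}/gdc/projects/{WORKSPACE_ID}/config/etl.lastRecordDeduplication"
payload = {
    "settingItem": {
        "key": "etl.lastRecordDeduplication",
        "value": "false",  # "false" disables the last-record deduplication check
    }
}

session = requests.Session()  # assumed to be authenticated already
response = session.put(url, json=payload, headers={"Accept": "application/json"})
response.raise_for_status()
print(response.status_code)
Once the flag is off, reading the same config resource back should show "value": "false".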
t
looks like the limit is 32GiB
so we’ll go with disabling the deduplication feature then
switching this off will still replace old data with newer data, but we don’t get a deduplication check WITHIN the new data set anymore, correct?
so as long as I make sure that everything in there is already unique, I’m fine?
m
Good question. Regarding the consequences:
* true (current status) = if the input data contain duplicities on the key (connection point or fact table grain), they are deduplicated (last row wins) before being loaded.
* false = if the input data contain duplicities on the key (connection point or fact table grain), the data load fails with an error message.
This is also described in the following documentation: https://help.gooddata.com/doc/growth/en/workspace-and-user-administration/administrat[…]ce-objects/configure-various-features-via-platform-settings/
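If you do disable the flag, one way to guard against duplicates on the key inside a single upload batch is to check for (or drop) them before uploading. This is just a sketch in pandas; the file name and the customer_id key column are made-up placeholders for whatever your connection point is.
# Sketch only: check for duplicates on the key column inside the upload batch
# and keep the last occurrence, mirroring what the platform did while
# etl.lastRecordDeduplication was enabled. "upload.csv" and "customer_id"
# are placeholders.
import pandas as pd

df = pd.read_csv("upload.csv")

dupes = df[df.duplicated(subset=["customer_id"], keep=False)]
if not dupes.empty:
    print(f"{len(dupes)} rows share a key value with another row")
    df = df.drop_duplicates(subset=["customer_id"], keep="last")

df.to_csv("upload_deduplicated.csv", index=False)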
t
ok, this talks specifically about input data, making me hope that the old data still gets discarded and replaced
m
Yes Thomas. No matter how the etl.lastRecordDeduplication feature flag is set, a data load will still:
• remove all old data if it is a full load
• update the existing data based on the defined primary key in case of an incremental load
The only difference is really in what happens if there is a duplicity on the primary key inside the batch of data being uploaded.
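Purely as an illustration of those semantics (a toy Python sketch with in-memory dicts keyed on the primary key, not actual platform code):
# Toy illustration of the two load modes described above, keyed on the primary key.
existing = {1: "old-a", 2: "old-b", 3: "old-c"}
batch = {2: "new-b", 4: "new-d"}

# full load: all old data is removed, only the new batch remains
full_load = dict(batch)

# incremental load: rows with a matching key are updated, the rest are kept or added
incremental = {**existing, **batch}

print(full_load)    # {2: 'new-b', 4: 'new-d'}
print(incremental)  # {1: 'old-a', 2: 'new-b', 3: 'old-c', 4: 'new-d'}
The flag only changes what happens when the same key appears more than once inside the batch itself.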
t
thanks for clarifying that!