# gooddata-platform
w
Hi GD Team, I started receiving an error message in my ADD flows starting yesterday, 3/2/25. My ADD flows ran successfully for 100 days prior to that. I have made no configuration or data changes. Seeing that this issue started on a Sunday and the issue is across 4 different workspaces, it is VERY unlikely that it is related to user-entered data. I have checked for software releases on GoodData and on GBQ and I can't find anything that seems to be related. Can you help?
j
Hi Willie, This usually happens when the size of a particular data load causes BigQuery to close the connection. Are you using full or incremental loads here?
There are a few possible ways to deal with the issue: 1. Switching the load mode to incremental and loading the data in smaller chunks. 2. Setting a different ETL schedule for the problematic table. 3. Setting the parameter
GDC_DOWNLOAD_PROJECTS_CHUNK_SIZE
to 1, which makes each task download data for only 1 project at a time (instead of proceeding in chunks of 4); see the small sketch below. Options 2 or 3 may work as well. You can find more information in the documentation below: https://help.gooddata.com/doc/enterprise/en/data-integration/data-preparation-and-distribution/direct-data-distribution-from-data-warehouses-and-object-storage-services/automated-data-distribution-v2-for-data-warehouses/set-up-automated-data-distribution-v2-for-data-warehouses/#SetUpAutomatedDataDistributionv2forDataWarehouses-BestPractices
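For illustration, a minimal sketch of that parameter setting is below. Only the parameter name and the value 1 come from this thread; where exactly you add it (e.g. as a parameter of the ADD process schedule in the Data Integration Console) depends on your setup:

```
GDC_DOWNLOAD_PROJECTS_CHUNK_SIZE=1
```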
w
I don't think this is the cause of the problem. This all started on Sunday, and there would have been no difference in data volume from Saturday to Sunday. And I load this data in two different ways: 1) where each customer has its own workspace, and 2) where we load all customers into one workspace (for global analytics). Both loads failed with the same error on the same day and now won't load. In scenario 1, 4 workspaces out of hundreds don't load. In scenario 2, the load won't finish at all; it has failed 27 times since yesterday (being retried at 30-minute intervals). Something changed yesterday.
j
Are you running full loads or incremental loads?
w
We run full loads every day, and incremental loads throughout the day. We have been running full loads every day for the last year.
The table it's failing on is one of the smallest of the 19 tables we load.
For instance, the first workspace ID in the list that failed for this table "gd_budgets" only has 425 rows in it.
Also, the lion's share of our workspaces are TINY. I very highly doubt we are reaching the limits of your software
I have re-run these loads a few times and they fail in the exact same spots with the same error every time. If this were load-based, I doubt it would behave that way.
j
A full load involves transferring all data from the source system to GoodData, regardless of whether it has changed since the last load. This means that every time a full load is executed, the entire dataset is refreshed in GoodData. This can be time-consuming and resource-intensive, especially for large datasets, as it requires reprocessing all data. Running a full load from BigQuery uses all of those resources, and that is when you would hit those limits.
Nevertheless, if you really believe this is not the case, I will open a ticket for you. We can access the workspace and review the ETL.
w
What is a "Large" dataset to you?
If my situation is "large", then I think there are much bigger problems. I have relatively small workspaces.
And do you think I am hitting GBQ limits or GoodData limits?
j
BigQuery. I will open the ticket and we will continue the investigation there.
w
OK, thank you
m
I have not seen such an error before, but just from the error message in the screenshot it sounds to me that:
• the error is propagated from Google BigQuery
• the issue is with the query that the GoodData load is executing
• it complains about conversion of a non-finite floating-point number (so infinity?)
• it happens during processing of the table
gd_budgets
The query typically consists of a SELECT based on the mapping, potentially a WHERE clause for the client_id discriminator, and in the case of an incremental load also a condition on the incremental date (which is not the case here, as you say). So I assume it would be in the data itself. I would recommend checking whether there is any
infinity
or some other "weird" value in the data of this table, and if there is, converting it to something else.
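A rough diagnostic sketch along those lines in BigQuery Standard SQL might look like the following; the project/dataset path and the `amount` column are assumptions based on this thread, so adjust them to your schema:

```sql
-- Hypothetical check for non-finite values in gd_budgets (BigQuery Standard SQL).
-- SAFE_CAST returns NULL (instead of raising an error) for values that cannot be
-- converted to NUMERIC, so it flags NaN/Infinity hiding in a FLOAT64 column.
SELECT *
FROM `my_project.my_dataset.gd_budgets`   -- assumed path, replace with yours
WHERE amount IS NOT NULL
  AND SAFE_CAST(amount AS NUMERIC) IS NULL;

-- If the column is FLOAT64, you can also check for the special values directly:
SELECT COUNT(*) AS bad_rows
FROM `my_project.my_dataset.gd_budgets`
WHERE IS_NAN(amount) OR IS_INF(amount);
```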
w
Hi Michal! Long time since we talked. Your comment made me dig deeper into your suspicion. We kept trying to query this table in GBQ for the projects that were failing and we couldn't find anything. And then we realized that GoodData must be using a CAST when it pulls the data from GBQ. As soon as we used CAST to NUMERIC around the values of this table we were able to isolate the problem. There was one record in the budgets table amount column that was actually the value NaN. That column in Postgres was a numeric(10,2). I had NO IDEA that a database would actually store a NaN in a numeric field, but it does. And GBQ does as well. So it wasn't until you tried to CAST it that it caused a problem, which is why it was so hard to find. We fixed the 1 record in the source data and we will be patching our application to not allow this value anymore. Thanks everyone for your help.
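For anyone hitting the same thing, a minimal sketch of the source-side cleanup in PostgreSQL (table and column names are assumptions based on this thread, not the actual schema) could look like this:

```sql
-- Hypothetical cleanup sketch (PostgreSQL); table/column names assumed from this thread.
-- PostgreSQL's numeric type accepts the special value 'NaN', and NaN values compare
-- equal to each other, so the offending rows can be found and fixed directly.

-- Find the offending rows:
SELECT * FROM budgets WHERE amount = 'NaN'::numeric;

-- Fix them (NULL, 0, or whatever makes sense for your application):
UPDATE budgets SET amount = NULL WHERE amount = 'NaN'::numeric;

-- Optionally keep NaN out going forward with a check constraint
-- (NaN <> NaN evaluates to false for numeric, so such rows are rejected):
ALTER TABLE budgets
  ADD CONSTRAINT amount_not_nan CHECK (amount <> 'NaN'::numeric);
```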
🙌 1
👍 1