# gooddata-platform
Hi and happy new year to everyone! I come with a new question, as always 😉 We are experiencing some data loading issues, which share 3 properties:
1. They don’t seem to be deterministic. They appear, and then the next load works fine again.
2. The error messages don’t help us understand what’s happening.
3. We didn’t change anything on our end, so the cause of those errors is quite puzzling to us.
Here are some examples:
2024-01-05T17:55:18.095+0100 [ERROR]: Data distribution worker failed. Reason: Task finished in status ERROR error_id=d07fe815-1ffd-44aa-aded-3a35f1cf6ccc error_code=msf.cloudresource.dataload.model.mapping.task.error
====================== Downloading and integrating data ======================

2024-01-05T18:40:19.758+0100 [ERROR]: Fail to load projects "[ox3jebjo7n512ue1ci68tyjk9xc169tu]". Reason: Error processing taskId=45e1bb11f04e266e3e4c482d5a067fb100000035 status=ERROR messages=TaskMessages:{empty=true,messageBodies=[],messages=[]}
2024-01-05T18:40:29.766+0100 [INFO]: 

====================== End of downloading and integrating data ======================

2024-01-05T18:40:29.767+0100 [ERROR]: Data distribution worker failed. Reason: All projects failed to load.
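For anyone triaging similar output, here is a minimal sketch of how the `[ERROR]` lines and their `key=value` fields (such as `error_id`, `error_code` and `taskId`) could be pulled out of a log like the ones above before sharing them with support. It assumes the line layout shown in these examples (`timestamp [LEVEL]: message`); the function name `parse_errors` and the regular expressions are hypothetical helpers written for illustration, not part of any GoodData tooling.

```python
import re

# Assumed line layout, taken from the examples above:
#   2024-01-05T17:55:18.095+0100 [ERROR]: message text
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+) \[(?P<level>[A-Z]+)\]: ?(?P<msg>.*)$"
)
# key=value pairs seen in the examples; a value runs until the next whitespace
FIELD_RE = re.compile(r"\b(error_id|error_code|taskId|status)=(\S+)")


def parse_errors(log_text):
    """Return one dict per [ERROR] line, with any recognised key=value fields."""
    errors = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") != "ERROR":
            continue  # skip banners, blank lines and non-error entries
        fields = dict(FIELD_RE.findall(match.group("msg")))
        errors.append(
            {"timestamp": match.group("ts"), "message": match.group("msg"), **fields}
        )
    return errors


if __name__ == "__main__":
    sample = (
        "2024-01-05T17:55:18.095+0100 [ERROR]: Data distribution worker failed. "
        "Reason: Task finished in status ERROR "
        "error_id=d07fe815-1ffd-44aa-aded-3a35f1cf6ccc "
        "error_code=msf.cloudresource.dataload.model.mapping.task.error\n"
    )
    for entry in parse_errors(sample):
        print(entry["timestamp"], entry.get("error_id"), entry.get("error_code"))
```

Collecting these fields per failed run also makes it easier to see whether intermittent failures share an error code or differ from run to run.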
Hi Thomas, thanks for sending this in and for the extra details! I did locate the event in our logs, and while there were some additional entries there, I’m afraid they weren’t of much use. While we investigate further, can you tell us a little more about your setup here? For example:
• What Data Source are you using? (e.g., Redshift, Snowflake, etc.)
• When did this issue start occurring? The earliest I found in our logs was January 3rd. Were there any changes made to your data source at that point?
◦ We’ve had previous cases where similar issues were caused by a connectivity configuration change on Redshift, for example.
Thank you!
It’s a Redshift data source, and we have had it for a few months already
Hi Thomas, I’ve been checking the logs but I couldn’t find many more details. The reason for the failures looks to be different in the two examples. The first one is more sporadic than the other. I’m asking our engineers to shed some light on it. Do you have an example of a recent failure?
Hi Jan, thanks for looking into that. I have one from yesterday. What’s the best way to share?
A request ID of the data load should be enough
here you go:
Hi Thomas, my apologies for the delay in getting back to you. We made some adjustments to the underlying infrastructure during the maintenance window on Feb 3rd, which we believe should help with the random failures mentioned above. Do you still see any random failures?
actually, no, that seems to have improved indeed! thanks Jan, amazing!