# gooddata-cloud
Anjali Mandowara
Hi GoodData Team, we’ve been observing intermittent and inconsistent behavior on the GoodData platform in our production environment over the past 15 days. Specifically:
• Some filters and visualizations work correctly initially, but after a page reload, they fail to load and throw errors.
• In certain cases, visualizations display a generic "Contact administration" error message.
This behavior is unpredictable and not consistent across all reports or users, which is making it difficult to isolate the root cause.
Moises Morales
Hi Anjali, I can confirm that I can replicate the same behaviour when impersonating the admin user in your organization. I have gone through our logs, but at this time I have not found anything conclusive yet. Could you please provide further details on what changes were introduced to the workspace in the past 15 days that may have led to the errors? Additionally, are you aware of any changes introduced to the workspace data filters in particular? I can see in our logs that the query responsible for retrieving the label elements is failing; this is usually related to how the workspace data filters (WDFs) interact with it.
Anjali Mandowara
@Moises Morales thanks for the response. But we haven't made any changes to the workspace.
Moises Morales
Thank you for confirming. I have checked the logs and the LDM once more, and everything indicates that there are some performance issues with the DB. Could you please make sure you are familiar with the recommendations in our documentation here: https://www.gooddata.com/docs/cloud/connect-data/performance/? If this does not help, I will escalate the issue internally for further troubleshooting.
Lakhan Rathore
Thanks for the info @Moises Morales. As per the above document, our DB looks good, and we have other reporting working fine. Can you please escalate this? Our production has been impacted by this for more than 20 days now.
Daniel Stourac
Hi Anjali and Lakhan, Moises escalated the issue to me to investigate. How precisely can you pinpoint the start of the issues? You've indicated something between 15 and 20 days; can you establish this more precisely, please? The log messages seem to indicate that Postgres prepares a statement and then the data structures change before it is used. I know that your custom data structures keep changing all the time, but are you aware of some change in the mechanism that would alter the timing of the custom data model changes? Any modification that took place shortly before this problem started appearing and would influence when and how your custom data model is delivered to Postgres and to GoodData? Any trace or idea that could help us look in the right places would be welcome. Many thanks.
Lakhan Rathore
Hi @Daniel Stourac, this issue was reported by our CSR team on 10th April, so it might be 2-3 days older than that (they only find out when the client reports it to them). Regarding a change in the mechanism, I don't think we made any change to the data structure in the last 3 months or so. But you should be able to see in your logs if any change was made by anyone accidentally.
👍 1
cc: @Anjali Mandowara, can you add more if you have other information?
@Daniel Stourac any luck on the above issue? cc: @Anjali Mandowara
Daniel Stourac
Hi Lakhan, not much luck, I'm afraid. My findings so far: your colleague Pradeep Soni already raised this issue on April 14th, and we have been investigating and communicating with him since then. There are two issues, probably interrelated:
1. Postgres randomly returns the error message `prepared statement "***" already exists`. This causes the filters to fail. According to internet discussions, this has something to do with connection pooling on Postgres. I couldn't find a clean solution.
2. The two visualizations seem to fail strictly after 30 seconds. The error message in our logs is `"Connection is closed"`, almost certainly some kind of timeout.
If you have access to the Postgres logs, could you please look for messages such as `too many clients` or `too many connections`? Or, perhaps, anything about `timeout`? We don't have visibility there; all we get is that Postgres has closed the connection after 30 seconds, so we can only guess about the reasons. Can you check the Postgres logs to help us find the reason, please?
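For illustration, a minimal sketch of generic Postgres-side checks that complement searching the server log for those phrases; the exact limits depend on the deployment, so treat this as an assumption-laden example rather than the configuration actually in use here:

```sql
-- Generic Postgres-side checks (a sketch; exact limits depend on your deployment).
SELECT count(*) AS open_connections FROM pg_stat_activity;  -- connections currently in use
SHOW max_connections;  -- hard server-side connection limit
-- Hitting that limit typically shows up in the server log as:
--   FATAL: sorry, too many clients already
```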
Lakhan Rathore
Hi @Daniel Stourac, after looking over the DB logs we identified that it was throwing timeouts at times; we made some changes to the configuration and now it looks stable. Thank you for pointing us to the correct root cause.
😊 1
🙌 2
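The thread does not say which configuration values were changed. Purely as a hypothetical illustration, these are the timeout-related Postgres parameters most commonly reviewed in a case like this (the values below are examples, not the actual fix applied here):

```sql
-- Hypothetical illustration only: the actual change made in this thread is not specified.
SHOW statement_timeout;                          -- current per-statement limit; 0 = no limit
ALTER SYSTEM SET statement_timeout = '120s';     -- example: allow longer-running statements
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';  -- example: reap idle transactions
SELECT pg_reload_conf();                         -- apply the new defaults without a restart
```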
Daniel Stourac
Hi @Lakhan Rathore. Thank you for finding and solving the underlying issue. Hopefully, it will be smooth sailing now. Good luck!
Hi @Lakhan Rathore, one more thing. There were two issues at play: the timeout (solved by you) and the message `prepared statement "***" already exists`. The latter is related to PgBouncer and can be solved by switching it from `transaction mode` to `session mode`. You may have done it already; if not, consider this change to improve stability. We are also preparing some changes in our JDBC driver as a long-term solution.
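For illustration, a minimal sketch of what that PgBouncer change might look like; the file layout and any other settings are assumptions and will differ per deployment:

```ini
; pgbouncer.ini (sketch) - only the pool_mode line is the relevant change here
[pgbouncer]
; transaction mode returns the server connection to the pool after every transaction,
; so server-side prepared statements created by one client can collide with another:
; pool_mode = transaction
; session mode keeps the same server connection for the whole client session,
; so prepared statements remain valid for that session:
pool_mode = session
```

A commonly cited client-side alternative, when transaction pooling has to stay in place, is the PostgreSQL JDBC driver's `prepareThreshold=0` connection parameter, which disables server-side prepared statements; whether that is relevant here depends on the driver changes mentioned above.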
Lakhan Rathore
Sure, thanks @Daniel Stourac