# gooddata-cloud
d
Hi friends, I have yet to understand what causes my filters to show up red like this ("Error when loading filter values"). Refreshing multiple times always solves the issue, but it's keeping us from launching our latest product. I can't recreate it by clearing the cache, switching browsers, or switching accounts, but it reliably happens in the morning / the first time I use the system each day.
m
Hi Daniel, can you please clarify for me if you’re seeing this issue today? We actually released a fix a few days ago to fix a very similar issue. Can you please confirm if you saw this error today?
d
yes, alas, this screenshot was taken about 20 minutes ago
m
Thanks for the confirmation. I tried to replicate the issue in my testing workspace but was unable to. So in this case, I would like to ask you to try to replicate the issue again tomorrow morning, or at your earliest convenience, and provide us with the related Error ID / Trace ID from your browser's Dev Tools so we can track the issue down in our logs and see whether it's connected to the issue I previously mentioned or is a different one.
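For readers collecting the same details: a HAR export from the browser's Dev Tools is a JSON file, so the failed requests can be listed with a few lines of Python. This is a minimal sketch, not GoodData tooling. It assumes the standard HAR layout (`log.entries[].request` / `response`), and it assumes that any trace identifier arrives in a response header whose name contains "trace"; the thread does not name the actual header, so that filter is a guess.

```python
def failed_requests(har: dict, min_status: int = 400) -> list[dict]:
    """List failed entries from a parsed HAR document.

    Assumption: trace identifiers, if present, are in response headers
    whose name contains "trace". The real header name is not given in
    the thread, so this match is deliberately loose.
    """
    results = []
    for entry in har.get("log", {}).get("entries", []):
        response = entry.get("response", {})
        status = response.get("status", 0)
        if status < min_status:
            continue  # keep only client/server errors
        trace_headers = {
            header["name"]: header.get("value", "")
            for header in response.get("headers", [])
            if "trace" in header.get("name", "").lower()
        }
        results.append(
            {
                "url": entry.get("request", {}).get("url", ""),
                "status": status,
                "trace_headers": trace_headers,
            }
        )
    return results
```

To use it, load the exported file with `json.load` and pass the resulting dictionary to `failed_requests`; each returned item gives the URL, the HTTP status, and any trace-like headers to quote in a support request.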
d
DMing the HAR to you
m
Thank you for sending me the HAR file - We are reviewing this for you and will get back to you with more details as soon as possible.
Hi Daniel, to assist our investigation, can you please tell us whether multiple users see this issue or only your user does? Can you please also clarify the flow: does this issue happen daily after uploading new data to your workspace, or does it happen sporadically, and not only first thing in the morning when you log in? And could you please DM a direct link to the dashboard so our team can review it if needed? Thanks in advance for this!
d
1. everybody sees this 2. daily in the morning. Not only mornings though, seems to be more like anytime it hasn't been opened for several hours. And it is increasingly paired with visualizations not being displayed -- this is also fixable just by refreshing multiple times. It's not reproducible just by clearing cache, surprisingly. 3. will DM a dashboard link
m
Sorry about this, but I forgot to ask one more thing previously… Can you please tell me how long you have been seeing this issue? If you could give us a timeframe for when you started seeing it, that would be very helpful. Also, I just wanted to let you know that our team is looking into this issue internally, and we will hopefully get back to you soon with more concrete details on what's causing it. Thank you once again for this additional information!
I would just like to add that I went ahead and accessed your workspace and used Impersonation on our internal Admin user and I was able to replicate the issue on my side
d
This has been ongoing since we opened the account
j
Hello Daniel, I will be assisting you with your case as an L2 engineer. I will review your case and get back to you.
d
Thanks Jakub
j
Hello Daniel, just wanted to give you a current update: I've identified that the problem occurs when searching for an invalid or non-existent filter value, which results in a 500 error. I am currently discussing it internally, as we may have a fix for it. I will contact you again soon to let you know the outcome of our internal discussion. Let's stay tuned 🙂
Hello again Daniel, another follow-up: I've discussed your issue internally, and it turns out that we might be dealing with a bug related to the JDBC driver for Amazon Redshift. The error in the filter display is related to the failed execution of an SQL statement. We were able to replicate it and are currently working on a fix; I will inform you about any developments in this case. Could you please tell me what business impact this issue is causing?
And one more question: I've noticed that query execution fails when the query below is executed:
SELECT "app_sys_apps_control_list0"."app" AS "a_label_apk_app_selection_app_system_2bb0c93f87e411", "t1"."m_f567ed6a0961b25176acfb49f8490b11" AS "m_1"
FROM (SELECT "apk" AS "a_label_apk_app_system_12bd071c48e760", COUNT("apk") AS "m_f567ed6a0961b25176acfb49f8490b11", TRUE AS "def_m_f567ed6a0961b25176acfb49f8490b11"
FROM "local_schema_rs_gd_v10_cluster3"."app_sys_apps_control_list"
GROUP BY "apk") AS "t1"
INNER JOIN "local_schema_rs_gd_v10_cluster3"."app_sys_apps_control_list" AS "app_sys_apps_control_list0" ON "t1"."a_label_apk_app_system_12bd071c48e760" = "app_sys_apps_control_list0"."apk"
WHERE "t1"."def_m_f567ed6a0961b25176acfb49f8490b11"
ORDER BY "app_sys_apps_control_list0"."app" NULLS FIRST
LIMIT 10000
Could you please test that query against your data in your environment?
d
Will do
hi, confirming that this query actually works within 2 seconds when applied directly to my db
j
Hello Daniel, thank you for the confirmation. It seems that "Error when loading filter values" appears when a filter value is being edited in the filter field. That triggers two actions: 'collectLabelElements', which then generates a 'SELECT' query. If the 'SELECT' query executes for too long, it fails with an error at the JDBC driver level: "No results were returned by the query". The JDBC driver then reports the error "Cannot put an empty flight", which means that the driver is trying to add a Flight object to a collection or data structure, but the Flight object being added is either null or lacks the necessary data. This is unexpected behaviour, and it is isolated to Redshift data sources. We will investigate further and let you know the outcome.
Hello Daniel, coming back to our investigation, we have some additional questions:
1. Following the instructions at https://repost.aws/knowledge-center/redshift-query-abort, could you please check for ABORT, CANCEL, or TERMINATE requests in the time frame from the 26th of July until today?
2. The logs state “ProcessError(errorType=SERVER_EXCEPTION, message=No connection to '5e980bdc-1165-422f-8036-4bee8a5106ea' available.)”, so we can see that the connection to the server was interrupted several times (we can't identify on which side it was interrupted). It happens intermittently. It could be due to a firewall configuration that drops particular connections when it detects malicious content in network packets (that is how stateful firewalls behave). It might also be related to a poor connection between GoodData Cloud and your Redshift database.
3. When did the issue first occur? Were there any changes made on your side at that time?
4. What is the quota for maximum concurrent connections to the database, and what timeout is set for them on Redshift?
Last but not least, I would like to agree on the urgency of this incident with you. Please let me know how impactful this problem is for your business. Kind regards, Jakub
d
Thanks Jakub! Interesting.
1. This is a serverless setup, so that table doesn't exist, as I understand it (https://repost.aws/questions/QUr1ywNL6kQPOglpmRVMCtCA/redshift-superuser-permission-denied-to-stl-tables). Not sure how to get a comparable list.
2. Firewall configuration on our end is typical. So far, firewall configuration hasn't been a problem with any endpoints other than GoodData.
3. This has occurred continuously since we started using Redshift.
4. Unlimited. This is not the issue.
5. This is directly preventing us from making any new revenue. Our product is an external-facing dashboard. We can't sell new subscriptions until this is resolved.
j
Hello Daniel,
Thank you for your reply. We have increased the urgency of your incident and engaged additional resources in the investigation. Please accept our apologies, but given the complexity of the issue, the investigation is still ongoing. Just to give you a short overview of our investigation:
• As before, I impersonated a user in your affected workspace again today; however, I was unable to reproduce the problem, probably because the data for the filtered value was already precached.
• We investigated the JDBC driver logs in the context of the current driver version to verify that there isn't any incompatibility.
• We created a test workspace with a Redshift data source and a dummy database to replicate the error; we did not get the same error.
• We analyzed the "SELECT" query that is executed when there is input in the filter field and compared it with the ones we executed while impersonating and in our separate test environment.
We would much appreciate it if you could provide a new HAR file capturing inputs in the filter field. If I am able to replicate the issue later today with a second impersonation attempt, I will let you know that the HAR file is no longer necessary. I am also preparing an additional investigation on your serverless Redshift to check, in Query Editor, the reason for the SQL query failure. Best regards, Jakub
m
Hello Daniel, I hope you're doing well. We were able to reproduce the issue you’re experiencing, and it appears to be specific to the serverless version of Redshift. It seems that after a period of inactivity, the resources behind the database may be unassigned, which leads to errors on the first request. However, once the database "wakes up," it functions without issues. We are reaching out to AWS Support regarding this, because it doesn't seem the problem is on our side, and you are welcome to do the same. In the meantime, I recommend using the standard version of Redshift or another supported database as a workaround. Additionally, you could try sending periodic simple requests to the database to keep it active and avoid the inactivity issue. We apologize for the inconvenience and will update you soon on the status of the serverless Redshift.
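The "periodic simple requests" workaround mentioned above could be sketched as follows. This is an illustration under assumptions, not an official recommendation: `ping` and `keep_alive` are hypothetical helper names, the connection is any Python DB-API connection, and the in-memory SQLite database in the demo only stands in for a real Redshift connection. Note that keeping a serverless warehouse awake may incur compute charges.

```python
import sqlite3
import time


def ping(conn) -> bool:
    """Run a trivial query; return True if it succeeded, False otherwise."""
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT 1")
        cursor.fetchone()
        cursor.close()
        return True
    except Exception:
        # Any failure (closed connection, cold-start error) counts as a miss.
        return False


def keep_alive(conn, interval_seconds: float, max_pings: int, sleep=time.sleep) -> int:
    """Ping the database max_pings times, pausing between pings.

    Returns the number of successful pings. The sleep function is
    injectable so the loop can be exercised without real waiting.
    """
    successes = 0
    for i in range(max_pings):
        if ping(conn):
            successes += 1
        if i < max_pings - 1:
            sleep(interval_seconds)
    return successes


if __name__ == "__main__":
    # Stand-in connection; a real setup would connect to Redshift instead.
    demo_conn = sqlite3.connect(":memory:")
    print(keep_alive(demo_conn, interval_seconds=0, max_pings=3))
    demo_conn.close()
```

In practice the interval would need to be shorter than the idle period after which the serverless resources are released; the thread does not state that period, so it would have to be found by observation.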
d
Thanks all --- very disappointing result, but I appreciate the legwork. We've arrived at the same conclusion. Redshift Serverless (any serverless setup) will always have a wakeup time. As we'd like to continue working with serverless, it would be ideal if GoodData had an "is this serverless?" toggle in its data source settings menu. If serverless = TRUE, then the connection should be given extra time to load before GoodData treats the lack of response as a timeout error.
m
Hi Daniel, thank you for your response. We have already opened a discussion with AWS, and hopefully they will give us a recommendation soon (maybe something similar to what you mentioned) or identify where the problem is. I will keep you informed.
d
I have a ticket with them as well, we'll see
m
Great!
Hi Daniel, we are in active discussion with AWS, we have provided a reproducer and they escalated to their Redshift Services team. I will keep you posted.
d
I appreciate it
m
Hi Daniel, it is still with AWS. We have provided all the information and are waiting for a fix or recommendation. It seems that cold starts are a known issue for Redshift Serverless. I will keep you informed.
d
Thanks Martin
m
Daniel, I've got an update from AWS regarding the issue.
It seems to be related to a bug with Redshift Serverless that occurs only when connecting via the JDBC driver, since it does not happen from QueryEditorV2 or ODBC.
They are still investigating the case. I will keep you informed.
d
Thanks for keeping this alive Martin. --- I am surprised that they say the issue isn't present in the query editor, as I do experience obvious lag times via the query editor.
m
Hello @Daniel Muise I apologize for the long radio silence. We have been working with AWS on resolving the issue. They acknowledged that this issue had occurred in the past, but it should be resolved with a new driver, which we have already installed. We see an improvement, however, we are still occasionally able to reproduce the issue. To mitigate the impact, we have decided to implement a retry mechanism to avoid the error. Additionally, we provided AWS with access to our demo instance where we were able to reproduce the issue, and they are still investigating it. I believe you are also experiencing the issue less frequently now, requiring fewer manual refreshes. Hopefully, the problem will be fully resolved either by our retry implementation specifically targeting AWS Redshift Serverless, or eventually by AWS itself.
Hello @Daniel Muise, AWS has concluded that the issue is a limitation of AWS Redshift Serverless, not a bug. We are planning to implement a retry mechanism on our side. While this may slightly prolong the initial computation of reports when Redshift Serverless is waking from sleep, it will prevent failures. This enhancement is expected to be implemented within the upcoming month.
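A retry mechanism of the kind described above can be sketched roughly like this. It is a generic illustration, not GoodData's implementation: `TransientError` and `run_with_retry` are hypothetical names, and the exponential backoff schedule is an assumption chosen for the example.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. a cold-start error."""


def run_with_retry(operation, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Call operation(); on TransientError, wait and try again.

    Delays grow exponentially (base_delay, 2 * base_delay, ...), which is
    an assumption for this sketch. The last failure is re-raised once
    max_attempts is reached. Other exception types are not retried.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

The trade-off matches what the message describes: the first request after a period of inactivity takes longer because of the waits, but the user sees a slow load instead of a red filter error.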