# gooddata-ui
b
Urgent help needed with dashboards 🧵
We have some dashboards that we display to clients. These dashboards include various visualisations built through Analyze. Everything was working perfectly until a few days ago: the visualisations themselves still work, but displaying them on dashboards does not.
We are on 10.17
m
Hi Byron, may I ask if the problem is happening only in the embedded environment or also in the GoodData environment?
b
In both, thanks for asking
m
Hi Byron, when you say some dashboards, could you be so kind as to send us a direct link to the dashboard for further investigation? Feel free to use DM.
m
Hi Byron, I have impersonated your environment as Admin; in the dev console I found these two errors while opening the dashboard.
```
{
    "title": "Bad Request",
    "status": 400,
    "detail": "Cannot find label with id='attribute/acked_at/079ded41cc594b478b3e76bf04295715' in LDM objects.",
    "traceId": "3f5324552bf1e68b469cf3268663979f"
}

{
    "title": "Bad Request",
    "status": 400,
    "detail": "Cannot find label with id='attribute/simplified_contract.sign_off_date/079ded41cc594b478b3e76bf04295715' in LDM objects.",
    "traceId": "9d521fb3bfdc5180a7c98b2a26a38115"
}
```
Then, checking the LDM, both mentioned attributes have an issue (warning) in the Mapping tab. I also tried to reproduce the issue, but unfortunately I am not able to see the visualisations Appraisals pending review or Evaluated crew onboard; I would need the metrics/attributes/filters involved in order to reproduce them. Did you make some changes recently? Please check the Mapping section in the LDM once more for both attributes `acked_at` and `sign_off_date` and let us know.
b
Hello Mauricio, thanks, I will have a look. Is there a possibility we can see these detailed errors? Because we only see a generic "calculation error" or something like that and we can't debug.
m
Hi Byron, I found another detail in the error:
```
"msg": "Failed to register cache with resultId=b33377dbf41b259455e7780b30e921ca55aea0d3",
  "exc": "errorType=com.gooddata.tiger.grpc.error.GrpcPropagatedClientException, message=Cannot find label with id='attribute/acked_at/079ded41cc594b478b3e76bf04295715' in LDM objects.,<no detail>\n\tat com.gooddata.tiger.grpc.error.ExceptionsKt.buildClientException(Exceptions.kt:37)\n\tat com.gooddata.tiger.grpc.error.ErrorPropagationKt.convertFromKnownException(ErrorPropagation.kt:254)\n\tat com.gooddata.tiger.grpc.error.ErrorPropagationKt.convertToTransferableException(ErrorPropagation.kt:220)\n\tat com.gooddata.tiger.grpc.error.ErrorPropagationKt.clientCatching(ErrorPropagation.kt:66)\n\tat com.gooddata.tiger.grpc.error.ErrorPropagationKt$clientCatching$1.invokeSuspend(ErrorPropagation.kt)\n\tat kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)\n\tat kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:28)\n\tat kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:99)\n\tat kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)\n\tat kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:102)\n\tat kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)\n\tat kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:811)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:715)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:702)\n"
```
That said, did you try to clear the cache? Also, I noticed that the dataset containing the involved attributes is SQL-based; could you check the SQL query too?
b
Checked the SQLs, they are all working properly. Will check cache now
Cleared the cache, not working. Can you check this dashboard too?
Any help much appreciated, clients are complaining 😞
m
Hi Byron, which dashboard? The same issue? I think you forgot to include the dashboard link in your message.
Can we have a quick call? I’ve discovered more..
```
{
  "title": "Bad Request",
  "status": 400,
  "detail": "Cannot find label with id='attribute/acked_at/079ded41cc594b478b3e76bf04295715' in LDM objects.",
  "traceId": "46a313d005634b80b641ff3e4c0dd31f"
}
```
This label exists!!!
Screenshot 2025-02-06 at 18.56.01.png,Screenshot 2025-02-06 at 18.55.35.png,Screenshot 2025-02-06 at 18.55.28.png
Pls treat with high urgency; our clients depend on these reports. Pls reach out to arrange a call ASAP.
m
Hi Byron, thank you for your patience, I have escalated your case as Urgent to our L2 Technical Support team. A colleague will contact you soon.
b
Thx @Mauricio Cabezas - here or elsewhere?
b
Hello @Byron Antoniadis, This is Branislav from the L2 Technical Support team. I am sorry for your troubles and would like to thank you for your patience while I reviewed your issue. There were 2 dashboards mentioned as not loading: 1.) Crew competence 2.) Retention. As far as I can see, the "Crew competence" dashboard loads OK. As for the "Retention" one, I can see multiple errors in the browser Developer Console that look like:
```
{
  "status": 400,
  "detail": "An error has occurred while calculating the result",
  "resultId": "69ef190248104ebe4e68478819a860c5e49e55f4",
  "reason": "General error",
  "traceId": "e7c0b42814816b3d8b23455bf0016d21"
}
```
I am currently investigating them further. However, please note that your LDM has the following error: "Source column(s) not found in Data Source". Could you please fix/resolve this LDM error, clear the cache afterwards, and let us know if it helped?
b
Looking into it now - thanks Branislav
b
You are welcome, I hope it helps... 🤞🏼 🤓
b
This was something that was just released and is now fixed; it's irrelevant to the actual problem.
@Branislav Slávik look at this visualisation - it's working. Then if you look at the dashboard using it (Retention from above), it does not.
b
@Byron Antoniadis Thank you for sharing the visualization. Are you sure that the Retention rate - summary per agent per rank visualization is used on the Retention dashboard? I cannot see it there. 🤔 Regardless, I have noticed an interesting behavior: the visualization works if the Year filter is set to "All", but stops working when I select a year, for example 2024, which is the same as the one set in the Retention dashboard's locked filter. I will continue the investigation on Monday during CET business hours.
b
I noticed the same - do you have any idea why this could be, so I can continue a bit now?
b
I am not 100% sure, but my suggestion would be to continue resolving the LDM warnings, especially the ones related to "alternative paths": https://www.gooddata.com/docs/cloud/model-data/evolve-your-model/many-to-many-in-ldm/#Many-to-ManyinLogicalDataModels-Alternativepaths https://university.gooddata.com/tutorials/data-modeling/logical-data-model-basic-rules-of-data-modelling/#BasicRulesofDataMod[…]pathsbetweendatasets As you can see in the screenshot, both Created at and Updated at are affected. 🤔
👀 1
b
thanks! have a good weekend!
🤞 1
b
You are welcome, have a nice weekend as well.
b
Hello @Branislav Slávik - good morning, have a good week ahead. Please provide updates as you have them; this is a critical bug on our side.
b
Hello @Byron Antoniadis, Thank you, I hope you'll have a great week as well. I have reached out to our developers regarding the issue and I am waiting on their update. I will inform you as soon as I hear back from them and have more details.
@Byron Antoniadis, I was asked by the developers if it would be ok to turn off real-time caching for the Calypso production data source for a while? They would like to have a further look into the issue.
b
yes, they can - thanks
b
It seems we were not far from the cause of the issue with the LDM setup. See the answer from our developers below:
The query is failing because of this condition: `"__sql_30df0e4dd3aa599fd3610ae2a8742004"."year" = 2024`. The problem is that the customer has a model where one dataset, `promotions`, defines the column `year` as STRING, while another dataset has the column `years.year` as INT. `promotions.year` is a FK to `years.year`. So the system assumes `year` is of type INT and hence converts the filter `"year" = '2024'` to `"year" = 2024`.
This is all enforced by their SQL-based dataset, where they have `cpy.year::TEXT AS year`.
Compare the following:
```
postgres=# select 1 where '11'::text = 10;
ERROR:  operator does not exist: text = integer
LINE 1: select 1 where '11'::text = 10;
                                  ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
postgres=# select 1 where '11' = 10;
 ?column?
----------
(0 rows)
```
Notice that when you explicitly state that something is TEXT, PG will not try to convert it to a number, and automatic conversion from number to string is not done.
Please note that there might be more than one failing query and more affected datasets. However, I hope that the example above helps you identify all the possible locations. As stated before, a model update is the most correct and performance-wise solution at this point. Is there anything else regarding the topic we could help you with, or may we consider this case as resolved?
b
Thank you @Branislav Slávik - will try to apply today and discuss again
b
@Byron Antoniadis, you are welcome. Fingers crossed. 🤞🏼🤓
@Byron Antoniadis I would like to follow up regarding the issue. Were you able to update the LDM? Did it help to resolve the issue?
b
I was at a conference so lost touch with the problem for a bit. Will update by EoD
b
No problem and/or rush, I just wanted to check back with you on the status.
Hello @Byron Antoniadis, Just a quick follow up on the issue. Were you able to get back to it and update the LDM? Has the issue been resolved? 🤞🤓
b
@Branislav Slávik thank you so much for your assistance! This was the problem, the issue has been resolved and our client is happy! 🙂
Hello @Branislav Slávik - a client that uses workspace "Meadway" noted that the Retention dashboard still doesn't work for them. This is very weird: the individual visualizations load up under "Analyze", but when I look at the dashboard they spin infinitely. Please help 🙏
I failed to mention that the dashboard works for all the other workspaces
b
Hello @Byron Antoniadis, I suppose that this is not related to the issue we have discussed before. Could you please create a new thread for it? Either I or one of my colleagues will look into it as soon as possible and assist you further.
b
@Branislav Slávik the client said it was never fixed since then so it’s the same issue 🙏
can you provide an update before EoD? Thx 🙏
b
@Byron Antoniadis Thank you for sharing the information. I have done some initial investigation, and so far the issue seems to be slightly different, at least judging from the error:
```
{
  "title": "Bad Request",
  "status": 400,
  "detail": "An error has occurred while calculating the result",
  "resultId": "e4d97221692181f4fc102e34dc770432e32fd76f",
  "reason": "Query timeout occurred",
  "traceId": "55c85b0412bf68bc65e4b28cfaa5badb"
}
```
That states that it has been caused by a query timeout. Is this customer somehow "different" from the others, e.g. in the amount of records or data? In addition, I reached out to the developers to discuss further possible reasons for this strange behaviour in only one of the child workspaces.
👀 1
b
Thanks @Branislav Slávik - the weird thing is that for other clients with far more data (check workspace “Laskaridis”), it succeeds. That’s why I think it’s weird
Team, do we have an update here? Our client is undergoing an audit on Monday and wants to show this
b
Hi Byron, I am in touch with our developers regarding the issue. They confirmed that the issue is different from the initial one and that the timeout indeed happens on the Postgres side: the SQL queries take longer than 160s to finish. For now, we can only guess what causes the queries to take so long to execute just for this particular workspace / customer. Would it be possible for you to find the problematic report queries and execute them manually in Postgres with EXPLAIN ANALYZE? The idea would be to compare the execution plans of the statement(s) / queries to see if there are any differences, e.g. between workspaces of different customers.
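For reference, a minimal sketch of what such a run could look like in psql (the table name and filter values below are illustrative, not the actual generated queries):

```
postgres=# EXPLAIN ANALYZE
postgres-# SELECT count(*)
postgres-# FROM retention_facts        -- illustrative table name
postgres-# WHERE shipco_id = 42        -- workspace-specific filter
postgres-#   AND year = 2024;
```

Comparing the "actual time" figures and row estimates in the resulting plans between the two workspaces should show where the extra time is spent.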
b
Hello - I have run the EXPLAIN ANALYZE for the Production parent workspace (which contains ALL data for ALL clients and works!!!) and for the specific failing workspace Meadway. The SQL statement for Meadway runs much faster, and the only diff in the analyze output is the application of the shipco_id filter. I do not know where to go from here, so I'll need your help. It's extremely weird that the superset workspace works and the subset doesn't.
Hello - ping here 🙏
Hello - another ping, I’d like to provide an update to our client before EoW
b
Hi Byron, apologies for missing the updates here. Let me have a look into it a bit and I will get back to you before EoD.
🙌 1
@Byron Antoniadis Could you please share with us the queries (and their results and timing) executed with EXPLAIN ANALYZE for the Meadway and parent workspaces that correspond to the executions related to the Retention dashboard? Once shared, our developers will investigate them further and compare them with the ones found in our logs to confirm and hopefully find the cause of the difference(s). Please note that not all queries related to the dashboard report execution(s) necessarily run for a long time; there might only be a few slow ones.
b
Here you are, please go through the document and let me know if you have any questions/points for discussion.
Good morning - wishing you a great week ahead. Please let me know when we have news.
FYI - I managed to change some things and fix this. It seems the year filter was the problem again, but only for this dataset. The worrying thing is that I couldn't figure this out from the error reports.
b
Hi @Byron Antoniadis, thank you for sharing the query details. I guess they won't be needed anymore, but it is good to have them here for future reference. I am glad that we were able to fix the issue by updating the year filter for this dataset. Was the fix / update similar or the same as the one recommended earlier? https://gooddataconnect.slack.com/archives/C01UR5BGAHY/p1739199906643769?thread_ts=1738832429.525059&cid=C01UR5BGAHY Thank you very much for sharing your feedback with us. I will pass it to our product team for their consideration.
b
@Branislav Slávik I had changed all years to be INT, however in one model it was still NUMERIC and I had to explicitly change it to INT. What boggles my mind, though, is how this was affecting only one workspace 🤯
b
@Byron Antoniadis, unfortunately, I do not have an exact explanation for that. However, I found some forums / pages on the internet that might explain it. 🤓 It seems that the NUMERIC type is much larger (12 bytes) compared to INT (4 bytes). Also, the performance of some operations is slower: https://dba.stackexchange.com/questions/110880/numeric-vs-integer-for-a-column-size-and-performance https://stackoverflow.com/questions/58155982/what-is-the-difference-between-numeric9-0-and-int-in-postgres In this article, they even tried to measure it, and it appears to be ~50% to 70% slower than integer types: https://medium.com/xendit-engineering/benchmarking-pg-numeric-integer-9c593d7af67e#:~:text=Synopsis,more%20time%20than%20integer%20types.
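If you want to see the storage difference directly, a quick sketch in psql (exact byte counts for NUMERIC depend on the value stored):

```
postgres=# SELECT pg_column_size(2024::int)     AS int_bytes,
postgres-#        pg_column_size(2024::numeric) AS numeric_bytes;
```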
@Byron Antoniadis I would like to follow up, just to make sure. May I consider the issue as resolved or is there anything else regarding the matter I could help you with?