# gooddata-ui
b
Urgent help needed with dashboards 🧵
We have some dashboards that we display to clients. These dashboards include various visualisations built through Analyze. Everything was working perfectly until a few days ago: the visualisations themselves still work, but displaying them on dashboards does not.
We are on 10.17
m
Hi Byron, may I ask if the problem is happening only in the embedded environment or also in the GoodData environment?
b
In both, thanks for asking
m
Hi Byron, when you say some dashboards, could you be so kind as to send us a direct link to the dashboard for further investigation? Feel free to use DM.
m
Hi Byron, I have impersonated your environment as Admin; in the dev console I found these two errors while opening the dashboard.
```
{
    "title": "Bad Request",
    "status": 400,
    "detail": "Cannot find label with id='attribute/acked_at/079ded41cc594b478b3e76bf04295715' in LDM objects.",
    "traceId": "3f5324552bf1e68b469cf3268663979f"
}

{
    "title": "Bad Request",
    "status": 400,
    "detail": "Cannot find label with id='attribute/simplified_contract.sign_off_date/079ded41cc594b478b3e76bf04295715' in LDM objects.",
    "traceId": "9d521fb3bfdc5180a7c98b2a26a38115"
}
```
Then, checking the LDM, both mentioned attributes have an issue (warning) in the Mapping tab. I also tried to reproduce the issue, but unfortunately I am not able to see the visualisations Appraisals pending review or Evaluated crew onboard; I would need the metrics/attributes/filters involved in order to reproduce them. Did you make some changes recently? Please check the Mapping section in the LDM once more for both attributes `acked_at` and `sign_off_date` and let us know.
b
Hello Mauricio, thanks, I will have a look. Is there a possibility we can see these detailed errors? Because we only see a generic "calculation error" or something like that and we can't debug.
m
Hi Byron, I found another detail in the error:
```
"msg": "Failed to register cache with resultId=b33377dbf41b259455e7780b30e921ca55aea0d3",
  "exc": "errorType=com.gooddata.tiger.grpc.error.GrpcPropagatedClientException, message=Cannot find label with id='attribute/acked_at/079ded41cc594b478b3e76bf04295715' in LDM objects.,<no detail>\n\tat com.gooddata.tiger.grpc.error.ExceptionsKt.buildClientException(Exceptions.kt:37)\n\tat com.gooddata.tiger.grpc.error.ErrorPropagationKt.convertFromKnownException(ErrorPropagation.kt:254)\n\tat com.gooddata.tiger.grpc.error.ErrorPropagationKt.convertToTransferableException(ErrorPropagation.kt:220)\n\tat com.gooddata.tiger.grpc.error.ErrorPropagationKt.clientCatching(ErrorPropagation.kt:66)\n\tat com.gooddata.tiger.grpc.error.ErrorPropagationKt$clientCatching$1.invokeSuspend(ErrorPropagation.kt)\n\tat kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)\n\tat kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:28)\n\tat kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:99)\n\tat kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)\n\tat kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:102)\n\tat kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)\n\tat kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:811)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:715)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:702)\n"
```
That said, did you try to clear the cache? Also, I noticed that the dataset containing the involved attributes is SQL-based; could you check the SQL query too?
b
Checked the SQLs, they are all working properly. Will check cache now
Cleared the cache, not working. Can you check this dashboard too?
Any help much appreciated, clients are complaining 😞
m
Hi Byron, which dashboard? The same issue? I think you forgot to include the dashboard link in your message.
Can we have a quick call? I’ve discovered more..
```
{
  "title": "Bad Request",
  "status": 400,
  "detail": "Cannot find label with id='attribute/acked_at/079ded41cc594b478b3e76bf04295715' in LDM objects.",
  "traceId": "46a313d005634b80b641ff3e4c0dd31f"
}
```
This label exists!!!
Screenshot 2025-02-06 at 18.56.01.png,Screenshot 2025-02-06 at 18.55.35.png,Screenshot 2025-02-06 at 18.55.28.png
Pls treat with high urgency; our clients depend on these reports. Pls reach out to arrange a call ASAP.
m
Hi Byron, thank you for your patience, I have escalated your case as Urgent to our L2 Technical Support team. A colleague will contact you soon.
b
Thx @Mauricio Cabezas - here or elsewhere?
b
Hello @Byron Antoniadis, This is Branislav from the L2 Technical Support team. I am sorry for your troubles and would like to thank you for your patience while I reviewed your issue. There were 2 dashboards mentioned as not loading: 1.) Crew competence 2.) Retention. As far as I can see, the "Crew competence" dashboard loads OK. As for the "Retention" one, I can see multiple errors in the browser Developer Console that look like:
```
{
  "status": 400,
  "detail": "An error has occurred while calculating the result",
  "resultId": "69ef190248104ebe4e68478819a860c5e49e55f4",
  "reason": "General error",
  "traceId": "e7c0b42814816b3d8b23455bf0016d21"
}
```
I am currently investigating them further. However, please note that your LDM has the following error: "Source column(s) not found in Data Source". Could you please fix/resolve this LDM error, clear the cache afterwards, and let us know if it helped?
b
Looking into it now - thanks Branislav
b
You are welcome, I hope it helps... 🤞🏼 🤓
b
This was something that was just released and is now fixed; it's irrelevant to the actual problem.
@Branislav Slávik look at this visualisation - it's working. Then if you look at the dashboard using it (Retention from above), it does not.
b
@Byron Antoniadis Thank you for sharing the visualization. Are you sure that the Retention rate - summary per agent per rank visualization is used on the Retention dashboard? I cannot see it there. 🤔 Regardless, I have noticed an interesting behavior: the visualization works if the Year filter is set to "All", but stops working when I select a year, for example 2024, which is the same as the one set in the Retention dashboard's locked filter. I will continue the investigation on Monday during CET business hours.
b
I noticed the same - do you have any idea why this could be, so I can continue a bit now?
b
I am not 100% sure, but my suggestion would be to continue resolving the LDM warnings, especially the ones related to "alternative paths": https://www.gooddata.com/docs/cloud/model-data/evolve-your-model/many-to-many-in-ldm/#Many-to-ManyinLogicalDataModels-Alternativepaths https://university.gooddata.com/tutorials/data-modeling/logical-data-model-basic-rules-of-data-modelling/#BasicRulesofDataMod[…]pathsbetweendatasets As you can see in the screenshot, both Created at and Updated at are affected. 🤔
👀 1
b
thanks! have a good weekend!
🤞 1
b
You are welcome, have a nice weekend as well.
b
Hello @Branislav Slávik - good morning, have a good week ahead. Please provide updates as you have them; this is a critical bug on our side.
b
Hello @Byron Antoniadis, Thank you, I hope you'll have a great week as well. I have reached out to our developers regarding the issue and I am waiting on their update. I will inform you as soon as I hear back from them and have more details.
@Byron Antoniadis, I was asked by the developers if it would be ok to turn off real-time caching for the Calypso production data source for a while? They would like to have a further look into the issue.
b
yes, they can - thanks
b
It seems we were not far from the cause of the issue with the LDM setup. See the answer from our developers below:
The query is failing because of this condition: `"__sql_30df0e4dd3aa599fd3610ae2a8742004"."year" = 2024`. The problem is that the customer has a model where one dataset, `promotions`, defines the column `year` as STRING, while another dataset has the column `years.year` as INT. `promotions.year` is a FK to `years.year`. So the system assumes `year` is of type INT and hence converts the filter `"year" = '2024'` to `"year" = 2024`.
This is all enforced by their SQL-based dataset, where they have `cpy.year::TEXT AS year`.
Compare the following:
```
postgres=# select 1 where '11'::text = 10;
ERROR:  operator does not exist: text = integer
LINE 1: select 1 where '11'::text = 10;
                                  ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
postgres=# select 1 where '11' = 10;
 ?column?
----------
(0 rows)
```
Notice that when you explicitly state that something is TEXT, PG will not try to convert it to a number, and automatic conversion from number to string is not done.
Please note that there might be more than one failing query and more affected datasets. However, I hope that the example above helps you identify all the possible locations. As stated before, a model update is the most correct and performance-wise solution at this point. Is there anything else regarding the topic we could help you with, or may we consider this case as resolved?
b
Thank you @Branislav Slávik - will try to apply today and discuss again
b
@Byron Antoniadis, you are welcome. Fingers crossed. 🤞🏼🤓
@Byron Antoniadis I would like to follow up regarding the issue. Were you able to update the LDM? Did it help to resolve the issue?
b
I was at a conference so lost touch with the problem for a bit. Will update by EoD
b
No problem and/or rush, I just wanted to check back with you on the status.
Hello @Byron Antoniadis, Just a quick follow up on the issue. Were you able to get back to it and update the LDM? Has the issue been resolved? 🤞🤓
b
@Branislav Slávik thank you so much for your assistance! This was the problem, the issue has been resolved and our client is happy! 🙂
Hello @Branislav Slávik - a client that uses workspace "Meadway" noted that the Retention dashboard still doesn't work for them. This is very weird: the individual visualizations load up under "Analyze", but when I look at the dashboard they spin infinitely. Please help 🙏
I failed to mention that the dashboard works for all the other workspaces
b
Hello @Byron Antoniadis, I suppose that this is not related to the issue we have discussed before. Could you please create a new thread for it? Either I or one of my colleagues will look into it as soon as possible and assist you further.
b
@Branislav Slávik the client said it was never fixed since then so it’s the same issue 🙏
can you provide an update before EoD? Thx 🙏
b
@Byron Antoniadis Thank you for sharing the information. I have done some initial investigation, and so far the issue seems to be slightly different, at least judging from the error:
```
{
  "title": "Bad Request",
  "status": 400,
  "detail": "An error has occurred while calculating the result",
  "resultId": "e4d97221692181f4fc102e34dc770432e32fd76f",
  "reason": "Query timeout occurred",
  "traceId": "55c85b0412bf68bc65e4b28cfaa5badb"
}
```
That states that it has been caused by a query timeout. Is this customer somehow "different" from the others, e.g. in the amount of records or data? In addition, I reached out to the developers to discuss further possible reasons for this strange behaviour in only one of the child workspaces.
👀 1
b
Thanks @Branislav Slávik - the weird thing is that for other clients with far more data (check workspace “Laskaridis”), it succeeds. That’s why I think it’s weird
Team, do we have an update here? Our client is undergoing an audit on Monday and wants to show this
b
Hi Byron, I am in touch with our developers regarding the issue. They confirmed that the issue is different from the initial one and that the timeout indeed happens on the Postgres side: the SQL queries take longer than 160s to finish. For now, we can only guess what causes the queries to take so long to execute just for this particular workspace / customer. Would it be possible for you to find the problematic report queries and execute them manually in Postgres with EXPLAIN ANALYZE? The idea would be to compare the execution plans of the statement(s) / queries to see if there are any differences, e.g. between workspaces of different customers.
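For reference, a minimal sketch of what such a run could look like in psql (the table name and filter values below are illustrative, not the actual generated queries):

```
postgres=# EXPLAIN ANALYZE
postgres-# SELECT count(*)
postgres-# FROM retention_facts        -- illustrative table name
postgres-# WHERE shipco_id = 42        -- workspace-specific filter
postgres-#   AND year = 2024;
```

Comparing the "actual time" figures and row estimates in the resulting plans between the two workspaces should show where the extra time is spent.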
b
Hello - I have run the EXPLAIN ANALYZE for the Production parent workspace (which contains ALL data for ALL clients and works!!!) and for the specific failing workspace Meadway. The SQL statement for Meadway runs much faster, and the only diff in the analyze output is the application of the shipco_id filter. I do not know where to go from here, so I'll need your help. It's extremely weird that the superset workspace works and the subset doesn't.
Hello - ping here 🙏
Hello - another ping, I’d like to provide an update to our client before EoW
b
Hi Byron, apologies for missing the updates here. Let me have a look into it a bit and I will get back to you before EoD.
🙌 1
@Byron Antoniadis Could you please share with us the queries (and their results and timing) executed with EXPLAIN ANALYZE for the Meadway and parent workspaces that correspond to the executions related to the Retention dashboard? Once shared, our developers will investigate them further and compare them with the ones found in our logs to confirm and hopefully find the cause of the difference(s). Please note that not all queries related to the dashboard report execution(s) necessarily run for a long time; there might only be a few slow ones.
b
Here you are, please go through the document and let me know if you have any questions/points for discussion.
Good morning - wishing you a great week ahead. Please let me know when we have news.
FYI - I managed to change some things and fix this. It seems the year filter was the problem again, but only for this dataset. The worrying thing is that I couldn't figure this out from the error reports.
b
Hi @Byron Antoniadis, thank you for sharing the query details. I guess they won't be needed anymore, but it is good to have them here for future reference. I am glad that we were able to fix the issue by updating the year filter for this dataset. Was the fix / update similar or the same as the one recommended earlier? https://gooddataconnect.slack.com/archives/C01UR5BGAHY/p1739199906643769?thread_ts=1738832429.525059&cid=C01UR5BGAHY Thank you very much for sharing your feedback with us. I will pass it to our product team for their consideration.
b
@Branislav Slávik I had changed all years to be INT, however in one model it was still NUMERIC and I had to explicitly change it to INT. What boggles my mind, though, is how this was affecting only one workspace 🤯
b
@Byron Antoniadis, unfortunately, I do not have an exact explanation for that. However, I found some forums / pages on the internet that might explain it. 🤓 It seems that the NUMERIC type is much larger (12 bytes) compared to INT (4 bytes). Also, the performance of some operations is slower: https://dba.stackexchange.com/questions/110880/numeric-vs-integer-for-a-column-size-and-performance https://stackoverflow.com/questions/58155982/what-is-the-difference-between-numeric9-0-and-int-in-postgres In this article, they even tried to measure it, and it appears to be ~50% to 70% slower than integer types: https://medium.com/xendit-engineering/benchmarking-pg-numeric-integer-9c593d7af67e#:~:text=Synopsis,more%20time%20than%20integer%20types.
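If you want to see the storage difference directly, a quick sketch in psql (exact byte counts for NUMERIC depend on the value stored):

```
postgres=# SELECT pg_column_size(2024::int)     AS int_bytes,
postgres-#        pg_column_size(2024::numeric) AS numeric_bytes;
```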
@Byron Antoniadis I would like to follow up, just to make sure. May I consider the issue as resolved or is there anything else regarding the matter I could help you with?