# gooddata-cn
Dongfeng Lu
Hi. I have set up CloudWatch for my GDCN cluster and sent all logs to CloudWatch. With so many microservices, a single web request may pass through many of the services. What are the best practices for using CloudWatch (anomalies, insights) with GDCN to verify or debug the applications?

For instance, when I tried to load the LDM for one of the workspaces with "/modeler/#/f97c8a4......", it seemed slow, taking about 40 seconds to load, and I'd like to know where the bottleneck is. I can use the time interval (since I know when it happened, and I am the only one on the cluster at this point) to get back 672 records, and when I filtered them by the workspace ID "f97c8a4", I got only 22 records, from "metadata-api" and "ingress-nginx-controller". From these logs, I don't seem to get much useful information. Each log contains tons of information about the pod configuration, maybe useful for other things, but not for my purpose. If I concentrate on "msg" from "metadata-api", I see 2 "Workspace meta configuration created", 2 "Retrieve logical model.", and 9 "HTTP response". The logs for "ingress-nginx-controller" show a lot of HTTP calls generated from the original call, but I am not sure about the format and which value shows the response time. Of course, I am also concerned about the more than 600 logs we filtered out with the workspace ID, and what role they play in this request.

So how should we use the logs? How do we tie together all the logs related to one request? I used the workspace ID here for testing, but in production there could be many simultaneous requests related to the same workspace ID, so how do we tell them apart? More importantly, can we get more useful messages? Do we need to go to the "DEBUG" level of logging? Any help and insight is appreciated.
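For context, the kind of query described above can also be scripted. Here is a minimal sketch, assuming boto3 credentials are configured and a hypothetical log group name such as "/gdcn/cluster-logs" (substitute whichever group your GDCN pods actually ship their logs to):

```python
# Minimal sketch: run a CloudWatch Logs Insights query over a known time window
# and keep only events that mention the workspace ID. The log group name is a
# placeholder -- use whichever group your GDCN pods ship their logs to.
import time

import boto3

logs = boto3.client("logs")

query = """
fields @timestamp, @logStream, @message
| filter @message like /f97c8a4/
| sort @timestamp asc
| limit 200
"""

started = logs.start_query(
    logGroupName="/gdcn/cluster-logs",      # hypothetical log group name
    startTime=int(time.time()) - 15 * 60,   # last 15 minutes
    endTime=int(time.time()),
    queryString=query,
)

# Poll until the query finishes, then print the matching events.
while True:
    result = logs.get_query_results(queryId=started["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result["results"]:
    print({field["field"]: field["value"] for field in row})
```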
Jan
Hi @Dongfeng Lu. Generally, when you want to track a particular request and how it went through the application, we use traces. When a request is made (through the UI or directly via the API), a traceId is generated for that request. You can then search through the logs to see how the request went through each microservice and limit the results by that ID. Try checking the browser dev tools / network tab for a request; in the response headers you can see the traceId generated and assigned to the request (see screenshot).
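A minimal sketch of the search Jan describes, assuming a hypothetical log group name and a placeholder traceId copied from the browser's response headers (the GDCN log group name will differ per installation):

```python
# Minimal sketch: follow one request across the GDCN microservices by searching
# a log group for its traceId. Log group name and traceId are placeholders.
import time

import boto3

logs = boto3.client("logs")

trace_id = "0123456789abcdef"                # traceId copied from response headers
now_ms = int(time.time() * 1000)

response = logs.filter_log_events(
    logGroupName="/gdcn/cluster-logs",       # hypothetical log group name
    filterPattern=f'"{trace_id}"',           # events containing the traceId
    startTime=now_ms - 15 * 60 * 1000,       # last 15 minutes (milliseconds)
    endTime=now_ms,
)

for event in response["events"]:
    print(event["timestamp"], event["logStreamName"], event["message"][:200])
```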
Dongfeng Lu
Hi Jan. Thanks for the response, and for that picture showing how to tie the trace ID from the browser to the log. However, I still feel that I did not get enough information from the log. For instance, in the attached screenshot related to one particular traceId (281c18b5218a9904) for the request "/api/v1/layout/workspaces/f97c8a4d2c0344d284c5eb6ca386af2a/logicalModel?includeParents=true", I only see 4 records, with the following "msg" values:
```
Server intercept call
Server intercept call
Retrieve logical model.
HTTP response
```
The gap between the timestamps of the last two msgs is about 15 seconds, implying the "retrieving" part takes the longest for this request. But where do I go from here? Does it mean we need to check the cache or the database? Also, when loading the LDM once, I saw both "logicalModel?includeParents=true" and "logicalModel?includeParents=false", each taking 15 or 16 seconds, so the whole page takes more than 30 seconds to load. Is it correct to load it twice? Thanks.
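To make such gaps explicit, one option (a rough sketch, not GDCN tooling) is to sort the events collected for a single traceId and print the delay between consecutive messages, so the slowest step stands out:

```python
# Rough sketch: given the events collected for one traceId (e.g. from the
# filter_log_events call earlier in the thread), print the delay between
# consecutive log messages so the slowest step stands out.
from datetime import datetime, timezone


def print_step_delays(events):
    """events: list of dicts with 'timestamp' (epoch milliseconds) and 'message'."""
    ordered = sorted(events, key=lambda e: e["timestamp"])
    previous_ts = None
    for event in ordered:
        ts = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
        delay = "" if previous_ts is None else f"+{(event['timestamp'] - previous_ts) / 1000:.1f}s"
        print(f"{ts.isoformat()} {delay:>8} {event['message'][:120]}")
        previous_ts = event["timestamp"]
```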
Jan
The logical data model has to be retrieved from the metadata database. Are you using your own RDS, or did you use the built-in Postgres from the Helm chart installation? The metadata-api pods also play a big part in this request; what is your sizing of the metadata-api pods? How big is the LDM? Could you post the LDM (retrievable via the LDM layout API)? I will double-check internally why there are two calls to get the LDM.
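For reference, a minimal sketch of pulling the declarative LDM through the layout API mentioned above; the hostname and API token are placeholders, and the workspace ID is the one from the earlier request:

```python
# Minimal sketch: download the workspace LDM via the declarative layout API so
# it can be shared. Hostname and API token are placeholders; the workspace ID
# is the one from the request earlier in the thread.
import requests

GDCN_HOST = "https://gooddata.example.com"   # hypothetical hostname
WORKSPACE_ID = "f97c8a4d2c0344d284c5eb6ca386af2a"
API_TOKEN = "..."                            # a GoodData.CN API token

response = requests.get(
    f"{GDCN_HOST}/api/v1/layout/workspaces/{WORKSPACE_ID}/logicalModel",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=60,
)
response.raise_for_status()

# Save the declarative LDM to a file that can be attached to the thread.
with open("ldm.json", "w") as f:
    f.write(response.text)
```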
Dongfeng Lu
Hi Jan. We are using external RDS and ElastiCache, following the steps in https://www.gooddata.com/docs/cloud-native/3.7/deploy-and-install/cloud-native/environment/aws. For the Metadata API, we specified:

```yaml
metadataApi:
  encryptor:
    enabled: false
  resources:
    limits:
      cpu: 1250m
      memory: 1300Mi
    requests:
      cpu: 1250m
      memory: 1300Mi
```

I am sending you two LDM files. One is for "GDCN-LDM-Dev Master converted from platform.json", which is the workspace described above with the long loading time. We don't have a data source connected to this model yet; it was created by converting our Platform LDM. The other is a reference using the GoodData demo data, for which the data was imported into Redshift and the LDM was created by connecting to the data source. In other words, it should be purely created by GDCN. For both LDMs, I observed both "logicalModel?includeParents=true" and "logicalModel?includeParents=false".
Jan
Hi, I’m sorry for the delay getting back to you. Retrieving the LDM can be an expensive operation and take some time when the LDM is bigger. However, I reviewed this internally, and there is no need for the UI client to call both "logicalModel?includeParents=true" and "logicalModel?includeParents=false" every time it renders the LDM. I created an internal ticket for the optimization; however, there is no specific timeline for the fix yet.
Dongfeng Lu
Good to hear. Thank you.