Hi. I have set up CloudWatch for my GDCN cluster and sent all logs to CloudWatch. With so many microservices, a single web request may pass through many of the services. What are the best practices for using CloudWatch (anomaly detection, Logs Insights) with GDCN to verify or debug the applications?
For instance, when I tried to load the LDM for one of the workspaces with "/modeler/#/f97c8a4......", it seemed slow, taking about 40 seconds, and I'd like to know where the bottleneck is. Using the time interval (I know when it happened, and I'm the only one on the cluster at this point), I got back 672 records; filtering by the workspace ID "f97c8a4" narrowed that to 22 records, all from "metadata-api" and "ingress-nginx-controller".
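For context, the kind of Logs Insights query I'm running looks roughly like this (a sketch, not my exact query; the time range itself is set separately in the console):

```
fields @timestamp, @logStream, @message
| filter @message like /f97c8a4/
| sort @timestamp asc
| limit 100
```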
From these logs, I don't seem to get much useful information. Each record contains a ton of information about the pod configuration, which may be useful for other things but not for my purpose. If I concentrate on the "msg" field from "metadata-api", I see two "Workspace meta configuration created" entries, two "Retrieve logical model." entries, and nine "HTTP response" entries. The "ingress-nginx-controller" logs show a lot of HTTP calls generated from the original call, but I am not sure about the format, or which field shows the response time.
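For what it's worth, here is a small Python sketch of how I'd pull the timing fields out of one of those access lines, assuming the controller uses the default ingress-nginx log format (that format, the sample line, and all its values below are my assumptions; please correct me if the actual format differs):

```python
import re

# Regex for the DEFAULT ingress-nginx-controller access-log format
# (an assumption -- check the controller's log-format-upstream setting):
#   $remote_addr - $remote_user [$time_local] "$request" $status
#   $body_bytes_sent "$http_referer" "$http_user_agent" $request_length
#   $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name]
#   $upstream_addr $upstream_response_length $upstream_response_time
#   $upstream_status $req_id
LOG_RE = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" (?P<request_length>\d+) '
    r'(?P<request_time>[\d.]+) \[(?P<upstream_name>[^\]]*)\] \[(?P<alt_upstream>[^\]]*)\] '
    r'(?P<upstream_addr>\S+) (?P<upstream_response_length>\S+) '
    # upstream_response_time can be a comma-separated list on retries
    r'(?P<upstream_response_time>[\d.,\s-]+) (?P<upstream_status>\S+) (?P<req_id>\S+)$'
)

def parse_access_line(line: str):
    """Return a dict of fields, or None if the line doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

# Hypothetical sample line in the assumed default format:
sample = ('10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /api/v1/entities/workspaces/f97c8a4/logicalModel HTTP/1.1" '
          '200 5340 "-" "Mozilla/5.0" 412 0.037 [default-metadata-api-9007] [] '
          '10.1.2.3:9007 5340 0.036 200 a1b2c3d4e5f6')
fields = parse_access_line(sample)
# request_time is the total time nginx spent on the request;
# upstream_response_time is the time the backend service took;
# req_id is a per-request ID that could be used for correlation.
print(fields["request_time"], fields["upstream_response_time"], fields["req_id"])
```

If that reading is right, `$request_time` would be the end-to-end latency I'm after, and a big gap between it and `$upstream_response_time` would point at the proxy rather than the service.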
Of course, I am also concerned about the more than 600 records that the workspace-ID filter excluded, and what role they play in this request.
So how should we use these logs? How do we tie together all the logs related to one request? I used the workspace ID here for testing, but in production there could be many simultaneous requests for the same workspace ID, so how do we tell them apart? More importantly, can we get more useful messages? Do we need to turn on "DEBUG"-level logging?
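If it helps to see what I'm imagining: I'd hope to run something like the query below to collect everything for one request, assuming the services emit a structured trace or request ID field (the field name `traceId` and its value here are placeholders on my part; I don't know what GDCN actually calls it):

```
# Hypothetical: assumes a structured "traceId" field exists in the
# service logs -- check one "msg" record to confirm the real field name.
fields @timestamp, @logStream, msg
| filter traceId = "4bf92f3577b34da6a3ce929d0e0e4736"
| sort @timestamp asc
```

Is that the intended workflow, or is there a better correlation key?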
Any help or insight is appreciated.