Solved

Performance tuning


Hi GoodData,

We would like to tune our GoodData application in our Kubernetes cluster.

Currently we use the default values everywhere, but the application seems a bit slow (all pages load slowly).

I looked for a topic or details about performance fine-tuning but haven’t found detailed documentation about the options.

My question is: what tuning options do we have, and where can we change them?

  • java/xmx values
  • helm chart options (found some doc)
  • recommended node count/aws ec2 instance types
  • recommended production setup for quick page loading (on the GoodData side only; I would not touch the db/dashboards topic, that is a separate matter). Currently the main page, which contains the workspace list (4 workspaces), loads in around 5-7 seconds; it should be under 1 second.

Thank you!

Zoltan


Best answer by Robert Moucha 20 May 2022, 15:46


10 replies

Userlevel 2

In order to maintain peak performance, you should follow the guide outlined here:

https://www.gooddata.com/developers/cloud-native/doc/1.6/administration/data-sources/performance/

 

Hi Joseph,

Thanks, I’ve found this documentation, but it is all about the downstream dependencies (db, schema, etc.), not about the GoodData application itself. As I mentioned, the main page of the application (the workspace list) is slow as well, and there should be no db communication there (except maybe with GoodData’s own Postgres), so it is independent of the downstream representation layer.

Thanks

Userlevel 3

Hello Zoltan,

Did you review the recommended specs on https://www.gooddata.com/developers/cloud-native/doc/1.7/installation/k8s/requirements/ ?

  • Version 1.19 or higher
  • 3 worker nodes, each configured with at least 2 vCPU and 4 GB RAM

You said you were running the default setup, but I would just like to make sure what sort of configuration you have on your end.

Hi Jan,

Yep, we started with that and then tried to improve things by moving to 4 t3.xlarge nodes, but it didn’t help. That’s why I think there must be more options somewhere, maybe on the application side. Or do you have another idea where we should look for the problem?

Thank you!

Userlevel 3

Thanks. If I am understanding correctly, the delay you’re seeing relates to the loading of various elements in the UI. For example the workspaces, like you mentioned.

When you access the initial screen, an API call is made in the background to /api/entities/workspaces?include=workspaces&sort=name&size=250&page=0 in order to retrieve the workspaces and render them in the UI.

Could you have a look in the network tab of your browser and find out how long it takes to process this request? While you’re there, you should be able to see in general which part takes the longest and where the bottleneck could be.
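If it helps to repeat the measurement outside the browser, the same request can be timed with a small script. This is only a rough sketch using the Python standard library: the host name and the bearer token in the usage comment are placeholders (assumptions, not values from this thread), and the "until headers" figure includes connection setup as well as the server wait, so it is only an approximation of what the network tab shows.

```python
import time
import urllib.request


def time_request(url, headers=None, timeout=30.0):
    """Time one GET request.

    Returns (seconds_until_headers, seconds_total, body_bytes).
    """
    request = urllib.request.Request(url, headers=headers or {})
    start = time.perf_counter()
    # urlopen() returns once the status line and headers have arrived, so this
    # covers connection setup plus the time the server needed to start answering.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        until_headers = time.perf_counter() - start
        body = response.read()  # download the rest of the response
    total = time.perf_counter() - start
    return until_headers, total, len(body)


# Hypothetical usage; the host name and token below are placeholders:
# url = ("https://gooddata.example.com"
#        "/api/entities/workspaces?include=workspaces&sort=name&size=250&page=0")
# print(time_request(url, {"Authorization": "Bearer <API_TOKEN>"}))
```

Running it a few times from a machine near the cluster and from a client machine gives a first hint whether the time is spent in the application or on the network.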

Example: [screenshot of the browser network tab]

 

Yep, I took a screenshot. It was “fast” this time, but the page load is still over 4s. I haven’t seen ~100ms load times for the workspace endpoint as in your screenshot; ours is still ~3x slower.

 

Userlevel 3

My screenshot came from loading resources from a locally hosted CN instance, so I wouldn’t necessarily compare its performance to a request that actually has to do some travelling 🙂 It was only for illustration of where to look.

Could we see the network tab sorted by time, descending?
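As an alternative to a screenshot, most browsers can export the network tab as a HAR file, and the same ordering can be produced from that file. A minimal sketch, assuming a standard HAR export where each entry carries `time` (total milliseconds), `timings.wait` (time spent waiting for the server) and `request.url`; the file name in the usage comment is hypothetical.

```python
import json


def load_har(path):
    """Read a HAR export (a JSON document) from disk."""
    # utf-8-sig also accepts files that start with a byte order mark.
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


def slowest_requests(har, limit=10):
    """Return (total_ms, wait_ms, url) tuples, slowest request first."""
    rows = []
    for entry in har["log"]["entries"]:
        total_ms = entry["time"]
        # -1 marks an entry that has no wait timing recorded.
        wait_ms = entry.get("timings", {}).get("wait", -1)
        rows.append((total_ms, wait_ms, entry["request"]["url"]))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


# Hypothetical usage; "page-load.har" is a placeholder file name:
# for total_ms, wait_ms, url in slowest_requests(load_har("page-load.har")):
#     print(f"{total_ms:9.1f} ms  (wait {wait_ms:8.1f} ms)  {url}")
```

A large `wait` value relative to the total points at the server side, while a small one suggests the time goes into connecting or downloading.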

I see :)

Yep, attached. Maybe we should check our cloud networking as well (although there is nothing special about it), but the initial datasource API call was slow too: it took more than 6 seconds again.

 

Userlevel 2

Hi Zoltán,

There are several ways to improve performance:

  • Resources - grant more CPU/memory to the services running in k8s. There are *.resources.limits set per component (the asterisk is a placeholder for the various components; see values.yaml in our Helm chart). The default limits are set rather low to make sure the application also runs on smaller hardware. If you have issues with slow API requests (the xhr-type requests in the browser), focus on the following components: afmExecApi, metadataApi and resultCache.
  • Network latency - if your cluster runs in a remote region, network latencies can be higher. You may want to deploy your cluster closer to your clients.
  • Monitoring - watch the performance metrics closely and identify possible bottlenecks.
  • Authentication - we are aware of some inefficiencies in authentication processing. They were fixed after version 1.7.2 was released, and the fixes will be included in the next version. This should lower the TTFB metric.
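The first hint above could be turned into a values override roughly as follows. This is a sketch only: the component names come from this reply, but the exact key layout should be checked against values.yaml in the chart, and the CPU/memory figures in the usage comment are placeholders rather than recommendations. It writes JSON, which, being valid YAML, should be usable as a Helm values file.

```python
import json

# Components named above as the ones to look at for slow API requests.
COMPONENTS = ("afmExecApi", "metadataApi", "resultCache")


def build_overrides(cpu, memory, components=COMPONENTS):
    """Build a values override that sets <component>.resources.limits."""
    return {
        name: {"resources": {"limits": {"cpu": cpu, "memory": memory}}}
        for name in components
    }


def write_overrides(path, overrides):
    """Write the override as JSON so it can be passed to Helm as a values file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(overrides, handle, indent=2)
        handle.write("\n")


# Hypothetical usage; the limits below are placeholders, not recommendations:
# write_overrides("resource-overrides.json", build_overrides("2", "2Gi"))
```

Raising the limits one component at a time and re-measuring the slow requests after each change makes it easier to see which component was actually the bottleneck.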

Regards,

Robert Moucha

Hi Robert,

Cool, thank you!

I will check the network latency and also compare it with the duration values in the log files.
