Solved

How does GoodData cache reports?

  • 7 August 2020
  • 3 replies
  • 141 views

Hi, can you please help me understand whether and how GoodData caches reports and dashboards? Is there a way to fine-tune caching parameters such as frequency and cache invalidation?

Best answer by Jakub 7 August 2020, 11:46

Hi Frank,

 

The GoodData Platform employs multiple types of caching, all of which are part of our analytical product.

You can find “multi-level caching” mentioned in our documentation. It ensures that response times are minimized as much as possible, and the logical data model provides an abstraction layer that ensures database independence.

Here is an extract from the XAE documentation:

“The Extensible Analytics Engine performs a number of query optimizations for processing and execution. Based on a series of mathematical operations, individual queries are partitioned horizontally by project and broken down into smaller, more efficient queries, the results of which are cached for future use. XAE inspects each calculation for sub-queries existing in caches across the entire multi-tenant platform, optimizing for performance and freshness of results.”

 

Simply put, this makes projects run faster. A report served from the cache is much faster than an uncached one. You can verify this yourself: a new report takes longer the first time it runs than on subsequent runs.

 

The cache is limited by a certain threshold, and entries that are not used are invalidated. It is also affected by data loads (for example, through the /etl/pull2 resource or a GD_dataset_writer in CloudConnect).
After an ETL process completes, the project cache is invalidated again, and reports need to be recomputed to "warm up" the multi-level project cache.
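
To make the invalidate-then-warm-up cycle concrete, here is a minimal sketch of a per-project result cache. It is a toy model only, not how the platform is implemented: the `ProjectCache` class, its method names, and the report ID are all hypothetical.

```typescript
// Toy model of a per-project result cache; not GoodData's implementation.
class ProjectCache {
  private results = new Map<string, number>();
  computations = 0; // counts how often a report had to be computed from scratch

  // Return the cached result if present; otherwise compute and store it.
  run(reportId: string, compute: () => number): number {
    const cached = this.results.get(reportId);
    if (cached !== undefined) {
      return cached;
    }
    this.computations += 1;
    const value = compute();
    this.results.set(reportId, value);
    return value;
  }

  // Called when a data load finishes: every cached result may now be stale.
  onDataLoadCompleted(): void {
    this.results.clear();
  }
}

const rows = [100, 250, 400];
const totalRevenue = (): number => rows.reduce((sum, r) => sum + r, 0);

const projectCache = new ProjectCache();
projectCache.run("revenue-total", totalRevenue); // cold cache: computed
projectCache.run("revenue-total", totalRevenue); // warm cache: reused

rows.push(50);                      // new data arrives through a load
projectCache.onDataLoadCompleted(); // the load invalidates the cache
const afterLoad = projectCache.run("revenue-total", totalRevenue); // recomputed
```

In this model the report is computed twice in total: once on the cold cache and once after the data load. The first run after a load is therefore the slow one.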

 

To cover a bit more of the caching topic, I can also mention GoodData.UI. Our library for building analytical applications contains the Visualization component, which caches some static information from the GoodData platform to minimize the number of redundant requests. You can clear this cache from time to time (for example, after logging out, or when leaving a page with visualizations that use GoodData.UI components) by calling clearSdkCache from the sdkCache module.
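
As a rough illustration of when clearing the cache makes sense, the sketch below clears it from a logout handler. The thread does not give the import path for `clearSdkCache`, so the sketch defines a local stand-in with the same name. `getStaticInfo`, `onLogout`, and the `"color-palette"` key are hypothetical and exist only to make the example self-contained.

```typescript
// Stand-in for the SDK's static-information cache. In a real application,
// clearSdkCache would come from GoodData.UI's sdkCache module instead
// (the import path is not given in this thread, so it is omitted here).
const sdkCacheStandIn = new Map<string, string>();

function clearSdkCache(): void {
  sdkCacheStandIn.clear();
}

// Hypothetical helper: request static info once, then reuse the cached copy.
function getStaticInfo(key: string, loadInfo: (key: string) => string): string {
  const hit = sdkCacheStandIn.get(key);
  if (hit !== undefined) {
    return hit;
  }
  const value = loadInfo(key);
  sdkCacheStandIn.set(key, value);
  return value;
}

// Hypothetical logout handler: drop cached data before the next user signs in.
function onLogout(): void {
  clearSdkCache();
}

let requests = 0; // number of simulated requests to the platform
const loadInfo = (key: string): string => {
  requests += 1;
  return "info-for-" + key;
};

getStaticInfo("color-palette", loadInfo); // first call: one request
getStaticInfo("color-palette", loadInfo); // second call: cache hit, no request
onLogout();
getStaticInfo("color-palette", loadInfo); // after clearing: requested again
```

Clearing on logout avoids showing one user's cached information to the next user, at the cost of one extra request afterwards.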

 

I hope my answer provided some insight on the topic.

Cheers,

Jakub

 

Hi Frank,

let me highlight one important thing:

The “multi-level caching” mentioned by Jakub means that GoodData caches both reports and the results of intermediate computations. Our query engine creates granular caches that can be reused across different visualizations.

For example, consider the following scenario:

  • A table showing Revenue by Account
  • A pie chart showing the number of accounts with Revenue over and under $1m
  • A line chart showing the number of accounts with Revenue over $1m, broken down by region

In this situation, all three insights can reuse the same cache holding the sum of Revenue grouped by Account.
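
The scenario above can be sketched in a few lines. This is only an illustration of the idea of a shared intermediate result, not the query engine's actual mechanism. The sample data, the `revenueByAccount` function, and the assumption that each account belongs to a single region are all made up for the example.

```typescript
type AccountRow = { account: string; region: string; revenue: number };

// Made-up fact rows; each account is assumed to belong to one region.
const facts: AccountRow[] = [
  { account: "Acme", region: "EMEA", revenue: 700000 },
  { account: "Acme", region: "EMEA", revenue: 600000 },
  { account: "Globex", region: "AMER", revenue: 400000 },
  { account: "Initech", region: "AMER", revenue: 1500000 },
];

let aggregations = 0; // how many times the grouping actually ran
let cachedTotals: AccountRow[] | undefined;

// The shared intermediate result: sum of Revenue grouped by Account.
function revenueByAccount(): AccountRow[] {
  if (cachedTotals !== undefined) {
    return cachedTotals;
  }
  aggregations += 1;
  const totals = new Map<string, AccountRow>();
  for (const row of facts) {
    const entry = totals.get(row.account);
    if (entry === undefined) {
      totals.set(row.account, { account: row.account, region: row.region, revenue: row.revenue });
    } else {
      entry.revenue += row.revenue;
    }
  }
  const result: AccountRow[] = [];
  totals.forEach((total) => result.push(total));
  cachedTotals = result;
  return result;
}

const ONE_MILLION = 1000000;

// Insight 1: table of Revenue by Account.
const revenueTable = revenueByAccount().map((t) => ({ account: t.account, revenue: t.revenue }));

// Insight 2: pie chart of accounts over $1m vs. the rest.
const overCount = revenueByAccount().filter((t) => t.revenue > ONE_MILLION).length;
const pie = { over: overCount, under: revenueByAccount().length - overCount };

// Insight 3: accounts over $1m, broken down by region.
const overByRegion: Record<string, number> = {};
for (const t of revenueByAccount()) {
  if (t.revenue > ONE_MILLION) {
    overByRegion[t.region] = (overByRegion[t.region] || 0) + 1;
  }
}
```

All three insights call `revenueByAccount()`, but the grouping itself runs only once; the later calls reuse the cached totals.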

In addition to this multi-level caching, GoodData dashboards also take advantage of standard HTTP caching because each computation result is represented as a resource with a unique and consistent URL.
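
To show why a unique and consistent URL matters for HTTP caching, here is a simplified sketch. The URL scheme, the `Computation` shape, and the in-memory map standing in for an HTTP cache are all assumptions for illustration; real HTTP caching also involves response headers, which are left out.

```typescript
// Hypothetical description of a computation; the field names are illustrative.
type Computation = { metric: string; groupBy: string[]; filters: string[] };

// Serialize the definition canonically so equivalent requests map to one URL.
function resultUrl(c: Computation): string {
  const canonical = JSON.stringify({
    metric: c.metric,
    groupBy: c.groupBy,
    filters: c.filters.slice().sort(), // filter order should not change the URL
  });
  return "/results/" + encodeURIComponent(canonical); // hypothetical URL scheme
}

// Stand-in for an HTTP cache: responses are stored and looked up by URL.
const httpCache = new Map<string, string>();
let serverCalls = 0;

function getResult(c: Computation): string {
  const url = resultUrl(c);
  const hit = httpCache.get(url);
  if (hit !== undefined) {
    return hit; // same URL as before, so no request reaches the server
  }
  serverCalls += 1;
  const body = "result of " + c.metric; // placeholder for the real response body
  httpCache.set(url, body);
  return body;
}

const first: Computation = { metric: "revenue", groupBy: ["account"], filters: ["year=2020", "region=EMEA"] };
const second: Computation = { metric: "revenue", groupBy: ["account"], filters: ["region=EMEA", "year=2020"] };

getResult(first);  // first request: goes to the server
getResult(second); // equivalent definition: same URL, served from the cache
```

Because both definitions resolve to the same URL, the second lookup never reaches the server in this model.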

Pavel

Hey Community Team,

Would the same caching logic apply when reports and dashboards are requested via an API call?

I want to know whether calling the API for a given report would trigger caching.

The use case would be to periodically poll certain sets of reports or dashboards before a customer views the workspace, so that load times are faster.

Would this work?
