# gooddata-cloud
d
Hi, what are the data volume constraints in GoodData Cloud? Let's say, is it possible to load 1 TB per workspace, or are the limits based on the number of records, etc.? Thank you
i
Hi Dmytro, You aren't really loading the data into GD Cloud. All the analytical queries run on data stored in your database (the data source connected to your workspace). You can find more hints in this article.
j
There is a limit on how many rows a report result can contain. We cache these results, so you cannot overflow GD with report results. The larger the data volume in your data source, the longer the SQL queries executed by GD take to run. It depends on the DB engine you use and the amount of hardware you assign to it. For instance, you can execute `SELECT y, SUM(x) FROM t GROUP BY y` on top of a 1 TB table in Snowflake and it can finish in under 1 s if the cardinality of the column "y" is low. So to sum it up: you can connect any data volume to GoodData, but you should be careful how you expose it to GD (a sketch of such a query follows this list):
• Cardinality of reports
• Number of users
• DB technology - we recommend columnar engines
• Can end users create custom reports?
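Purely as a minimal sketch of the kind of aggregation described above, here is how you might run and time it from Python with the Snowflake connector. The connection parameters and the table/column names (`big_table`, `x`, `y`) are placeholders, not anything GoodData-specific:

```python
import time

import snowflake.connector  # pip install snowflake-connector-python

# Placeholder credentials - substitute your own account details.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",
    warehouse="my_wh",
    database="my_db",
    schema="public",
)

# Low cardinality of "y" keeps the result set (and the cacheable
# report result) small, even when the scanned table is huge.
sql = "SELECT y, SUM(x) AS total FROM big_table GROUP BY y"

start = time.perf_counter()
rows = conn.cursor().execute(sql).fetchall()
print(f"{len(rows)} groups in {time.perf_counter() - start:.2f}s")
```

Because a columnar engine only scans the two referenced columns and the output has one row per distinct `y`, the query can stay fast even on a 1 TB table.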
d
Great, thank you
j
And last but not least, we recently released the so-called FlexCache, a new cache component based on Apache Arrow: https://www.gooddata.com/blog/how-to-build-analytics-with-apache-arrow/ One way we plan to extend it in the near future is to pre-fetch more data from the underlying data sources into our cache. Still, it will scale; there will be no hard limit, it will just correlate with the pricing, obviously 😉
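To illustrate the general idea behind an Arrow-based result cache (this is only a conceptual sketch, not FlexCache's actual implementation; the helper names, file layout, and the `run_query` callback are invented for illustration):

```python
import hashlib
import os

import pyarrow as pa
import pyarrow.feather as feather

CACHE_DIR = "cache"  # hypothetical on-disk cache location

def cache_key(sql: str) -> str:
    # The query text identifies its cached result.
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()

def get_or_compute(sql: str, run_query) -> pa.Table:
    """Serve a cached Arrow table if present, otherwise query and cache."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, cache_key(sql) + ".feather")
    if os.path.exists(path):
        # Cache hit: no round trip to the data source.
        return feather.read_table(path)
    table = run_query(sql)  # expected to return a pyarrow.Table
    feather.write_feather(table, path)  # store in Arrow's columnar format
    return table
```

Pre-fetching, as mentioned above, would then amount to populating such a cache ahead of the first report request rather than on demand.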