# gooddata-platform
m
Would you say GoodData can replace a QlikView web app? We have about 60M records in the main table. Will it work, or will it be horribly slow? Can we rebuild this seamless, point-and-click, in-memory data exploration?
j
What kind of database do you have?
r
@Jan Soubusta - Qlik uses an old, proprietary in-memory database, and I assume that's what Michael is using. @Michael Serres Am I correct?
We have an embedded Postgres database, and we support a whole range of scalable databases - Vertica, Snowflake, and others. There are also new data warehouses focused specifically on speed - Firebolt is one example - but we haven't integrated with them yet.
m
@Jan Soubusta @Roman Stanek we can create a new database (Postgres, MongoDB). Currently it's on SAS!? Maybe my first question is... can GoodData handle 60M rows, or does it work best if the data is aggregated first?
j
Generally, as @Roman Stanek mentioned, we are integrated with databases like Snowflake/Vertica, which can provide sub-second latency even with much bigger data volumes than 60M records. Additionally, we provide three levels of caching:
• Final report results
• Raw report results - before pivoting, sorting, paging
• Pre-aggregations - optional, stored in your data source; intermediate aggregations identified in your report definitions that can be reused by multiple reports
You invalidate the caches after the next ETL run finishes (new data delivered). When querying the caches, we can support any data volume with very low latency and very high concurrency. When querying the data source (after ETL finishes and caches are invalidated), latency/concurrency depends on what data source technology you use.
Postgres can work quite well even with millions of rows, if you index key columns and assign sufficient HW resources. But for more complex reports (requiring larger joins and aggregations) I would recommend a database suited to analytical queries (MPP, columnar, horizontally scalable). Do you insist on using free database engines like Postgres/Mongo, or would you consider a paid alternative meeting the above criteria?
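To make that concrete, here is a hypothetical sketch of the kind of report query that drives this recommendation - a star-schema aggregation with joins and a GROUP BY. The table and column names are invented for illustration, not taken from the conversation:

```sql
-- Hypothetical dashboard query: revenue and distinct customers by month and region.
-- fact_orders, dim_date, dim_region and their columns are illustrative only.
SELECT d.month,
       r.region_name,
       SUM(f.amount)                 AS revenue,
       COUNT(DISTINCT f.customer_id) AS customers
FROM   fact_orders f
JOIN   dim_date   d ON d.date_id   = f.date_id
JOIN   dim_region r ON r.region_id = f.region_id
WHERE  d.year = 2021
GROUP  BY d.month, r.region_name
ORDER  BY d.month;
```

On a 60M-row fact table, queries of this shape are where indexes (and, at larger scale, a columnar/MPP engine) start to matter.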
m
@Jan Soubusta yes that's the kind of detail I like, thanks! Are these statements also true for MongoDB, MySQL, and Postgres, or do you really need Snowflake to reap the benefits?
👍 1
Open to paid solutions; migration to a cloud DB solution is possible, nothing is set in stone.
@Jan Soubusta Snowflake would be best / recommended? With the free tier, can I use Postgres with 60M records to put GD to the test? Basically, I have to build a PoC and make sure the UI is sleek before I can approve the tech stack for the project.
j
Luckily, there are many options 😉 I recommend neither MySQL (MariaDB) nor Mongo. Postgres works well with mid-size datasets; depending on how much you optimize the physical data model, it can work with your data. There are many MPP columnar options:
• Snowflake, BigQuery, Redshift. Based on my experience, I would recommend Snowflake. It performs best and you pay only for what you use. With our caches and not-too-frequent ETL, it could even be the cheapest solution.
• Vertica - works even on-premise. Performs even better than Snowflake, but requires quite expert skills. We use it in our SaaS and are satisfied.
• Postgres with columnar and cluster extensions. There are multiple providers. Better for analytics, but may be challenging to operate; I'm not sure if there are good SaaS offerings.
• EXASOL - both SaaS and on-premise. Incredible benchmark results (tpc.org), but I have no personal experience yet.
• New federation technologies (Dremio, Starburst, Trino, ...).
Personally, I would start the PoC with Postgres. I would index the key columns in the large tables (JOIN and GROUP BY columns). If anything is not responsive, let us know.
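A minimal sketch of what that indexing could look like in Postgres for the PoC, using the same hypothetical fact_orders table as above (all names are illustrative assumptions):

```sql
-- Index the columns used in JOINs and GROUP BYs on the large fact table.
-- fact_orders and its columns are hypothetical examples.
CREATE INDEX IF NOT EXISTS idx_fact_orders_date_id   ON fact_orders (date_id);
CREATE INDEX IF NOT EXISTS idx_fact_orders_region_id ON fact_orders (region_id);
CREATE INDEX IF NOT EXISTS idx_fact_orders_customer  ON fact_orders (customer_id);

-- Refresh planner statistics, then check whether the indexes are actually used.
ANALYZE fact_orders;
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_id, SUM(amount) FROM fact_orders GROUP BY date_id;
```

EXPLAIN (ANALYZE, BUFFERS) is a quick way to see whether a slow report is scanning the whole 60M-row table or using the indexes.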
m
Awesome chat, much appreciated. Stay well @Jan Soubusta
j
You are welcome 😉 One more piece of advice - set up partitioning.
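For reference, a hedged sketch of what that could look like with Postgres declarative range partitioning (available since Postgres 10). The table name and the monthly date-range scheme are assumptions for illustration; you would pick the partition key to match how your reports filter the data, typically a date column:

```sql
-- Hypothetical partitioned fact table, split by month on the order date.
CREATE TABLE fact_orders (
    order_id    bigint,
    order_date  date NOT NULL,
    customer_id bigint,
    region_id   int,
    amount      numeric(12,2)
) PARTITION BY RANGE (order_date);

-- One partition per month; queries filtered on order_date only scan
-- the relevant partitions (partition pruning).
CREATE TABLE fact_orders_2021_01 PARTITION OF fact_orders
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');
CREATE TABLE fact_orders_2021_02 PARTITION OF fact_orders
    FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');
```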