Hi All, I would like to ask a quick review the scr...
# gooddata-platform
z
Hi All, I would like to ask a quick review the screenshot attached to this thread. The source table only contains ~700k records so it’s not a huge table…for a table chart only added 5 metrics + one row field. Date filtering set from 2022-01-08 00:00 to 2023-12-31 23:59. The result was really slow (when it’s not cached), ~20s on db side so we’ve checked the SQL command. The question is that is this intentional that the query look like this? Or should we change anything?
image.png
request_ts is a timestamp field…what will happen if we will have millions of records for this query? We can scale the db engine later as well but I would start with query optimizing first. Thanks
z
@Jakub Sterba ^
j
Hi, could you open insight in Analyze and replace “edit” at the end of URL by “debug” and hit enter. It should download a zip file with more information about the query. We may have a look at the file if you send it to us and see why the query looks so complex and if there is any opportunity to optimize it.
z
sure, let me check
sent in pm, thanks
j
We know about this issue. Generally, it is impossible to move DATE functions to only one side (or remove them completely) in all cases - sometimes it may not produce correct results. But, we should optimize in cases when it is feasible. We discussed it with @Jakub Sterba and we try to prioritize it on our roadmap. There is the only workaround now - functional indexes: https://www.postgresql.org/docs/current/indexes-expressional.html
z
Thanks, I see…unfortunately we’re using Redshift, but we’re in a middle of a process for scaling up/out so it may help a bit…will see, thank you!
j
We documented (high-level) performance tips for Redshift here: https://www.gooddata.com/developers/cloud-native/doc/2.0/connect-data/supported/redshift/#performance-tips Unfortunately Redshift does not support partitioning. What feature (sortkey, distkey) is relevant in this case? What exactly do you expect from Redshift (way of optimization) if the SQL query would be designed better (without DATE functions)?
z
Good question, but as I’ve seen without date functions it can be cached by the engine so after it cached it was 20-30ms, so I think the first goal should be that the generated gooddata queries (result) should be cached by the engine
j
Interesting. If I understand it well, you say that Redshift can cache queries (result?) and then it is much faster, and that it does not work if SQL queries contain DATE functions - right? To be absolutely honest, from my experience with Postgres, Redshift, Snowflake, BigQuery and Vertica (supported in GD.CN) and also with other databases in general, Redshift does not perform well and also sometimes it is not deterministic. So be careful and optimize decently 😉 One more hint - it is possible to enable a third-level caching: https://www.gooddata.com/developers/cloud-native/doc/2.0/connect-data/data-source-entity/#enable-caching-and-set-cache-path These caches are stored in data sources in a separate schema. They are sort of pre-aggregations. Sometimes it may help significantly with performance, esp. when the complexity of reports is growing and large SQL queries with many JOINs are generated.
z
Sorry for late response!
If I understand it well, you say that Redshift can cache queries (result?) and then it is much faster, and that it does not work if SQL queries contain DATE functions - right?
Yes, exactly, but hard to decide whether it’s because it contains DATE functions or because the query will be too complex and have a lot of DATE functions 🙂
Redshift does not perform well and also sometimes it is not deterministic.
thanks for the heads-up, will keep in mind 🙂
it is possible to enable a third-level caching:
Thanks for the tip, will check it!