Solved

GD.CN running the same, long running query in multiple parallel sessions

2 years ago
23 November 2021
5 replies
138 views

tibi
New Participant
0 replies

We have noticed that GD.CN initiates 20+ parallel sessions to Vertica, trying to return the entire dataset, while we are not actively running anything. As these sessions trying to return an entire dataset, they are running for hours.

Does anybody nkow why is this happening and how to prevent it?

icon

Best answer by jacek 24 November 2021, 09:06

View original

5 replies

Userlevel 2

jacek
GoodData Employee
26 replies
2 years ago
23 November 2021

Hi Tibi,

can you please copy&paste whole statement(text, not screenshot) for each “type” of long running query?
By “type” I mean to group queries by what aggregations are used.

Carson
New Participant
5 replies
2 years ago
23 November 2021

Tibi’s colleague here -- an example of long-running query:

( SELECT "visitation_dn"."day_of_week" AS "a_label_visitation_dn_day_of_week_ba541a70e9034_ac6cb5b887df02", "visitation_dn"."destination" AS "a_label_visitation_dn_destination_ba541a70e9034_53782e75b1687b", "visitation_dn"."distance_from_home" AS "m_1" FROM "gd"."visitation_dn" AS "visitation_dn" )

Carson

Userlevel 2

jacek
GoodData Employee
26 replies
2 years ago
24 November 2021
Answer

Hi Tibi and Carson,

the query looks weird, because it does not contain any aggregation.

It looks like you you LDM is not correct - no aggregation in this case means that one of used labels is primary key, so our analytics engine does not generate aggregation. Because we support only single column (attribute/label) primary keys now, I assume that “destination” is the primary key in this case. Is it really correct? I suppose that if you fix the LDM and appropriate aggregation query is generated by our engine, it will be running much faster and this issue won’t manifest.

However, generating more and more identical queries running too long is not a correct behavior. I have just created an internal Jira for this issue. Most likely we introduce a configuration for how long report queries can be running and we implement a mechanism for canceling queries running longer than this threshold.

Carson
New Participant
5 replies
2 years ago
30 November 2021

Thank you, Jacek, I believe that solves it.

For our use case, our table isn’t designed to have any unique PK.

And so, would be preferable for us to remove this PK requirement from the GoodData LDMing.

For interim, we will add an auto-increment id column as workaround.

David Kubecka
GoodData Employee
6 replies
2 years ago
1 December 2021

@Carson We would like to support multi column PK for center-of-star fact tables in the future. Dimensions (which are referenced from other datasets) would still require single column PK as that’s considered best practice for performance reasons.

Let me know whether this (multi column PK in fact tables) would better match your model.

Reply

Sign up

Social Login

Login to the community

Social Login

Scanning file for viruses.

This file cannot be downloaded