Solved

GD.CN running the same, long running query in multiple parallel sessions

  • 23 November 2021
  • 5 replies
  • 138 views

  • New Participant
  • 0 replies

We have noticed that GD.CN initiates 20+ parallel sessions to Vertica, trying to return the entire dataset, while we are not actively running anything. As these sessions trying to return an entire dataset, they are running for hours.

Does anybody nkow why is this happening and how to prevent it?

icon

Best answer by jacek 24 November 2021, 09:06

View original

5 replies

Userlevel 2

Hi Tibi,

can you please copy&paste whole statement(text, not screenshot) for each “type” of long running query?
By “type” I mean to group queries by what aggregations are used.

Tibi’s colleague here -- an example of long-running query:

 

( SELECT "visitation_dn"."day_of_week" AS "a_label_visitation_dn_day_of_week_ba541a70e9034_ac6cb5b887df02", "visitation_dn"."destination" AS "a_label_visitation_dn_destination_ba541a70e9034_53782e75b1687b", "visitation_dn"."distance_from_home" AS "m_1" FROM "gd"."visitation_dn" AS "visitation_dn" )

 

Carson

Userlevel 2

Hi Tibi and Carson,

the query looks weird, because it does not contain any aggregation.

It looks like you you LDM is not correct - no aggregation in this case means that one of used labels is primary key, so our analytics engine does not generate aggregation. Because we support only single column (attribute/label) primary keys now, I assume that “destination” is the primary key in this case. Is it really correct? I suppose that if you fix the LDM and appropriate aggregation query is generated by our engine, it will be running much faster and this issue won’t manifest.

However, generating more and more identical queries running too long is not a correct behavior. I have just created an internal Jira for this issue. Most likely we introduce a configuration for how long report queries can be running and we implement a mechanism for canceling queries running longer than this threshold.

Thank you, Jacek, I believe that solves it.

 

For our use case, our table isn’t designed to have any unique PK.

And so, would be preferable for us to remove this PK requirement from the GoodData LDMing.

For interim, we will add an auto-increment id column as workaround.

@Carson We would like to support multi column PK for center-of-star fact tables in the future. Dimensions (which are referenced from other datasets) would still require single column PK as that’s considered best practice for performance reasons.

Let me know whether this (multi column PK in fact tables) would better match your model.

Reply