Hi, We are thinking on having some User Data Filte...
# gooddata-cloud
j
Hi, We are thinking on having some User Data Filters for our managers, but started to consider performance impact when using filter expression
IN
with a huge collection of User IDs, the performance would usually degrade in this scenario but we want to better understand it, in case there are any internal mechanism to improve this already in place, as the data filter only allows a MAQL string. Also, can we consider that this IN clause is directly passed to the SQL statement?
j
Hi Jhonatan, I can definitely confirm that the UDF condition is already involved in the SQL querying your data source. You can easily verify that by navigating to “Analyze” tab, create some report and in the URL, change “edit” for “debug”. It will download a file containing the SQL as well and you can compare non-UDF vs UDF report.
Regarding the performance, there is no easy answer, it basically depends. UDFs are also distributed throughout the LDM connections so in some complex report/LDM scenario the performance demand might be really higher. But it depends on LDM, type of report, data volume, database indexing, projections etc. I’m not the export in this field but if you have some specific questions regarding this, let me know and I can consult with our engineering.
j
Hi Julius, do you know if a MAQL can directly access GD resources like user_group I need to do something like {label/user_group.id} IN ("group1", "group2") so that one user would only be able to see data for all the users in a certain user_group
j
Not in the MAQL. UDFs doesn’t work like that. The user or user_group is specified when you are setting the UDF in the workspace. Example payload:
Copy code
{
  "data": {
    "type": "userDataFilter",
    "id": "<data-filter-id>",
    "attributes": {
      "maql": "<maql-expression>",
      "title": "<data-filter-name>"
    },
    "relationships": {
      "user": {
        "data": {
          "id": "<user-id>",
          "type": "user"
        }
      }
    }
  }
}
You can either use “type”: “user” or “type”: “userGroup”
So what you could for example do here - you could make sure that the user is part of both user groups and assign specific UDFs to these user groups.
j
problem is even if I assign an UDF to a user_group, I need to filter the data by only the users in that group manager is in GROUP 1, he should only see the data for users in GROUP 1, so somehow I need pass the user_ids to the UDF
what I am thinking at the moment is having a UserManager table, then the filter would look at that table to determine if a manager manages a user sort of like: {label/user.id} IN ({label/user_manager.user_id} WHERE {label/user_manager.manager_id} = CURRENT_MANAGER_ID)
let me know if you spot anything that could possibly not work here
j
maybe I misunderstood, so users are actual columns in your data?
j
We have a User dataset/table with all Users
my point was, even applying a filter to a user_group, the filter I need is exactly that one manager should only see data for users in that group
j
Apologies, I obviously didn’t exactly understand your question before. So to clarify further —
WHERE
clause isn’t supported in UDF MAQL in this context because UDFs only allow flat, row-level filters. You can use
AND
,
OR
, or
IN (...)
with static values like:
Copy code
"maql": "{label/user.id} IN (\"user1\", \"user2\", \"user3\")"
But expressions like:
Copy code
{label/user.id} IN ({label/user_manager.user_id} WHERE {label/user_manager.manager_id} = "mgr123")
won’t work, since UDFs can’t evaluate filtered sets or subqueries. To achieve that logic, you’d need to precompute the list of users for each manager and apply it statically in the UDF.
j
yeah we are aware of the first approach, the main concern is performance, have you guys not had any customer complaining about performance when having like 1k users in a filter like that?
@Julius Kos on the first approach when sending the request with 5k user ids the request fails, are you aware of a hard limit in this endpoint, also it seems it more then payload body size rather than an SQL issue it looks like the maximum is 1.2k user ids, that's the max I could send in the request
j
The limit indeed looks to be 1,2k it doesn’t allow me to create more. I am not aware of any customer who would reach out with this problem and I’m afraid that UDFs were to designed for similar purpose (of such high attribute element volumes).
I believe this should be rather handled via adjusting the LDM. I’m not aware of your data structure but I can rather imagine some scenario where you would show each manager’s users data by adjusting the LDM with adding a
UserManager
dataset with
user_id
and
manager_id
. Connect
user_id
to your User dataset, and
manager_id
to the GoodData user (or a Manager dataset). Then, assign a single UDF to each manager:
{label/user.id} IN ({label/user_manager.user_id})
Alternatively, you could also use workspace data filters in similar scenario but note that these needs to be applied on every dataset which needs to be filtered: https://www.gooddata.com/docs/cloud/workspaces/workspace-data-filters/
j
I understand what you mean, but it's not that we want to see the manager user data, but we want the manager to only see data from all users they manage which is why it would have to be something like: {label/user.id} IN ({label/user_manager.user_id} WHERE {label/user_manager.manager_id} = "mgr123") as
label/user_manager.user_id
would contain all the users associated with their managers
although it's not currently supported
what are the chances of Good Data increasing the maximum supported length of the UDF MAQL? or even chances of supporting SELECT/WHERE in the UDF MAQL?
@Julius Kos I have one idea that could work but it depends on how multiple filters work, let's say I have two UDF for the same users, each with similar to
userid IN (user_ids collection)
, do they work as an OR or AND? If they work as an OR when having multiple filters I could split the user IDs between many filters for the same manager.
I have checked this now and it's not an OR, so ruling out this last solution
j
Hey Jhonatan, sry I was not online yet. Yes indeed, combining multiple UDFs or even UDFs and WDFs is always an intersection so the condition is actually AND. This is nicely explained in our documentation:
Regarding increasing the limit or changing the UDF functionality itself - if you wish, I can only submit a product feedback here on your behalf so our product team can consider the changes in future developements
There might be some other way how to go about this, maybe adjusting the LDM but I’m not an export on implementations. For similar purposes we offer an assistance of our Services Team who are dedicated experts in this field. However, that is a paid service. In case you would like to go that way, you can contact your account owner Greta Jakubovskaja. You can also contact her in case you would like to escalate or ask about status of some specific product feedback.
👀 1
r
Hi @Julius Kos thank you - please do submit a product feedback on our behalf 🙏
✔️ 1
p
🎉 New note created.