# gooddata-platform
a
Hi GD support team, hope everyone is doing well 😄 I'm currently trying to use FlexConnect as a data source. However, I noticed that if it returns ~200k rows (with, say, about 10-15 columns), it gives this error
Copy code
Reached limit of max size of data returned from data source. The limit is 20971520 bytes
This can occur even if I select just 1 column to be placed under Rows on GD (unless I should change my code to return only that column). Then again, even if I did, this can still arise if the number of rows is large. Is there any way to resolve this? I didn't run into this issue when I was using CSV files as the data source. Does GD perhaps support pagination or returning data in batches? (I've actually tried the latter, by converting the PyArrow table into batches before returning, but I still run into this issue.) Any assistance will be much appreciated! 🙏
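For reference, the batching attempt was roughly like the sketch below (simplified, with illustrative names; splitting into batches doesn't reduce the total number of bytes returned, so the limit is still reached):
Copy code
import pyarrow as pa

def build_result_table() -> pa.Table:
    # Placeholder for the real query against the backing store (e.g. MongoDB).
    return pa.table({"id": list(range(200_000)), "value": [1.0] * 200_000})

def execute():
    table = build_result_table()
    # Splitting the table into record batches changes how the data is streamed,
    # but the total payload size stays the same, so the 20MB limit is still hit.
    batches = table.to_batches(max_chunksize=10_000)
    return pa.RecordBatchReader.from_batches(table.schema, batches)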
👀 1
j
Hi, the limit on the size of a dataset returned by FlexConnect in GD Cloud is indeed 20MB, while for CSV datasets the current limit is 200MB (1TB for all files in total). We may consider increasing the limit to 200MB, as for CSV files. Would that be enough for your use case? One way to mitigate the problem is to analyze the columns in the execution context of the FlexConnect table function and return only the columns needed by the SQL query executed on top of the returned data. You can return only a single column even if the table function is declared with 100 columns but the visualization displays just one. This can help reduce the size in the case of a wide dataset with many columns.
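As a rough sketch of that idea (the exact shape of the execution context may differ, so treat the names below as illustrative rather than the exact FlexConnect API):
Copy code
import pyarrow as pa

def full_table() -> pa.Table:
    # Placeholder for the full, wide result coming from the backing store.
    return pa.table({
        "region": ["a1", "a2", "a1", "a2"],
        "segment": ["b1", "b1", "b2", "b2"],
        "amount": [10, 20, 30, 40],
    })

def call(execution_context: dict) -> pa.Table:
    table = full_table()
    # Illustrative: read the column names requested by the visualization from
    # the execution context and return only those instead of the whole table.
    requested = [c for c in execution_context.get("columns", []) if c in table.column_names]
    return table.select(requested) if requested else table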
a
Hello Jakub, thank you for your reply. Yes, is it possible to also increase the limit on the returned payload? I see, so there's an additional SQL layer applied to the returned data. I suspected that might be the case, because I noticed that I was only requesting 1 column via GD, yet I could return the entire table and it still worked. OK, I will also add this optimization piece.
j
The limit can currently be increased by a configuration change only in GoodData CN, not on the shared clusters of GD Cloud. Would 200MB satisfy your needs?
Btw, in some cases you may also consider pre-aggregating the data if some columns are not returned by the query. E.g. if column X contains additive quantitative information and the data looks like:
Copy code
A    B    X
a1   b1   10
a2   b1   20
a1   b2   30
a2   b2   40
and the visualization requests only columns A and X, you may return
Copy code
A    X
a1   10
a2   20
a1   30
a2   40
and the SQL query on top will aggregate from the detail data, but you can also return
Copy code
A    X
a1   40
a2   60
the SQL query on top will still perform the aggregation, but the amount of data transferred for the aggregation will be lower
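A rough sketch of that pre-aggregation with PyArrow (illustrative only; this is safe only for additive measures like X, and the column names are just examples):
Copy code
import pyarrow as pa

detail = pa.table({
    "A": ["a1", "a2", "a1", "a2"],
    "B": ["b1", "b1", "b2", "b2"],
    "X": [10, 20, 30, 40],
})

# If the visualization requests only A and X, sum X per A before returning,
# so less data has to be transferred; the SQL layer on top aggregates anyway.
pre_aggregated = detail.group_by("A").aggregate([("X", "sum")])
# Resulting columns are A and X_sum: a1 -> 40, a2 -> 60, matching the example above.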
a
Yes, I think so, let's try with 200 MB for now. Understood, and thanks for the example above; I will try to add these optimizations in the code.
Also, on a second note, is it possible to increase the timeout duration? I've noticed that before my MongoDB query finishes and returns the results, the GD platform shows an error, but if given more time we would be able to return the actual data.
j
What is the use case? We have a 180-second timeout, which should be enough for interactive applications. The limit is there to prevent overloading the underlying databases with bad queries that do not provide data in time.
a
Hmm, it's at 180s? I'm currently seeing the error show up within 25-30s. When I query a smaller table, the error doesn't appear. But yes, I do understand the intention. I'll investigate further and provide evidence on this.
Btw Jakub, will the 20MB be lifted to 200MB? If so, roughly when will this change be made?
m
Hi Alson, at this time we have no ETA on when this change will be made. However, I will update our internal ticket with our Engineers now to see if I can get more details. Thanks for bearing with us in the meantime.
a
Thanks Michael, I hope this change wouldn't be too big and wouldn't trouble you guys too much. But I think it's kind of important: I have some data that I'm unable to show right now, even though I've just applied the optimizations suggested by Jakub. Is there a way for me to keep track of this ticket or change?
m
Hi Alson, I hope it's OK for me to step in here. I am afraid that the ticket is internal and cannot be viewed on your side. However, as my colleague said, we will keep you posted. Thank you for your understanding.
a
Noted Moises, thanks for the assistance, looking forward to further updates 😄
Hi @Moises Morales do you happen to know if there's any update on this ticket pls?
m
Hi Alson, thank you for checking in on this feature. I checked our internal ticket, and while it's currently on our radar, it hasn't been prioritised yet as our team is focused on other high-impact initiatives at the moment. If this functionality is important to you, we'd recommend reaching out to your Account Owner, who can help push this update for you.
a
Thanks for your reply and the details, Michael. Sure, understood, I will reach out to her; this is quite important for us, as we may have columns with high cardinality, resulting in data larger than 20MB.
👍 1