Solved

CSV data upload and big datasets

  • 4 April 2023

Hi,

My workspaces are populated by ADD bricks that load CSV files stored on S3. There is a 250-column limit on CSV headers, and that count covers not only the dataset’s fact fields but also relations to date and other datasets, as well as additional labels. How can I handle a situation where I need more fields? For example, if a dataset “projects” already has 200 fact fields and 40 relations to custom date dimensions, and I need to add another 20 custom fact fields, is there some way to either upload data for this dataset from more than one CSV file, or split it into two fact datasets but use them as if they were one in reports?

 

Best,

Hanna


Best answer by Francisco Antunes 5 April 2023, 14:36


2 replies

Hi Hanna,

Thank you for reaching out with this! Let’s talk about the CSV header limit.

Although GoodData’s Platform Limits do specify a 250-column limit for CSVs, this is not a hard limit. It is the officially supported maximum, but the platform is able to load files with more columns. Nevertheless, we do not recommend it: we cannot guarantee that such loads will work consistently, and we will not be able to provide support if they fail (other than recommending that you follow the official limit). In other words, you may exceed the 250-header limit, but doing so is unsupported.

Please note that increasing the width of a loaded dataset also increases the loaded data volume and processing times. This may breach some of the other platform limits described in the article linked above (some of those are indeed hard limits), as well as negatively impact performance. The best practice is to optimize how your data is structured and loaded so as to avoid such issues. I would recommend taking a look at our GoodData University course on Understanding the Logical Data Model for more information on the matter.

For example, it might be worth distributing the data from this dataset into two or more datasets, configuring the LDM so that they still behave as one for reporting, thus avoiding such wide CSV loads; see the sketch below. Note that the reverse is not possible: a single dataset cannot be loaded from two or more CSVs that each contain a subset of its columns, as a way to circumvent the limit.
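To illustrate the splitting approach, here is a minimal sketch in Python of how a wide extract could be divided into two files before the ADD brick picks them up. The file names, the projects__id key column, and the split point are all hypothetical; the point is that both outputs keep the shared identifier, so the two resulting datasets can be connected in the LDM (for example through a common connection point) and filtered together.

    import csv

    SOURCE = "projects_wide.csv"   # hypothetical wide extract
    KEY = "projects__id"           # shared identifier kept in both outputs
    CORE_OUT = "projects_core.csv"
    EXTRA_OUT = "projects_extra.csv"
    MAX_COLUMNS = 250              # documented platform limit for CSV columns

    with open(SOURCE, newline="") as src:
        reader = csv.DictReader(src)
        header = reader.fieldnames

        # Keep the first chunk of columns in the core file and move the
        # rest to a second file; both retain the KEY column.
        core_cols = header[: MAX_COLUMNS - 1]
        extra_cols = [c for c in header if c not in core_cols]
        if KEY not in core_cols:
            core_cols = [KEY] + core_cols
        if KEY not in extra_cols:
            extra_cols = [KEY] + extra_cols

        with open(CORE_OUT, "w", newline="") as core, \
             open(EXTRA_OUT, "w", newline="") as extra:
            core_writer = csv.DictWriter(core, fieldnames=core_cols)
            extra_writer = csv.DictWriter(extra, fieldnames=extra_cols)
            core_writer.writeheader()
            extra_writer.writeheader()
            for row in reader:
                core_writer.writerow({c: row[c] for c in core_cols})
                extra_writer.writerow({c: row[c] for c in extra_cols})

Each output file then loads into its own dataset, and the shared key lets the model treat them as one for reporting purposes.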

I hope this is helpful! Let us know if you need anything else.

 

Best regards,

Francisco

Hi Francisco,

Thank you for your reply.

The 250-header limit is hard enough to make the ADD brick fail with the following message: “Error parsing row=1. The number of columns is too high. Maximum values are '250' for the column count.” :)
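For anyone who runs into the same error, a quick pre-check of the header width before triggering the ADD brick can catch it early. A minimal sketch, assuming a local copy of the file (the file name is hypothetical):

    import csv

    MAX_COLUMNS = 250  # the limit the ADD brick error message refers to

    def column_count(path: str) -> int:
        """Return the number of columns in the CSV header row."""
        with open(path, newline="") as f:
            return len(next(csv.reader(f)))

    # Hypothetical file name; replace with the actual extract.
    n = column_count("projects_wide.csv")
    if n > MAX_COLUMNS:
        raise ValueError(f"{n} columns exceeds the {MAX_COLUMNS}-column limit")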

So I guess that if I wanted to extract the Project custom fact fields to a separate dataset, in a way that lets them be filtered by all the attributes that also work for the facts from the original Project dataset, a model like the one below would work.

In case the other direction was needed, I see that explicit lifting is possible: https://help.gooddata.com/classic/en/dashboards-and-insights/maql-analytical-query-language/maql-use-cases-and-tutorials/explicit-lifting/. However, it’s documented under Classic. Will it work the same for the GoodData Platform?

Best,

Hanna
