I’m just getting started with GoodData and have successfully loaded CSV files from an S3 bucket with ADD, after a bit of fiddling to get my filenames matching my model names, etc. (This article was very helpful, thanks!) I can see the data populating my model and can view it in a dashboard - so far, so good...
However, the data I really want to load is being written by Apache Spark (on AWS EMR) as a CSV file in “append” mode, so the structure of the file in S3 looks like this:
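(Illustrative listing only - the part-file names below are hypothetical, but this is the standard layout Spark produces when writing a CSV dataset to a directory:)

```
hourly-data-dump.csv/
├── _SUCCESS
├── part-00000-4c7d9a1e-....csv
├── part-00001-4c7d9a1e-....csv
└── part-00002-4c7d9a1e-....csv
```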
There are potentially hundreds of the part-00000-xxx files written by Spark. I control the name of the “folder” hourly-data-dump.csv, but not the files within it.
This must be a pretty common use case - any tips on how to get this data to load into GoodData using ADD as a regular update?