# gooddata-platform
n
Hi team, when I loaded a CSV file in the CSV Downloader (LCM pipeline), I made a mistake with a column's type, and the GoodData system created columns whose types I cannot change with my current credentials. However, I was able to delete the `src_csv_...merge` table. Is it safe to delete the tables with the `src_`, `stg_`, and `ls_` prefixes, so that the GoodData system re-creates them with the correct types from the feed.txt file?
f
Hi Nam, a quick question for you: is the CSV Downloader brick automatically creating the feed.txt file based on what's in the CSVs, or did you create the file manually? Do you have access to the feed file? If so, a simpler solution would be to edit it, changing the Data Type of the wrong columns. I'd also recommend updating the version of the line representing each changed column, to keep track of the change. As explained in the Feed File article, when updating the feed.txt file it is very important not to change the `field` and `order` fields after the first run, to avoid data inconsistency issues.

With the updated feed file (and corresponding corrections to your CSVs, if necessary), the CSV Downloader should generate a new set of `src_` data, which will then be used by the other bricks in the Data Pipeline, fixing the error. If you just delete the tables without correcting the feed file, they will simply be re-created the same way the next time you run the brick, since the brick will still find the existing feed file at the specified path. Alternatively, you could remove the feed file and have the CSV Downloader recreate it from scratch (based on the corrected CSV files, of course). In that case you could delete the other files as well, but I don't think that would be necessary (although it shouldn't cause problems, as long as you run the pipeline from the beginning to regenerate everything).
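To make the edit concrete, here is a hypothetical before/after of a single feed line. It assumes the pipe-separated `file|version|field|type|order` layout described in the Feed File article; the file, field, and type names below are made up for illustration, and the `#` lines are annotations rather than feed syntax:

```
# before: column mistyped (hypothetical line)
csv_coaching_activity_fact|1|created_at|boolean|8
# after: type corrected and version bumped; file, field, and order stay untouched
csv_coaching_activity_fact|2|created_at|timestamp|8
```

The `timestamp` keyword is a placeholder; use whichever data type name the Feed File article specifies for your column.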
n
Thanks for your explanation, @Francisco Antunes. I have access to the feed.txt file and changed the type without deleting the `src_` table. From your explanation, it's safe to delete the `src_`, `stg_`, and `ls_` tables/views, right?
When I try to drop a table, it throws this error:

```
[Vertica][VJDBC]Detail: Cannot drop Table src_csv_coaching_activity_fact_merge because other objects depend on it
```

I don't know how to find the objects that depend on this table. Can you please help?
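For finding the dependents, a catalog query along these lines should work in Vertica. This is a sketch, not tested against this project: it assumes the blocking objects are views tracked in `v_catalog.view_tables`; other kinds of dependencies would need other catalog tables.

```sql
-- List views that reference the table the DROP complained about.
SELECT table_schema, table_name
FROM v_catalog.view_tables
WHERE reference_table_name = 'src_csv_coaching_activity_fact_merge';
```

Vertica's `DROP TABLE ... CASCADE` would also remove the dependents along with the table, but that is destructive, and as the next reply notes, deleting these objects manually is not recommended anyway.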
f
Hi Nam, I discussed it internally, and in this case I'd recommend not deleting any files manually. Instead, fix the feed.txt file and then run the entire Data Pipeline from the beginning. I would recommend performing a Full Load, to ensure a full batch of corrected data. The Data Pipeline bricks will then generate new `src`, `stg`, and `ls` tables for you with the updated data types, ultimately propagating the changes to the workspace's Data Model.
n
Actually, I already deleted these tables, except the `src_csv_..merge` table, and the GD system re-created the remaining tables with the same data types; they cannot be replaced with the correct type. I've reverted the feed.txt file so that the ADS Integrator can load data. If the ADS Integrator can load data from the CSVs into the GD system, I think I should create new columns with the correct data type.
```
Executable=main.rb Error during evaluation of Ruby in /mnt/execution/lib/gdc_etl_ads_integrator/composite/tree/task.rb at line 90: (SqlError) Java::JavaSql::SQLException: [Vertica][VJDBC](2035) ERROR: COPY: Input record 1 has been rejected (Invalid value in column 8 for type Boolean - '2022-06-03 14:10:33' len 19)
```

The system throws this error and I don't know how to fix it. Could you please take a look?
FYI, I've edited the CSV file and hope it works. The GD system always seems to retry loading from the point where it failed 😂
It doesn't load the new CSV files from the latest manifest file.
f
Hi Nam, have you run the CSV Downloader again with the new feed.txt file? The ADS Integrator works from the output of the CSV Downloader, which lives in the `src` tables. If you haven't deleted those or re-run the Downloader, changing the feed file will indeed cause the Integrator to fail with the error you posted: it's expecting a data type that isn't reflected in the files. If it continues to fail, I would recommend deleting all the files created by the CSV Downloader (batches, metadata, cache) and then processing everything once more. Again, make sure to fix your feed file and source CSVs before running it, so the ETL processes the updated data.
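As a minimal sketch of why the COPY step rejects the records (illustrative only; the table and column names are made up, not the brick's actual DDL):

```sql
-- Stand-in for the stale src table: the incoming data is a timestamp,
-- but the old feed file declared the column BOOLEAN.
CREATE TABLE demo_src (id INT, created_at BOOLEAN);

-- A row such as "1,2022-06-03 14:10:33" is then rejected with the same
-- "Invalid value in column ... for type Boolean" error, because Vertica
-- cannot coerce a timestamp string into a boolean.
COPY demo_src FROM LOCAL '/tmp/data.csv' DELIMITER ',';
```

Once the feed file and the `src` tables agree on the column type, the COPY goes through.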
n
Yes, I've re-run everything, and from my observation the GD system reloads data from the point where it failed.
So I need to correct the old CSV files in the S3 location the GD system loads from.
It passed one CSV file, so I'm correcting the remaining files and hope it works.
f
OK! That sounds good. Let us know if you run into any trouble and we’ll be happy to help! 🙂
n
Thanks, I'll update you ASAP. Thanks for your help!