# gooddata-platform
n
Hi team, when I loaded a CSV file in the CSV Downloader (LCM pipeline), I made a mistake with a column's type, and the GoodData system created columns whose types I cannot change with my current credentials. However, I was able to delete the `src_csv_...merge` table. Is it safe to delete the tables with the `src_`, `stg_`, and `ls_` prefixes, so that the GoodData system re-creates them with the correct types from the feed.txt file?
f
Hi Nam, a quick question for you: is the CSV Downloader brick automatically creating the feed.txt file based on what's in the CSVs, or did you create the file manually? Do you have access to the feed file? If so, a simpler solution would be to edit it, changing the Data Type of the wrong columns. I'd also recommend updating the version of the line representing each changed column, to keep track of the change. As explained in the Feed File article, when updating the feed.txt file it is very important not to change the `field` and `order` fields after the first run, to avoid data inconsistency issues.

With the updated feed file (and corresponding corrections to your CSVs, if necessary), the CSV Downloader should generate a new set of `src_` data, which will then be used by the other bricks in the Data Pipeline, fixing the error. If you just delete the tables without correcting the feed file, they will simply be re-created the same way the next time you run the brick, since the brick will still find the existing feed file at the specified path. Alternatively, you could remove the feed file and have the CSV Downloader recreate it from scratch (based on the corrected CSV files, of course). In that case you could delete the other files as well, but I don't think that would be necessary (although it shouldn't cause problems, as long as you run the pipeline from the beginning to regenerate everything).
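To make the edit concrete, here is a hypothetical before/after of a single feed line. It assumes the pipe-separated `file|version|field|type|order` layout described in the Feed File article; the file, field, and type names below are made up for illustration, and the `#` lines are annotations rather than feed syntax:

```
# before: column mistyped (hypothetical line)
csv_coaching_activity_fact|1|created_at|boolean|8
# after: type corrected and version bumped; file, field, and order stay untouched
csv_coaching_activity_fact|2|created_at|timestamp|8
```

The `timestamp` keyword is a placeholder; use whichever data type name the Feed File article specifies for your column.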
n
Thanks for your explanation, @Francisco Antunes. I have access to the feed.txt file and changed the type without deleting the `src_` table. From your explanation, it's safe to delete the `src_`, `stg_`, and `ls_` tables/views, right?
When I try to drop a table, it throws this error:

```
[Vertica][VJDBC]Detail: Cannot drop Table src_csv_coaching_activity_fact_merge because other objects depend on it
```

I don't know how to find the objects that depend on this table. Can you please help?
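For finding the dependents, a catalog query along these lines should work in Vertica. This is a sketch, not tested against this project: it assumes the blocking objects are views tracked in `v_catalog.view_tables`; other kinds of dependencies would need other catalog tables.

```sql
-- List views that reference the table the DROP complained about.
SELECT table_schema, table_name
FROM v_catalog.view_tables
WHERE reference_table_name = 'src_csv_coaching_activity_fact_merge';
```

Vertica's `DROP TABLE ... CASCADE` would also remove the dependents along with the table, but that is destructive, and as the next reply notes, deleting these objects manually is not recommended anyway.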
f
Hi Nam, I discussed it internally, and in this case I'd recommend not deleting any files manually. Instead, fix the feed.txt file and then run the entire Data Pipeline from the beginning. I would recommend performing a Full Load, to ensure a full batch of corrected data. The Data Pipeline bricks will then generate new `src`, `stg`, and `ls` tables for you with the updated data types, ultimately propagating the changes to the workspace's Data Model.
n
Actually, I already deleted these tables, except the `src_csv_..merge` table, and the GD system re-created the remaining tables with the same data types; they cannot be replaced with the correct type. I've reverted the feed.txt file so that the ADS Integrator can load data. If the ADS Integrator can load data from the CSVs into the GD system, I think I should create new columns with the correct data type.
```
Executable=main.rb Error during evaluation of Ruby in /mnt/execution/lib/gdc_etl_ads_integrator/composite/tree/task.rb at line 90: (SqlError) Java::JavaSql::SQLException: [Vertica][VJDBC](2035) ERROR: COPY: Input record 1 has been rejected (Invalid value in column 8 for type Boolean - '2022-06-03 14:10:33' len 19)
```

The system throws this error and I don't know how to fix it. Could you please take a look?
FYI, I've edited the CSV file and hope it works. The GD system always seems to retry loading from the point where it failed 😂
It doesn't load the new CSV files from the latest manifest file.
f
Hi Nam, have you run the CSV Downloader again with the new feed.txt file? The ADS Integrator works from the output of the CSV Downloader, which lives in the `src` tables. If you haven't deleted those or re-run the Downloader, changing the feed file will indeed cause the Integrator to fail with the error you posted: it's expecting a data type that isn't reflected in the files. If it continues to fail, I would recommend deleting all the files created by the CSV Downloader (batches, metadata, cache) and then processing everything once more. Again, make sure to fix your feed file and source CSVs before running it, so the ETL processes the updated data.
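As a minimal sketch of why the COPY step rejects the records (illustrative only; the table and column names are made up, not the brick's actual DDL):

```sql
-- Stand-in for the stale src table: the incoming data is a timestamp,
-- but the old feed file declared the column BOOLEAN.
CREATE TABLE demo_src (id INT, created_at BOOLEAN);

-- A row such as "1,2022-06-03 14:10:33" is then rejected with the same
-- "Invalid value in column ... for type Boolean" error, because Vertica
-- cannot coerce a timestamp string into a boolean.
COPY demo_src FROM LOCAL '/tmp/data.csv' DELIMITER ',';
```

Once the feed file and the `src` tables agree on the column type, the COPY goes through.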
n
Yes, I've re-run everything, and from my observation the GD system reloads data from the point where it failed.
So I need to correct the old CSV files in the S3 location the GD system loads from.
It passed one CSV file, so I'm correcting the remaining files and hope it works.
f
OK! That sounds good. Let us know if you run into any trouble and we’ll be happy to help! 🙂
n
Thanks, I'll update you ASAP. Thanks for your help!