Mapping Validation Failed column "cp__id" doesn't exist when it does actually exist.

  • 2 April 2021
  • 3 replies

The CSV in question, after downloading it from S3:

The logs: 

2021-04-02T22:16:13.640+0200 [INFO]: Data distribution worker started
2021-04-02T22:16:13.640+0200 [INFO]: Request id: "data_load_uVsUQjWYuv_YTqtNkRU4v:DR7f6UuIF6vgQhKZ:otffmYmekgTqXoYn"
2021-04-02T22:16:13.640+0200 [INFO]: Data source: "604fed9d72e8480930e9fe25"
2021-04-02T22:16:13.640+0200 [INFO]: Additional parameters: {GDC_DATALOAD_DATASETS=[{"dataset":"dataset.testdata","uploadMode":"FULL"}], PROCESS_ID=1a159b0a-9204-42e1-bb2f-abb07aed7b23}
2021-04-02T22:16:13.640+0200 [INFO]: Synchronization mode: selected datasets (default)
2021-04-02T22:16:13.943+0200 [INFO]: Synchronized datasets: [dataset.testdata]
2021-04-02T22:16:14.057+0200 [INFO]: ====================== Data distribution scope ======================
2021-04-02T22:16:14.057+0200 [INFO]: Project="i9kr8f8fd7802orlklcvvhlsr74l057i"; datasets=[{dataset.testdata, loadDataFrom=2021-02-10T02:50:27}]
2021-04-02T22:16:14.057+0200 [INFO]: ====================== End of Data distribution scope ======================
2021-04-02T22:16:14.057+0200 [INFO]: ====================== Downloading and integrating data ======================
2021-04-02T22:16:19.088+0200 [INFO]: ====================== Scanning data files in tfappsheetsbucket/sheets/======================
2021-04-02T22:16:19.105+0200 [INFO]: dataset: dataset.testdata; latest Last Load Timestamp:[2021-02-10T02:50:27]; new files: testdata_20210315163115_full.csv .
2021-04-02T22:16:19.105+0200 [INFO]: ====================== End of scanning data files ======================
2021-04-02T22:16:19.109+0200 [INFO]: ====================== Mapping validation ======================
2021-04-02T22:16:19.109+0200 [INFO]: dataset: dataset.testdata, Messages:["The CSV file is missing column(s): cp__id."]
2021-04-02T22:16:19.109+0200 [INFO]: ====================== End of Mapping validation ======================
2021-04-02T22:16:19.110+0200 [ERROR]: Fail to load projects "[i9kr8f8fd7802orlklcvvhlsr74l057i]". Reason: Failed mapping validation
2021-04-02T22:16:19.111+0200 [INFO]: ====================== End of downloading and integrating data ======================
2021-04-02T22:16:19.114+0200 [ERROR]: Data distribution worker failed. Reason: All projects failed to load.




So basically, I have a CSV in S3 with just two columns, but I’m getting this error. It doesn’t make sense to me, so I figured I’d ask here. I’m just trying to understand how the columns have to be arranged so the file loads properly.





Best answer by Daniela 2 April 2021, 23:20


3 replies

Hi Brian,

For S3, it is important to make sure that the CSV imported and used to create the LDM matches the CSVs of the new loads. I find this tutorial very helpful for the steps to follow for S3:

Can you double-check the setup of the LDM and confirm that the datasets have the same columns? Can you send a screenshot of the LDM, especially for dataset.testdata?

Thanks for the quick response


That very well could be the problem. I was still figuring out how the CSVs should look, and I certainly got it wrong the first time. I haven’t changed this data model, but I have changed the CSVs. I created the data model manually; I didn’t drag and drop a CSV.


Here is a screenshot of testdata:


I tried the same settings:


And I couldn’t replicate the error:

2021-04-22T17:02:49.698+0200 [INFO]: Dataset id: "dataset.testdata"
2021-04-22T17:02:49.698+0200 [INFO]:  "cp__id" -> "" {"LABEL"}["VARCHAR(128)"]
2021-04-22T17:02:49.698+0200 [INFO]:  "a__name" -> "" {"LABEL"}["VARCHAR(128)"]
2021-04-22T17:02:49.698+0200 [INFO]:


Can you double-check that the CSV file testdata_20210315163115_full.csv has a cp__id column, or a valid ID?
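
If it helps, here is a minimal Python sketch for checking the header of a downloaded CSV against the column names the mapping expects. The expected names (cp__id, a__name) are taken from the mapping log above. The helper names read_header and missing_columns are made up for illustration, and the exact-match comparison is an assumption: the loader's real matching rules may differ. The second sample shows one possible way a header can look right on screen and still not match, a byte order mark in front of the first column name. That is only an example, not a diagnosis of this particular file.

```python
import csv
import io

# Column names copied from the mapping log in this thread.
EXPECTED_COLUMNS = ["cp__id", "a__name"]


def read_header(csv_text, delimiter=","):
    """Return the first row of the CSV text exactly as written (no cleanup)."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    return next(reader, [])


def missing_columns(header, expected=EXPECTED_COLUMNS):
    """Return the expected names that do not appear verbatim in the header.

    Assumption: names must match exactly, including case and any
    invisible characters. The real loader may be more or less strict.
    """
    present = set(header)
    return [name for name in expected if name not in present]


# Made-up sample data, not the contents of the real file.
clean = "cp__id,a__name\n1,Alice\n"
with_bom = "\ufeffcp__id,a__name\n1,Alice\n"  # BOM glued to the first name

for label, text in [("clean", clean), ("with_bom", with_bom)]:
    header = read_header(text)
    # repr() makes invisible characters such as '\ufeff' visible.
    print(label, [repr(h) for h in header], "missing:", missing_columns(header))
```

To run it on the real file, read the local copy of testdata_20210315163115_full.csv as text and pass its contents to read_header. Printing the header with repr() is the useful part, since it shows stray whitespace or hidden characters that a spreadsheet view would hide.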