Mapping Validation Failed column "cp__id" doesn't exist when it does actually exist.

Question

The CSV in question, after downloading it from S3:

The logs:

2021-04-02T22:16:13.640+0200 [INFO]: Data distribution worker started
2021-04-02T22:16:13.640+0200 [INFO]: Request id: "data_load_uVsUQjWYuv_YTqtNkRU4v:DR7f6UuIF6vgQhKZ:otffmYmekgTqXoYn"
2021-04-02T22:16:13.640+0200 [INFO]: Data source: "604fed9d72e8480930e9fe25"
2021-04-02T22:16:13.640+0200 [INFO]: Additional parameters: {GDC_DATALOAD_DATASETS=[{"dataset":"dataset.testdata","uploadMode":"FULL"}], PROCESS_ID=1a159b0a-9204-42e1-bb2f-abb07aed7b23}
2021-04-02T22:16:13.640+0200 [INFO]: Synchronization mode: selected datasets (default)
2021-04-02T22:16:13.943+0200 [INFO]: Synchronized datasets: [dataset.testdata]
2021-04-02T22:16:14.057+0200 [INFO]: ====================== Data distribution scope ======================
2021-04-02T22:16:14.057+0200 [INFO]: Project="i9kr8f8fd7802orlklcvvhlsr74l057i"; datasets=[{dataset.testdata, loadDataFrom=2021-02-10T02:50:27}]
2021-04-02T22:16:14.057+0200 [INFO]: ====================== End of Data distribution scope ======================
2021-04-02T22:16:14.057+0200 [INFO]: ====================== Downloading and integrating data ======================
2021-04-02T22:16:19.088+0200 [INFO]: ====================== Scanning data files in tfappsheetsbucket/sheets/ ======================
2021-04-02T22:16:19.105+0200 [INFO]: dataset: dataset.testdata; latest Last Load Timestamp:[2021-02-10T02:50:27]; new files: testdata_20210315163115_full.csv .
2021-04-02T22:16:19.105+0200 [INFO]: ====================== End of scanning data files ======================
2021-04-02T22:16:19.109+0200 [INFO]: ====================== Mapping validation ======================
2021-04-02T22:16:19.109+0200 [INFO]: dataset: dataset.testdata, Messages:["The CSV file is missing column(s): cp__id."]
2021-04-02T22:16:19.109+0200 [INFO]: ====================== End of Mapping validation ======================
2021-04-02T22:16:19.110+0200 [ERROR]: Fail to load projects "[i9kr8f8fd7802orlklcvvhlsr74l057i]". Reason: Failed mapping validation
2021-04-02T22:16:19.111+0200 [INFO]: ====================== End of downloading and integrating data ======================
2021-04-02T22:16:19.114+0200 [ERROR]: Data distribution worker failed. Reason: All projects failed to load.

So basically, I have a CSV in S3 with just two columns, but I’m getting this error, and it doesn’t make sense to me. I’m trying to understand how the columns have to be arranged so the file loads properly, and this seemed weird enough that I figured I’d ask here.


Best answer by Daniela 2 April 2021, 23:20


Daniela · Accepted Answer

Hi Brian,

For S3, it is important to make sure that the CSV imported and used in the creation of the LDM matches the CSVs of the new loads. I find this tutorial very helpful for the steps to follow for S3:

https://www.gooddata.com/blog/easiest-way-create-date-insights-multiple-csv-files-gooddata-and-its-modeler-s3-distribution

Can you double-check the setup of the LDM and that the datasets have the same columns? Can you send a screenshot of the LDM, especially for dataset.testdata?
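One way to double-check the columns locally is to compare the header row of the downloaded file against the names the LDM expects. The sketch below is only an illustration: the helper names `read_header` and `missing_columns` are hypothetical, the column names `cp__id` and `a__name` are the ones that appear in this thread, and the sample with a leading byte-order mark is an assumed example of how a column can look present in an editor yet fail an exact-name match. It is not a statement about what caused the error here.

```python
import csv
import io

# Column names taken from this thread; adjust to match your own LDM.
EXPECTED_COLUMNS = ["cp__id", "a__name"]


def read_header(csv_text, delimiter=","):
    """Return the first row of the CSV exactly as the parser sees it."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    return next(reader, [])


def missing_columns(header, expected=EXPECTED_COLUMNS):
    """Return the expected names that have no exact match in the header."""
    present = set(header)
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Assumed sample: the file starts with a byte-order mark (U+FEFF),
    # so the first column name is not literally "cp__id".
    sample = "\ufeffcp__id,a__name\n1,Alice\n"
    header = read_header(sample)
    # Printing the list shows each name's repr, which exposes hidden characters.
    print(header)
    print(missing_columns(header))
```

For a real file, reading it with `open(path, newline="", encoding="utf-8")` and passing the text to `read_header` keeps hidden characters visible, while `encoding="utf-8-sig"` would strip a leading byte-order mark if one turns out to be present.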

Daniela · Answer

I tried the same settings, and I couldn’t replicate the error:

2021-04-22T17:02:49.698+0200 [INFO]: Dataset id: "dataset.testdata"
2021-04-22T17:02:49.698+0200 [INFO]:  "cp__id" -> "label.testdata.id" {"LABEL"}["VARCHAR(128)"]
2021-04-22T17:02:49.698+0200 [INFO]:  "a__name" -> "label.testdata.name" {"LABEL"}["VARCHAR(128)"]
2021-04-22T17:02:49.698+0200 [INFO]:

Can you double-check that the file testdata_20210315163115_full.csv has a cp__id column, or a valid ID?
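The mapping lines in a successful run list which CSV column feeds which LDM identifier, so they can serve as the reference for what a new file must contain. As a rough sketch, the snippet below pulls those pairs out of log text with a regular expression. The function name `parse_mapping` and the pattern are my own assumptions based only on the lines quoted in this thread, not a documented log format.

```python
import re

# Matches fragments of the form: "source_column" -> "ldm.identifier"
# (pattern inferred from the log lines quoted in this thread).
MAPPING_RE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def parse_mapping(log_text):
    """Return {source_column: ldm_identifier} for every mapping line found."""
    return {m.group(1): m.group(2) for m in MAPPING_RE.finditer(log_text)}


LOG = '''
2021-04-22T17:02:49.698+0200 [INFO]: Dataset id: "dataset.testdata"
2021-04-22T17:02:49.698+0200 [INFO]:  "cp__id" -> "label.testdata.id" {"LABEL"}["VARCHAR(128)"]
2021-04-22T17:02:49.698+0200 [INFO]:  "a__name" -> "label.testdata.name" {"LABEL"}["VARCHAR(128)"]
'''

if __name__ == "__main__":
    for column, identifier in parse_mapping(LOG).items():
        print(column, "->", identifier)
```

The keys of the returned dictionary are the column names to look for in the header row of the CSV.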
