Hi Moises,
Thanks for the response. However, I don't think my question came across. You suggested verifying that the data for "column_id" is really unique, but my problem is that there is no "column_id" in the dataset. In fact, the error message 'Duplicated value "151" in column "id".' refers to a field named "id", not "column_id", and I don't see a field called just "id" either.
I am not sure whether you can see the log, so I am copying it here. The "Output Stage -> LDM mapping" section lists "product_id", "customer_id", etc., but no standalone "id". Where does it come from? Which database field is mapped to it?
2024-06-27T231644.360+0200 [INFO]: Data distribution worker started
2024-06-27T231644.360+0200 [INFO]: Request id: "data_load_TPpwozezQ4_u3k495HbE4t9xIMjuj8KI4zyuuP4iEAST8phRsu2Bh"
2024-06-27T231644.360+0200 [INFO]: Data source: "667db1163c1e5549d812f942"
2024-06-27T231644.360+0200 [INFO]: Additional parameters: {GDC_DE_SYNCHRONIZE_ALL=true, PROCESS_ID=c2ba7b6c-631c-43cf-a9ab-d8b94e74d2b2}
2024-06-27T231644.360+0200 [INFO]: Synchronization mode: all mapped datasets (default)
2024-06-27T231648.703+0200 [INFO]: Synchronized datasets: [dataset.products, dataset.campaign_channels, dataset.customers, dataset.order_lines, dataset.campaigns]
2024-06-27T231648.703+0200 [INFO]:
====================== Mapping Validation ======================
Mapping Validation is OK
====================== End of Mapping Validation ======================
2024-06-27T231648.703+0200 [INFO]:
====================== Output Stage -> LDM mapping ======================
Output Stage table/view: campaign_channels (full, shared) -> Dataset: dataset.campaign_channels
category[VARCHAR(128)] -> label.campaign_channels.category{LABEL}[VARCHAR(128)]
campaign_channel_id[VARCHAR(128)] -> label.campaign_channels.campaign_channel_id{LABEL}[VARCHAR(128)]
campaign_id[INT4] -> dataset.campaigns{REFERENCE}[INT]
spend[NUMERIC(15,2)] -> fact.campaign_channels.spend{FACT}[DECIMAL(15,2)]
type[VARCHAR(128)] -> label.campaign_channels.type{LABEL}[VARCHAR(128)]
budget[NUMERIC(15,2)] -> fact.campaign_channels.budget{FACT}[DECIMAL(15,2)]
Output Stage table/view: campaigns (full, shared) -> Dataset: dataset.campaigns
campaign_id[INT4] -> label.campaigns.campaign_id{LABEL}[INT]
campaign_name[VARCHAR(128)] -> label.campaigns.campaign_name{LABEL}[VARCHAR(128)]
Output Stage table/view: customers (full, shared) -> Dataset: dataset.customers
geo__state__location[VARCHAR(64)] -> label.customers.geo_state_location{LABEL}[VARCHAR(64)]
customer_name[VARCHAR(128)] -> label.customers.customer_name{LABEL}[VARCHAR(128)]
customer_id[INT4] -> label.customers.customer_id{LABEL}[INT]
state[VARCHAR(64)] -> label.customers.state{LABEL}[VARCHAR(64)]
region[VARCHAR(64)] -> label.customers.region{LABEL}[VARCHAR(64)]
Output Stage table/view: order_lines (full, shared) -> Dataset: dataset.order_lines
campaign_id[INT4] -> dataset.campaigns{REFERENCE}[INT]
product_id[INT4] -> dataset.products{REFERENCE}[INT]
wdf__region[VARCHAR(128)] -> label.order_lines.wdf_region{LABEL}[VARCHAR(128)]
wdf__state[VARCHAR(64)] -> label.order_lines.wdf_state{LABEL}[VARCHAR(64)]
order_line_id[VARCHAR(128)] -> label.order_lines.order_line_id{LABEL}[VARCHAR(128)]
customer_id[INT4] -> dataset.customers{REFERENCE}[INT]
order_id[VARCHAR(128)] -> label.order_lines.order_id{LABEL}[VARCHAR(128)]
order_status[VARCHAR(128)] -> label.order_lines.order_status{LABEL}[VARCHAR(128)]
date[DATE] -> date{DATE}[DATE]
quantity[NUMERIC(15,2)] -> fact.order_lines.quantity{FACT}[DECIMAL(15,2)]
price[NUMERIC(15,2)] -> fact.order_lines.price{FACT}[DECIMAL(15,2)]
Output Stage table/view: products (full, shared) -> Dataset: dataset.products
product_name[VARCHAR(128)] -> label.products.product_name{LABEL}[VARCHAR(128)]
product_id[INT4] -> label.products.product_id{LABEL}[INT]
category[VARCHAR(128)] -> label.products.category{LABEL}[VARCHAR(128)]
====================== End of Output Stage -> LDM mapping ======================
2024-06-27T231648.713+0200 [INFO]:
====================== Data distribution scope ======================
2024-06-27T231648.713+0200 [INFO]: Project="m8ify7ik9gkfsgn14798hdn2ernba06v"; datasets=[{dataset.products, full}, {dataset.campaign_channels, full}, {dataset.customers, full}, {dataset.order_lines, full}, {dataset.campaigns, full}]
2024-06-27T231648.713+0200 [INFO]:
====================== End of Data distribution scope ======================
2024-06-27T231648.713+0200 [INFO]:
====================== Downloading and integrating data ======================
2024-06-27T231653.741+0200 [INFO]: Data for project="m8ify7ik9gkfsgn14798hdn2ernba06v" was downloaded. Datasets: [{dataset.campaign_channels, upsert, rows=330, size=17743 bytes}, {dataset.campaigns, upsert, rows=154, size=4084 bytes}, {dataset.customers, upsert, rows=2000, size=115912 bytes}, {dataset.order_lines, upsert, rows=11978, size=1056172 bytes}, {dataset.products, upsert, rows=36, size=781 bytes}]
2024-06-27T231658.753+0200 [WARN]: Project "m8ify7ik9gkfsgn14798hdn2ernba06v" was not integrated. Reason: Upload to dataset dataset.products failed with 'An error occured during loading of csv files: SQLSTATE_UNIQUE_VIOLATION. Details: Duplicated value "3" in column "id".'. Upload to dataset dataset.campaign_channels failed with 'An error occured during loading of csv files: SQLSTATE_UNIQUE_VIOLATION. Details: Duplicated value "151" in column "id".'. Upload to dataset dataset.customers failed with 'An error occured during loading of csv files: SQLSTATE_UNIQUE_VIOLATION. Details: Duplicated value "915" in column "id".'. Upload to dataset dataset.order_lines failed with 'An error occured during loading of csv files: SQLSTATE_UNIQUE_VIOLATION. Details: Duplicated value "4590" in column "id".'. Upload to dataset dataset.campaigns failed with 'An error occured during loading of csv files: SQLSTATE_UNIQUE_VIOLATION. Details: Duplicated value "172" in column "id".'.
2024-06-27T231708.760+0200 [INFO]:
====================== End of downloading and integrating data ======================
2024-06-27T231708.760+0200 [ERROR]: Data distribution worker failed. Reason: All projects failed to load.
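To make the WARN line above easier to read, the per-dataset failures can be pulled apart. This is only a minimal Python sketch, assuming the message keeps exactly the wording shown in this log; the function name parse_upload_failures is hypothetical and not part of any GoodData tool.

```python
import re

# One per-dataset failure inside the WARN message, using the wording seen in the log above.
# Assumption: the message format is stable; this is not an official log format.
FAILURE_RE = re.compile(
    r"Upload to dataset (?P<dataset>[\w.]+) failed with '.*?"
    r'Duplicated value "(?P<value>[^"]*)" in column "(?P<column>[^"]*)"'
)


def parse_upload_failures(message: str) -> list[dict[str, str]]:
    """Return one dict (dataset, value, column) per failed upload found in the message.

    Hypothetical helper for reading the log, not a library function.
    """
    return [match.groupdict() for match in FAILURE_RE.finditer(message)]


if __name__ == "__main__":
    # Shortened copy of the WARN message from the log (first two failures only).
    sample = (
        "Upload to dataset dataset.products failed with 'An error occured during "
        "loading of csv files: SQLSTATE_UNIQUE_VIOLATION. Details: Duplicated value "
        '"3" in column "id".\'. Upload to dataset dataset.campaign_channels failed '
        "with 'An error occured during loading of csv files: "
        'SQLSTATE_UNIQUE_VIOLATION. Details: Duplicated value "151" in column "id".\'.'
    )
    for failure in parse_upload_failures(sample):
        print(failure["dataset"], failure["column"], failure["value"])
```

Run against the full WARN line, this would list all five datasets, each reporting the same column name "id", which is the name I cannot find in the mapping.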
Another question: I did not explicitly set up the "output stage", so is there a way to download what the process downloaded (see the log line below) so that I can examine the "downloaded data" myself?
2024-06-27T231653.741+0200 [INFO]: Data for project="m8ify7ik9gkfsgn14798hdn2ernba06v" was downloaded. Datasets: [{dataset.campaign_channels, upsert, rows=330, size=17743 bytes}, {dataset.campaigns, upsert, rows=154, size=4084 bytes}, {dataset.customers, upsert, rows=2000, size=115912 bytes}, {dataset.order_lines, upsert, rows=11978, size=1056172 bytes}, {dataset.products, upsert, rows=36, size=781 bytes}]
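If the downloaded data (or an export of the source tables) can be obtained as CSV, a check along the following lines would show which columns actually contain repeated values such as "151". This is a rough sketch under that assumption; the function name duplicated_values is hypothetical, and it uses only the Python standard library.

```python
import csv
import io
from collections import Counter


def duplicated_values(csv_text: str) -> dict[str, list[str]]:
    """For each column of a CSV with a header row, list the values that occur more than once.

    Hypothetical helper: assumes the data is available as CSV text with a header line.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts: dict[str, Counter] = {name: Counter() for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            # Skip overflow cells (key None) and missing cells (value None).
            if name in counts and value is not None:
                counts[name][value] += 1
    return {
        name: sorted(value for value, n in counter.items() if n > 1)
        for name, counter in counts.items()
    }
```

A column that comes back with an empty list is unique; a column that lists the value from the error message is a candidate for whatever the loader calls "id".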
Thanks.