
Hi all, I have a question about the order used when `deduplicating multiple rows` that have the same `id` in a CSV file while running the data pipeline process. In the example below, which row remains after deduplication?
`id, value`
`1, 100`
`1, 200`
Thank you so much for your help.

Hi Khoa, provided that the id is your primary key, if you tried to load such data to the GoodData platform, only the last line would get uploaded (with value 200).

Hi <@U01SN29PXUZ>, thank you so much for your answer. I assumed the same, but I ran into an issue where GoodData seemed to load the first row, not the last.

Hi <@U027UMV9VHA>, it might depend on the way you are loading the data. In general, it should be the last line that sticks, but it’s possible that the order is changed.
But I would not recommend relying on the automatic deduplication, as you might get some unexpected results.
Could you please describe your use case and why it’s not possible to deduplicate on the database/data source level?
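
For reference, here is a minimal sketch of deduplicating the CSV yourself before loading, so the result does not depend on the platform's row order. It assumes that `id` is the primary key and that the last row for each `id` should win; the helper name `dedupe_keep_last` and the sample data are only illustrative and are not part of any GoodData tooling.

```python
import csv
import io


def dedupe_keep_last(rows, key="id"):
    """Keep only the last row seen for each key value (hypothetical helper)."""
    latest = {}
    for row in rows:
        # A later row with the same key overwrites the earlier one.
        latest[row[key]] = row
    return list(latest.values())


# Sample data in the same shape as the example in the question, plus one extra id.
sample = "id, value\n1, 100\n1, 200\n2, 50\n"

# skipinitialspace=True drops the space that follows each comma.
reader = csv.DictReader(io.StringIO(sample), skipinitialspace=True)
deduped = dedupe_keep_last(reader)

# Write the deduplicated rows back out as CSV, ready to be loaded.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "value"], lineterminator="\n")
writer.writeheader()
writer.writerows(deduped)
print(out.getvalue())
```

With this input, the row kept for `id` 1 is the one with value 200. If the first row should win instead, skipping keys that are already in `latest` would do it.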

Thanks <@U01SN29PXUZ>, I don't know what happened here, because it has been working as expected so far. As for deduplicating at the source level, we are implementing that now.