# gooddata-platform
k
Hi all, I have a question about the order when deduplicating multiple rows that have the same `id` in a CSV file while running the data pipeline process. With the example below, which row remains after the deduplication process?
id, value
1, 100
1, 200
Thank you so much for your help.
b
Hi Khoa, provided that the id is your primary key, if you try to load such data into the GoodData platform, only the last line gets uploaded (the one with value 200).
k
Hi @Boris, thank you so much for your answer. I assumed the same, but I ran into an issue where GoodData seemed to load the first row, not the last.
b
Hi @Khoa Nguyen, it might depend on how you are loading the data. In general, the last line should be the one that sticks, but it's possible that the order gets changed. I would not recommend relying on the automatic deduplication, though, as you might get unexpected results. Could you please describe your use case and why it's not possible to deduplicate at the database/data source level?
k
Thanks @Boris, I don't know what happened here, because it has been working as expected so far. As for deduplicating at the source level, we are implementing that feature now.
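For reference, a minimal sketch of what source-level deduplication could look like before the file is handed to the load, so the result does not depend on the order the platform processes rows in. It assumes the CSV from the example above and a "keep the last row per `id`" rule; the function name `dedupe_keep_last` is hypothetical and not part of any GoodData tooling.

```python
import csv
import io


def dedupe_keep_last(rows, key="id"):
    """Return one row per key value, keeping the last row seen for each key.

    Hypothetical helper for illustration only.
    """
    latest = {}
    for row in rows:
        # A later row with the same key overwrites the earlier one.
        latest[row[key]] = row
    return list(latest.values())


# Sample data copied from the question; skipinitialspace handles "1, 100".
sample = "id, value\n1, 100\n1, 200\n"
rows = list(csv.DictReader(io.StringIO(sample), skipinitialspace=True))

deduped = dedupe_keep_last(rows)
print(deduped)  # [{'id': '1', 'value': '200'}]
```

Doing this explicitly makes the rule (first wins or last wins) a decision in your own pipeline rather than something inferred from load behavior.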