Solved

CSV missing values

  • 19 August 2021
  • 2 replies

Hi, I am new to GoodData and trying to get started, but having an issue I can’t get around.

I am uploading a CSV file created in a Jupyter notebook. The file is 15.5 MB and has 56k rows. Everything seems fine when I upload it, but when I try to create my first insight, the data is different from what is in the file.

I have a measure called `timeSpent`. When I sum `timeSpent`, the number in GoodData is significantly lower than it is in the file.

  • sum of timeSpent in GoodData - 1,849.10
  • sum of timeSpent in the file - 19,670,743.42

How can I check what’s wrong? 

Best answer by Boris 19 August 2021, 09:58

Hi Ron,

We would need to see your data to exactly pinpoint the reason for this.

However, what you are describing sounds like an incorrectly selected primary key (or compound key). The primary key should uniquely identify each row in your data; otherwise, rows that share a key value won’t get loaded (or will get overwritten).

 

Consider the following example data:

Group ID  Name   Value
A         John   10
A         Jack   100
B         Peter  1000
B         Paul   10000

 

If the primary key were Group ID, only 2 rows would get loaded, because its values repeat. You can instead select a compound key, which is a combination of Group ID and Name. In that case the key is unique for each row and all rows get loaded.
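To check a file for this before uploading, something like the following rough sketch may help. It uses only Python's standard library and the example data above. The function names (`load_rows`, `duplicate_keys`, `sum_after_dedup`) are made up for illustration, and the "last row wins" rule is an assumption that only imitates the overwrite effect; it is not a description of how GoodData actually loads data.

```python
import csv
import io
from collections import Counter

# Example data from the table above, inlined as CSV text.
SAMPLE = """Group ID,Name,Value
A,John,10
A,Jack,100
B,Peter,1000
B,Paul,10000
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def duplicate_keys(rows, key_columns):
    """Return the key values that appear on more than one row."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def sum_after_dedup(rows, key_columns, value_column):
    """Sum value_column while keeping one row per key.

    Assumption: a later row with the same key replaces the earlier one.
    """
    kept = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        kept[key] = float(row[value_column])
    return sum(kept.values())


rows = load_rows(SAMPLE)

# Group ID alone repeats, so it is not a valid primary key.
single = duplicate_keys(rows, ["Group ID"])
# Group ID + Name is unique for every row.
compound = duplicate_keys(rows, ["Group ID", "Name"])

print("Duplicates with Group ID:", single)
print("Duplicates with Group ID + Name:", compound)
print("Sum with Group ID as key:", sum_after_dedup(rows, ["Group ID"], "Value"))
print("Sum with compound key:", sum_after_dedup(rows, ["Group ID", "Name"], "Value"))
```

With a real file, the inlined text could be replaced by reading the CSV from disk. An empty result from `duplicate_keys` suggests the chosen columns are safe to use as a key, and a gap between the two sums shows how much a non-unique key could shrink a total.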

 

I hope this helps.

 

All the best,

Boris from GoodData Technical Support team.

Hi Boris,

Yes, I think this explains it. I didn’t realize the primary key needs to be unique.

I believe it’ll work well now.

Thank you!

Ron