Solved

GoodData.CN using BigQuery as data source

  • 11 September 2021
  • 12 replies
  • 118 views

  • Known Participant
  • 15 replies

Hi Team,

We’re on the final steps of setting up GoodData.CN, using BigQuery as our data source.

Kindly help us with the following:

  1. I have built the gooddata-cn-extra-drivers Docker image, with the service_account.json file included under db-drivers, and pushed it to our private ECR as <acc_id>.dkr.ecr.<region>.amazonaws.com/gooddata-cn-extra-drivers:latest, as specified in this document link.
     1a. I need clarification on step 4: --set sqlExecutor.extraDriversInitContainer=<custom-image-with-drivers>. Does <custom-image-with-drivers> mean the gooddata-cn-extra-drivers image URI string?
     1b. Or should we directly update the entry for sqlExecutor.extraDriversInitContainer in the Helm chart options with the custom image? A reference YAML snippet for customized-values-gooddata-cn.yaml showing how to include custom drivers would help.
  2. Data source details: after base64 -w0 service_account.json, what are the next steps to test the connection?
  3. Registered data source: please provide an example registered data source definition for a GBQ dataset.

 

Also, which steps are recommended for an existing GD.CN 1.3 installation to make GBQ work/sync?

JDBC Drivers Info : SimbaJDBCDriverforGoogleBigQuery42_1.2.19.1023.zip

Thanks and Regards,
Ashok


Best answer by jacek 13 September 2021, 20:05

Ashok,

sorry for confusing you. This is a known issue, which will be fixed in the next release.

Basically, the dataSource/test API is broken in a way that it does not pass the “token” property properly.

Please skip the test API and try to register the data source directly.

 


12 replies

Userlevel 1

Ad 1, an example from values.yaml:
 

sqlExecutor:
  extraDriversInitContainer: xxxxxxx.dkr.ecr.eu-central-1.amazonaws.com/nas/extra-drivers:latest
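
Applying such a values file is roughly the following (a sketch; the release name, namespace, and chart reference are assumptions — use whatever your installation was deployed with):

```shell
# Upgrade the existing release with the customized values file so the
# sql-executor pods are re-created with the extra-drivers init container.
# Release name, namespace, and chart are illustrative.
helm upgrade gooddata-cn gooddata/gooddata-cn \
  --namespace gooddata-cn \
  -f customized-values-gooddata-cn.yaml
```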

 

After you execute `base64 -w0 service_account.json`, here is an example of how to test the data source definition.

It is described in the DOC.

curl $ENDPOINT/api/actions/dataSource/test \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer $GDC_API_TOKEN" \
  -d '{
      "type": "BIGQUERY",
      "url": "jdbc:bigquery://www.googleapis.com/bigquery/v2:443;OAuthType=0;ProjectId=<projectId>;",
      "schema": "bq_dataset",
      "token": "<base64 value calculated from service account JSON>"
  }' \
  -X POST | jq .
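
The base64 token used in these requests can be produced like this (a minimal sketch; the key file name is whatever you downloaded from GCP):

```shell
# Encode the service account key as a single-line base64 string
# (-w0 disables line wrapping, which is what the API expects).
TOKEN=$(base64 -w0 service_account.json)

# Sanity check: decoding must reproduce the original JSON exactly.
printf '%s' "$TOKEN" | base64 -d
```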

And here is how to register the data source:

curl -s \
  -H "Authorization: Bearer $GDC_API_TOKEN" \
  -H "Content-Type: application/vnd.gooddata.api+json" \
  -X POST \
  -d '{
    "data": {
      "attributes": {
        "name": "Test",
        "type": "BIGQUERY",
        "url": "jdbc:bigquery://www.googleapis.com/bigquery/v2:443;OAuthType=0;ProjectId=<projectId>;",
        "schema": "<bq dataset>",
        "token": "<base64 value calculated from service account JSON>"
      },
      "id": "test",
      "type": "dataSource"
    }
  }' \
  $ENDPOINT/api/entities/dataSources

 

Userlevel 1

Regarding the extra-drivers image, if I understand the screenshot correctly, you injected the driver files incorrectly.

Here is how it should look:

  • Dockerfile:
    ADD BIGQUERY/SimbaJDBCDriverforGoogleBigQuery42_1.2.19.1023/ /data/BIGQUERY/
  • ls BIGQUERY/SimbaJDBCDriverforGoogleBigQuery42_1.2.19.1023/
    should return the JAR files, including e.g. GoogleBigQueryJDBC42.jar
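
The listing above can be turned into a complete build along these lines (a sketch; the FROM image and registry URI are assumptions — use the base image the GoodData documentation specifies, and your own ECR URI):

```shell
# Unzip the Simba driver so the JARs end up in a plain directory
# (directory layout assumed from the ls output above).
unzip SimbaJDBCDriverforGoogleBigQuery42_1.2.19.1023.zip -d BIGQUERY/

# Dockerfile that places the JARs under /data/BIGQUERY/ in the image.
# FROM busybox is an assumption here, not the documented base image.
cat > Dockerfile <<'EOF'
FROM busybox:latest
ADD BIGQUERY/SimbaJDBCDriverforGoogleBigQuery42_1.2.19.1023/ /data/BIGQUERY/
EOF

docker build -t <acc_id>.dkr.ecr.<region>.amazonaws.com/gooddata-cn-extra-drivers:latest .
docker push <acc_id>.dkr.ecr.<region>.amazonaws.com/gooddata-cn-extra-drivers:latest
```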

Hi @jacek  Thanks for the help :)

 

This is how it looks now inside the container:

 

Regards,
Ashok

Hi @jacek, after making the above changes, is a full GD.CN redeployment needed, or is a helm upgrade enough? I am still unable to sync.

 

CURL Request

 

 

Thanks and Regards,
Ashok

Userlevel 1

Hi Ashok,

have you tried to execute the test API before registering the data source?

curl $ENDPOINT/api/actions/dataSource/test \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer $GDC_API_TOKEN" \
  -d '{
      "type": "BIGQUERY",
      "url": "jdbc:bigquery://www.googleapis.com/bigquery/v2:443;OAuthType=0;ProjectId=<projectId>;",
      "schema": "bq_dataset",
      "token": "<base64 value calculated from service account JSON>"
  }' \
  -X POST | jq .

What is the result?

Anyway, please collect logs from the sql-executor pods covering the failing request and share them here.

Hi @jacek  

Step 1: I executed the following and it returned no errors:

 

Step 2: I executed the following and it returned a 500:


Note: I have modified the curl request provided above as follows, mainly to fix syntax/typos:

curl -H "Authorization: Bearer dummytokendwdadas=" \
  -H "Content-Type: application/vnd.gooddata.api+json" \
  -H "Accept: application/vnd.gooddata.api+json" \
  -X POST \
  -d '
{
  "data": {
    "attributes": {
      "name": "Test",
      "type": "BIGQUERY",
      "url": "jdbc:bigquery://www.googleapis.com/bigquery/v2:443;ProjectId=customer-285602;OAuthType=0",
      "schema": "enterprise_244",
      "token": "ewogICAgInR5cGUiOiAic2Vydmdummybae64encME"
    },
    "id": "test",
    "type": "dataSource"
  }
}
  ' https://gdk8sstager5.factoreal.info/api/entities/dataSources

I also tried to initiate the scan from the UI, but it failed with a 500 error.

 

 

 

PFA pod logs and example curl requests; please help me if I am missing anything!

 

Thanks and Regards,
Ashok

Userlevel 1

Please try to specify the JDBC URL with an https prefix, like this:
 

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;

Thanks for the response @jacek.

I have tried the following, but it didn’t work:

{
  "error": "Could not initialize connection to data source; caused by java.sql.SQLException: [JDBC Driver]no JSON input found; caused by java.lang.IllegalArgumentException: no JSON input found",
  "successful": false
}

root@ip-10-2-20-212:/home/ubuntu/r5/gbq/post# curl https://gdk8sstager5.factoreal.info/api/actions/dataSource/test \
> -H "Content-Type: application/json" \
> -H "Accept: application/json" \
> -H "Authorization: Bearer YWRtaW46Ym9vdummeyberrertoken=" \
> -d '{
> "type": "BIGQUERY",
> "url": "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=0;ProjectId=customer-285602;",
> "schema": "enterprise_244",
> "token": "ewogICAgInR5cGUiOiAic2Vydummybase64encoded"
> }' \
> -X POST | jq .
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3572 100 211 100 3361 73 1167 0:00:02 0:00:02 --:--:-- 1239
{
"error": "Could not initialize connection to data source; caused by java.sql.SQLException: [JDBC Driver]no JSON input found; caused by java.lang.IllegalArgumentException: no JSON input found",
"successful": false
}
root@ip-10-2-20-212:/home/ubuntu/r5/gbq/post# ls
root@ip-10-2-20-212:/home/ubuntu/r5/gbq/post# curl -H "Authorization: Bearer YWRtaW46Ym9vdummeyberrertoken=" \
> -H "Content-Type: application/vnd.gooddata.api+json" \
> -H "Accept: application/vnd.gooddata.api+json" \
> -X PUT \
> -d '
> {
> "data": {
> "attributes": {
> "name": "Test",
> "type": "BIGQUERY",
> "url": "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=0;ProjectId=customer-285602;",
> "schema": "enterprise_244",
> "token": "ewogICAgInR5cGUiOiAic2Vydummybase64encoded"
> },
> "id": "test",
> "type": "dataSource"
> }
> }
> ' https://gdk8sstager5.factoreal.info/api/entities/dataSources/test
{"data":{"attributes":{"name":"Test","type":"BIGQUERY","url":"jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=0;ProjectId=customer-285602;","schema":"enterprise_244"},"id":"test","type":"dataSource"},"links":{"self":"https://gdk8sstager5.factoreal.info/api/entities/dataSources/test"}}root@ip-10-2-20-212:/home/ubuntu/r5/gbq/post#
root@ip-10-2-20-212:/home/ubuntu/r5/gbq/post#
root@ip-10-2-20-212:/home/ubuntu/r5/gbq/post#
root@ip-10-2-20-212:/home/ubuntu/r5/gbq/post#
root@ip-10-2-20-212:/home/ubuntu/r5/gbq/post#
root@ip-10-2-20-212:/home/ubuntu/r5/gbq/post# curl https://gdk8sstager5.factoreal.info/api/actions/dataSource/test \
> -H "Content-Type: application/json" \
> -H "Accept: application/json" \
> -H "Authorization: Bearer YWRtaW46Ym9vdummeyberrertoken=" \
> -d '{
> "type": "BIGQUERY",
> "url": "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=0;ProjectId=customer-285602;",
> "schema": "enterprise_244",
> "token": "ewogICAgInR5cGUiOiAic2Vydummybase64encoded"
> }' \
> -X POST | jq .
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3572 100 211 100 3361 161 2571 0:00:01 0:00:01 --:--:-- 2730
{
"error": "Could not initialize connection to data source; caused by java.sql.SQLException: [JDBC Driver]no JSON input found; caused by java.lang.IllegalArgumentException: no JSON input found",
"successful": false
}
root@ip-10-2-20-212:/home/ubuntu/r5/gbq/post#

PFA pod logs; they look interesting!

 

Thanks and Regards,
Ashok

Userlevel 1

Ashok,

sorry for confusing you. This is a known issue, which will be fixed in the next release.

Basically, the dataSource/test API is broken in a way that it does not pass the “token” property properly.

Please skip the test API and try to register the data source directly.

 

Thanks @jacek, I have proceeded with the data source scanning steps but encountered a PK error; any pointers would be helpful.

 

If we set the PK on the datasets mentioned in the error above, will it affect our original source GBQ dataset, aka the PDM?

 

Thanks and Regards,
Ashok

Userlevel 1

Hi Ashok,

glad to hear that your GD.CN can finally connect to the data source!

It is mandatory that every dataset has a primary key in the logical data model (LDM). It can be generated from the PKs in the tables, but unfortunately BigQuery does not support them.

If you set PKs in the LDM, nothing is changed in the PDM or in the data source. A PK in the LDM is just a hint for our engine about the granularity of the dataset.

WARNING: a PK must consist of a single LDM attribute. In the future we plan to support multi-attribute PKs.

If you would like the scan to generate PKs automatically, you have to use the naming convention we provide and then set the corresponding prefix (Primary key prefix) in the scan dialog under “ADVANCED SETTINGS”.

Regards

  Jacek
 

Thanks @jacek for all the support, really appreciate it.

Reply