Hi, I'm running Gooddata in a docker-container and...
# gooddata-cn
m
Hi, I'm running Gooddata in a docker-container and deployed gooddata-fdw next to it. The system uses a reverse proxy (Traefik, including SSL certificates) because it's running on a separate machine and not on localhost. Gooddata itself is working fine, but I'm having a hard time getting the connection from the FDW to my instance. Here is the relevant snippet from the docker-compose.yml:
  gooddata:
    image: gooddata/gooddata-cn-ce
    environment:
      LICENSE_AND_PRIVACY_POLICY_ACCEPTED: "YES"
      GDCN_PUBLIC_URL: https://gooddata.mydomain.com
    container_name: gooddata
    hostname: gooddata
    ports:
      - 3000
    volumes:
      - bilab_gooddata:/data
    networks:
      net:
        ipv4_address: 172.18.0.3
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.gooddata.entrypoints=websecure"
      - "traefik.http.routers.gooddata.rule=Host(`gooddata.$MY_DOMAIN`)"
      - "traefik.http.routers.gooddata.tls.certresolver=le"
      - "traefik.http.routers.gooddata.tls.domains[0].main=gooddata.$MY_DOMAIN"

  gooddata-fdw:
    build:
      context: https://github.com/gooddata/gooddata-python-sdk.git#v1.1.0
      dockerfile: gooddata-fdw/Dockerfile
    ports:
      - "2543:5432"
    volumes:
      - bilab_gooddata_fdw_data:/data
    environment:
      POSTGRES_DB: gooddata
      POSTGRES_USER: gooddata
      POSTGRES_PASSWORD: gooddata
    command: ["postgres", "-c", "shared_preload_libraries=foreign_table_exposer", "-c", "log_statement=all", "-c", "client_min_messages=DEBUG1", "-c", "log_min_messages=DEBUG1"]
    networks:
      net:
        ipv4_address: 172.18.0.6
I can successfully connect to the fdw-postgres-database. The server is registered with this statement:
CREATE SERVER multicorn_gooddata FOREIGN DATA WRAPPER multicorn
  OPTIONS (
    wrapper 'gooddata_fdw.GoodDataForeignDataWrapper',
    host 'https://goddata.mydomain.com',
    token 'YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz'
  );
Gooddata can be accessed and used via the given URL from a browser. Next I try to import existing insights from an existing workspace:
CALL import_gooddata(workspace := 'Flights', object_type := 'insights');
This results in an error. In my database console I see the following:
Traceback (most recent call last):
	  File "/usr/lib/python3.10/site-packages/gooddata_fdw-1.1.0-py3.10.egg/gooddata_fdw/fdw.py", line 71, in import_schema
	    tables += instance.import_tables()
	  File "/usr/lib/python3.10/site-packages/gooddata_fdw-1.1.0-py3.10.egg/gooddata_fdw/import_workspace.py", line 86, in import_tables
	    catalog = self._sdk.catalog_workspace_content.get_full_catalog(self._workspace)
	  File "/usr/lib/python3.10/site-packages/gooddata_sdk-1.1.0-py3.10.egg/gooddata_sdk/catalog/workspace/content_service.py", line 79, in get_full_catalog
	    attributes = load_all_entities(get_attributes)
	  File "/usr/lib/python3.10/site-packages/gooddata_sdk-1.1.0-py3.10.egg/gooddata_sdk/utils.py", line 88, in load_all_entities
	    result = get_page_func(page=current_page, size=page_size)
	  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api/entities_api.py", line 12248, in get_all_entities_attributes
	    return self.get_all_entities_attributes_endpoint.call_with_http_info(**kwargs)
	  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 880, in call_with_http_info
	    return self.api_client.call_api(
	  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 422, in call_api
	    return self.__call_api(resource_path, method,
	  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 199, in __call_api
	    response_data = self.request(
	  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 448, in request
	    return self.rest_client.GET(url,
	  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/rest.py", line 236, in GET
	    return self.request("GET", url,
	  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/rest.py", line 202, in request
	    r = self.pool_manager.request(method, url,
	  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/request.py", line 74, in request
	    return self.request_encode_url(
	  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/request.py", line 96, in request_encode_url
	    return self.urlopen(method, url, **extra_kw)
	  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/poolmanager.py", line 376, in urlopen
	    response = conn.urlopen(method, u.request_uri, **kw)
	  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/connectionpool.py", line 815, in urlopen
	    return self.urlopen(
	  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/connectionpool.py", line 815, in urlopen
	    return self.urlopen(
	  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/connectionpool.py", line 815, in urlopen
	    return self.urlopen(
	  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/connectionpool.py", line 787, in urlopen
	    retries = retries.increment(
	  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/util/retry.py", line 592, in increment
	    raise MaxRetryError(_pool, url, error or ResponseError(cause))
	urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='goddata.mydomain.com', port=443): Max retries exceeded with url: /api/v1/entities/workspaces/Flights/attributes?include=labels&page=0&size=500 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f032ede6bc0>: Failed to establish a new connection: [Errno -2] Name does not resolve'))
	
2022-09-19 15:19:47.428 UTC [65] CONTEXT:  SQL statement "IMPORT FOREIGN SCHEMA "Flights" FROM SERVER "multicorn_gooddata" INTO "Flights" OPTIONS (object_type 'insights', numeric_max_size '18')"
	PL/pgSQL function execute_sql(character varying,boolean) line 6 at EXECUTE
	SQL statement "CALL execute_sql(sql_statement, debug)"
	PL/pgSQL function import_gooddata(character varying,character varying,character varying,integer,character varying,boolean) line 20 at CALL
2022-09-19 15:19:47.428 UTC [65] STATEMENT:  CALL import_gooddata(workspace := 'Flights', object_type := 'insights')
Somehow it is not connecting to the gooddata instance, which is running fine in the other docker container. What do I need to change to get the connection working? Thanks in advance for your help.
j
host 'https://goddata.mydomain.com'
Is this really your intended hostname? 🤔
r
[Errno -2] Name does not resolve'
You must ensure the public hostname goddata.mydomain.com is resolvable to the IP address of your Traefik proxy from the fdw container.
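One way to check name resolution from inside the fdw container is a short Python script like the one below. This is only a sketch: the helper name `resolve_host` is made up for illustration, and it uses nothing beyond the standard `socket` module.

```python
import socket


def resolve_host(hostname, port=443):
    """Return the sorted IP addresses `hostname` resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Same class of failure as "[Errno -2] Name does not resolve" in the traceback.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_host("gooddata.mydomain.com")` inside the fdw container should return the address of the Traefik proxy; an empty list means the name cannot be resolved from there.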
j
It seems that you have a typo in the host option of your CREATE SERVER command. You have:
host 'https://goddata.mydomain.com',
But your docker-compose file has:
GDCN_PUBLIC_URL: https://gooddata.mydomain.com
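A mismatch like this is easy to miss by eye. As a rough sketch, the two values can be compared programmatically; the function name `same_hostname` is hypothetical and only the standard library is used.

```python
from urllib.parse import urlsplit


def same_hostname(fdw_host, public_url):
    """Return True when both URLs use the same scheme and hostname."""
    a, b = urlsplit(fdw_host), urlsplit(public_url)
    # urlsplit lowercases scheme and hostname, so the comparison is case-insensitive.
    return (a.scheme, a.hostname) == (b.scheme, b.hostname)


# The values from this thread: the FDW host option vs. GDCN_PUBLIC_URL.
print(same_hostname("https://goddata.mydomain.com", "https://gooddata.mydomain.com"))
```

With the values quoted above this prints `False`, because `goddata` and `gooddata` differ.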
r
God Data, we should register the trademark 😉
m
OK guys, I'm one step further. The typo has been fixed. Thanks for that one 👍 Now I get a 404 error? Since I've configured the token, I'm guessing the permission should be granted...
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/gooddata_fdw-1.1.0-py3.10.egg/gooddata_fdw/fdw.py", line 71, in import_schema
    tables += instance.import_tables()
  File "/usr/lib/python3.10/site-packages/gooddata_fdw-1.1.0-py3.10.egg/gooddata_fdw/import_workspace.py", line 86, in import_tables
    catalog = self._sdk.catalog_workspace_content.get_full_catalog(self._workspace)
  File "/usr/lib/python3.10/site-packages/gooddata_sdk-1.1.0-py3.10.egg/gooddata_sdk/catalog/workspace/content_service.py", line 79, in get_full_catalog
    attributes = load_all_entities(get_attributes)
  File "/usr/lib/python3.10/site-packages/gooddata_sdk-1.1.0-py3.10.egg/gooddata_sdk/utils.py", line 89, in load_all_entities
    result = get_page_func(page=current_page, size=page_size)
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api/entities_api.py", line 12248, in get_all_entities_attributes
    return self.get_all_entities_attributes_endpoint.call_with_http_info(**kwargs)
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 880, in call_with_http_info
    return self.api_client.call_api(
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 422, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 206, in __call_api
    raise e
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 199, in __call_api
    response_data = self.request(
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 448, in request
    return self.rest_client.GET(url,
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/rest.py", line 236, in GET
    return self.request("GET", url,
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/rest.py", line 225, in request
    raise NotFoundException(http_resp=r)
gooddata_metadata_client.exceptions.NotFoundException: Status Code: 404
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Access-Control-Allow-Credentials': 'true', 'Access-Control-Expose-Headers': 'Content-Disposition, Content-Length, Content-Range, Set-Cookie', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Content-Security-Policy': "default-src 'self' *.wistia.com *.wistia.net; script-src 'self' 'unsafe-inline' 'unsafe-eval' *.wistia.com *.wistia.net src.litix.io matomo.anywhere.gooddata.com code.jquery.com unpkg.com cdn.jsdelivr.net cdnjs.cloudflare.com; img-src 'self' data: blob: *.wistia.com *.wistia.net embedwistia-a.akamaihd.net privacy-policy.truste.com www.gooddata.com; style-src 'self' 'unsafe-inline' fonts.googleapis.com cdn.jsdelivr.net fast.fonts.net; font-src 'self' data: fonts.gstatic.com *.alicdn.com *.wistia.com cdn.jsdelivr.net info.gooddata.com; frame-src 'self'; object-src 'none'; worker-src 'self' blob:; child-src blob:; connect-src 'self' *.tiles.mapbox.com *.mapbox.com *.litix.io *.wistia.com embedwistia-a.akamaihd.net matomo.anywhere.gooddata.com; media-src 'self' blob: data: *.wistia.com *.wistia.net embedwistia-a.akamaihd.net", 'Content-Type': 'application/problem+json', 'Date': 'Tue, 20 Sep 2022 09:52:54 GMT', 'Expires': '0', 'Gooddata-Deployment': 'aio', 'Permission-Policy': "geolocation 'none'; midi 'none'; sync-xhr 'none'; microphone 'none'; camera 'none'; magnetometer 'none'; gyroscope 'none'; fullscreen 'none'; payment 'none';", 'Pragma': 'no-cache', 'Server': 'nginx', 'Set-Cookie': 'SPRING_SEC_SECURITY_CONTEXT=; Max-Age=0; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/; Secure; HttpOnly', 'Strict-Transport-Security': 'max-age=31536000 ; includeSubDomains', 'Vary': 'Origin, Access-Control-Request-Method, Access-Control-Request-Headers', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-Gdc-Trace-Id': '7a8e48aaea8cebe5', 'X-Xss-Protection': '1; mode=block', 'Transfer-Encoding': 'chunked'})
HTTP response body: {"detail":"The requested endpoint does not exist or you do not have permission to access it.","status":404,"title":"Not Found","traceId":"7a8e48aaea8cebe5"}
Do I need to set a header or something? I'm just guessing...
j
Could you please try using the GoodData container name as host?
host 'https://gooddata',
m
Hi Jan, I've tried it, but as expected it refuses the connection. https is handled by the reverse proxy.
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/gooddata_fdw-1.1.0-py3.10.egg/gooddata_fdw/fdw.py", line 71, in import_schema
    tables += instance.import_tables()
  File "/usr/lib/python3.10/site-packages/gooddata_fdw-1.1.0-py3.10.egg/gooddata_fdw/import_workspace.py", line 86, in import_tables
    catalog = self._sdk.catalog_workspace_content.get_full_catalog(self._workspace)
  File "/usr/lib/python3.10/site-packages/gooddata_sdk-1.1.0-py3.10.egg/gooddata_sdk/catalog/workspace/content_service.py", line 79, in get_full_catalog
    attributes = load_all_entities(get_attributes)
  File "/usr/lib/python3.10/site-packages/gooddata_sdk-1.1.0-py3.10.egg/gooddata_sdk/utils.py", line 89, in load_all_entities
    result = get_page_func(page=current_page, size=page_size)
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api/entities_api.py", line 12248, in get_all_entities_attributes
    return self.get_all_entities_attributes_endpoint.call_with_http_info(**kwargs)
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 880, in call_with_http_info
    return self.api_client.call_api(
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 422, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 199, in __call_api
    response_data = self.request(
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/api_client.py", line 448, in request
    return self.rest_client.GET(url,
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/rest.py", line 236, in GET
    return self.request("GET", url,
  File "/usr/lib/python3.10/site-packages/gooddata_metadata_client-1.1.0-py3.10.egg/gooddata_metadata_client/rest.py", line 202, in request
    r = self.pool_manager.request(method, url,
  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/request.py", line 74, in request
    return self.request_encode_url(
  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/request.py", line 96, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/connectionpool.py", line 815, in urlopen
    return self.urlopen(
  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/connectionpool.py", line 815, in urlopen
    return self.urlopen(
  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/connectionpool.py", line 815, in urlopen
    return self.urlopen(
  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3.10/site-packages/urllib3-1.26.12-py3.10.egg/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='gooddata', port=443): Max retries exceeded with url: /api/v1/entities/workspaces/Flights/attributes?include=labels&page=0&size=500 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f0564cee800>: Failed to establish a new connection: [Errno 111] Connection refused'))
If I try http://gooddata:3000 as host, I get a 404 error again.
r
The Host header must match the GDCN_PUBLIC_URL hostname. So if you set:
GDCN_PUBLIC_URL: https://gooddata.mydomain.com
on gooddata-cn-ce container and you're able to log in through this URL (that should actually point to your traefik proxy), then it looks more like a different issue.
Check your gooddata-cn-ce container logs during the fdw request.
What URI is it calling? Are there any related backend exceptions? Hint: you can search the logs for the traceId reported by the Python client.
Also, make sure that import_gooddata(workspace := 'Flights', object_type := 'insights') refers to an existing workspace_id.
m
Ok I've switched the host and the header to the proper name. The following comes up in the gooddata-cn-ce container logs:
ts="2022-09-20 11:08:40.126" level=ERROR msg="Not Found" logger=com.gooddata.tiger.web.exception.BaseExceptionHandling thread=http-nio-9007-exec-4 orgId=default spanId=a16173b7e56c6861 traceId=a16173b7e56c6861 userId=admin exc="Operation is not granted workspace: Flights, source: CACHE, objectExist: false"
172.18.0.100 - - [20/Sep/2022:11:08:40 +0000] "GET /api/v1/entities/workspaces/Flights/attributes?include=labels&page=0&size=500 HTTP/1.1" 404 167 "-" "gooddata-python-sdk/1.1.0 gooddata-fdw/1.1.0"
The workspace "Flights" exists in my instance:
[Attached screenshot: Bildschirmfoto 2022-09-20 um 13.11.49.jpg]
j
The "Flights" you see in the GoodData UI is the workspace name; you need to pass the workspace id. When you open the workspace, the workspace id is the string after the last / in the URL, so for example 8cadde05fc6a412aa41f94bd90a9bc77 is a workspace id.
You can create pretty workspace ids with the Python package gooddata_sdk, using the following code snippet:
from gooddata_sdk import GoodDataSdk, CatalogWorkspace

# GoodData.CN host in the form of a URI, e.g. "http://localhost:3000"
host = "http://localhost:3000"
# GoodData.CN user token
token = "some_user_token"
sdk = GoodDataSdk.create(host, token)

# Create new workspace entity locally
my_workspace_object = CatalogWorkspace(workspace_id="Flights",
                                     name="Flights")

# Create workspace
sdk.catalog_workspace.create_or_update(workspace=my_workspace_object)
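For an existing workspace, the id can also be read off the browser URL, following the rule that it is the string after the last `/`. The sketch below assumes that rule; the function name `workspace_id_from_url` and the example URL path are hypothetical, and the real URL in your instance may look different.

```python
def workspace_id_from_url(url):
    """Return the text after the last '/' of a workspace URL."""
    # Drop any trailing slash first so the last path segment is never empty.
    return url.rstrip("/").rsplit("/", 1)[-1]


# Hypothetical URL shape, used for illustration only.
example_url = "https://gooddata.mydomain.com/dashboards/#/workspace/8cadde05fc6a412aa41f94bd90a9bc77"
print(workspace_id_from_url(example_url))
```

The resulting string, not the display name, is what goes into `import_gooddata(workspace := '...')`.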
m
OK, that was the problem. After the successful import I see the necessity for a "pretty" workspace name. 😄
Thanks a lot