Martin Váňa
05/08/2023, 9:27 AM2023-05-05T09:09:37,307+0000 [pulsar-io-18-7] ERROR org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - [<persistent://public/default/compute.calcique.DLQ> / Calcique listener-dead-letter] Error reading entries at 263:2 : Cursor was already closed, Read Type Normal - Retrying to read in 58.336 seconds
Later on more errors appeared, repeatedly manifesting:
ts="2023-05-05 14:46:43.043" level=ERROR msg="gRPC server call" logger=com.gooddata.tiger.metadata.grpc.MetadataStoreGrpcService thread=DefaultDispatcher-worker-21 action=grpcServerCall orgId=<undefined> spanId=f53219cd692f24ae traceId=d395d0b2d670736c userId=<undefined> exc="org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30304ms.
ts="2023-05-05 14:49:14.024" level=ERROR msg="Internal Server Error" logger=com.gooddata.tiger.web.exception.ProblemExceptionHandling thread=grpc-default-executor-8399 orgId=default spanId=11fde5c0b85fb8cc traceId=11fde5c0b85fb8cc userId=33e2c1a3-f34b-4855-b187-3739f04a43db exc="errorType=com.gooddata.tiger.grpc.error.GrpcPropagatedServerException, message=org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30811ms.,<no detail>
2023-05-05T14:49:57,221+0000 [pulsar-io-18-3] ERROR org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - [<persistent://public/default/compute.calcique.DLQ> / Calcique listener-dead-letter] Error reading entries at 263:2 : Cursor was already closed, Read Type Normal - Retrying to read in 58.369 seconds
That is when we started to observe 500 responses. A few minutes later we started to receive:
2023-05-05T14:51:13,372+0000 [pulsar-web-48-8] ERROR org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] Topic <persistent://public/default/data-source.change> already exists
2023-05-05 14:51:25.374 UTC [1394] mduser@md ERROR: type "dashboard_permissions" already exists
After some time we tried to restart the container, but that made things even worse: the container cannot boot up and reports:
2023-05-08 09:02:44.820 UTC [13622] PANIC: could not locate a valid checkpoint record
2023-05-08 09:02:45.926 UTC [13645] postgres@postgres FATAL: the database system is starting up
Each time we attempt to start a container, it allocates about 20 GB of new space on the shared EFS. The container starts, but it is not capable of handling requests and gets killed by the load balancer. Our deployment is on AWS, using ECS (Fargate), EFS and RDS (PostgreSQL).
Jan Rehanek
05/09/2023, 8:35 AM
Tomáš Gajdoš
05/09/2023, 8:45 AMapi/v1/entities/workspaces/{workspaceId}/analyticalDashboards
. I don’t think that should be the cause of the issue, but that’s just what we were doing that day.
Jan Rehanek
05/09/2023, 8:54 AM
Jan Rehanek
05/09/2023, 8:59 AM
Martin Váňa
05/09/2023, 9:03 AMunregistered_redirect_url
, I can send you the link in DM if you would like. However it seems to be a different error.
Jan Rehanek
05/09/2023, 9:09 AM
Martin Váňa
05/09/2023, 9:10 AM
Martin Váňa
05/09/2023, 9:19 AM
Jan Rehanek
05/09/2023, 9:21 AM
Martin Váňa
05/09/2023, 9:23 AM
Jan Rehanek
05/09/2023, 2:05 PM
Martin Váňa
05/09/2023, 2:14 PM
Martin Váňa
05/09/2023, 2:16 PM
Martin Váňa
05/09/2023, 3:32 PM
Martin Váňa
05/09/2023, 6:44 PMunregistered_redirect_url
though.
Martin Váňa
05/09/2023, 6:45 PM
Martin Váňa
05/10/2023, 6:52 AMauth_request
table in /data/dex.db
. Redirect URL is one of the columns there, so I suppose that is the culprit. So far I have failed to pinpoint what went wrong on the clean bootstrap. I have logged:
Empty volume detected, creating data directory
in the current deployment
Jan Rehanek
05/10/2023, 7:12 AM
Martin Váňa
05/10/2023, 7:15 AM
Martin Váňa
05/10/2023, 7:31 AM
Martin Váňa
05/10/2023, 7:41 AM
Jan Rehanek
05/10/2023, 8:07 AM
Václav Slováček
05/10/2023, 8:29 AM
Martin Váňa
05/10/2023, 11:26 AM/data
volume is persistent in both cases. I cannot figure out what could go wrong on boot. I can export the logs if you are interested.
Robert Moucha
05/10/2023, 3:08 PM/data
directory. This data contains (among other things) the PostgreSQL data directory and the Dex database in SQLite 3 format (dex.db).
2. When running directly from Docker, it's possible to mount a volume into this directory using -v somevolume:/data
so the data will survive various container lifecycle events (including stop and delete). This is the only way to support gooddata-cn-ce upgrades: simply stop the old container and start a new one with the data volume mounted.
3. Downgrades are currently not possible: some components upgrade the database schema, and if you start an older image version with such an updated volume, it will not work (in most cases). You can copy the Docker volume data to a safe place before running an upgrade, so that in case of trouble you still have an older copy of the data to start with the previous image version.
4. As far as ECS is concerned: I don't know your exact configuration, but remember that the data volume contains databases. The errors you're describing suggest the volume was forcefully detached while the container was running.
5. I don't have in-depth experience with EFS and how it allocates space when used with ECS, but 20 GB right after container start looks really suspicious. An empty PG database takes less than 100 MB, and even with a big data model the size hardly exceeds 500 MB.
Tomáš Gajdoš
05/19/2023, 12:30 PM/api/v1/entities/admin/organizations/{organization}
with
"data": {
"id": "default",
"type": "organization",
"attributes": {
"name": "Default Organization",
"hostname": "<our_hostname>"
}
}
Jan Rehanek
05/19/2023, 12:35 PMunregistered_redirect_url
error popping up when you’re trying to access the hostname, or is there more?
Tomáš Gajdoš
05/19/2023, 12:35 PM
Jan Rehanek
05/19/2023, 12:36 PM
Jan Rehanek
05/19/2023, 1:40 PM/api/v1/entities/admin/organizations/default
with PUT to some new OIDC provider.
3. Update /api/v1/entities/admin/organizations/default
with PUT that only contains:
{
"data": {
"id": "default",
"type": "organization",
"attributes": {
"name": "Default Organization",
"hostname": "{{custom_hostname}}"
}
}
}
Is that all or am I missing some intermediate step?
Tomáš Gajdoš
05/19/2023, 1:42 PM
Jan Rehanek
05/19/2023, 2:18 PM{
"detail": "Organization hostname cannot be changed",
"status": 400,
"title": "Bad Request",
"traceId": "14b5c6b83d65da75"
}
Jeffrey Craig
05/19/2023, 10:54 PM
Robert Moucha
05/20/2023, 4:21 PMGDCN_PUBLIC_URL
environment variable on container start.
Unfortunately, version 2.3.0 contains a bug that generates an invalid redirect_uri for the dex oauth2 client. This error makes it practically impossible to use a public URL with the default port for the given protocol (80 for http, 443 for https) 😞
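To tell whether a given GDCN_PUBLIC_URL falls into the affected case, a check along these lines may help. It is only a sketch: the helper name `uses_default_port` is hypothetical and not part of GoodData.CN, and it assumes "default port" means either no explicit port or an explicit 80/443 matching the scheme.

```python
from urllib.parse import urlsplit

# Default ports per scheme, as named in the message above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def uses_default_port(public_url: str) -> bool:
    """Return True when public_url resolves to its scheme's default port.

    Hypothetical helper for illustration; not a GoodData.CN API.
    """
    parts = urlsplit(public_url)
    default = DEFAULT_PORTS.get(parts.scheme)
    if default is None:
        raise ValueError(f"unsupported or missing scheme in {public_url!r}")
    # parts.port is None when the URL carries no explicit port.
    return parts.port is None or parts.port == default


if __name__ == "__main__":
    for url in ("https://whatever.com", "http://localhost:3000"):
        print(url, uses_default_port(url))
```

A URL for which this returns True would, per the description above, hit the redirect_uri bug on 2.3.0.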
The error has already been fixed and will not be present in the next release. Alternatively, you may use a recent development build (Apr 12th is the first one containing the fix).
Jeffrey Craig
05/20/2023, 4:23 PM
Robert Moucha
05/20/2023, 4:25 PM<https://whatever.com>
) THEN you're affected
Jeffrey Craig
05/20/2023, 4:26 PM
Robert Moucha
05/20/2023, 4:26 PM<https://whatever.com>:*\n*/login/oauth2/code/whatever.com
Jeffrey Craig
05/20/2023, 4:26 PM
Robert Moucha
05/20/2023, 4:27 PM
Robert Moucha
05/20/2023, 10:25 PM
Jeffrey Craig
05/22/2023, 12:06 PM
Jeffrey Craig
05/22/2023, 12:11 PMsudo docker run -i -t -p 3000:3000 -p 5432:5432 -v gooddata-dev:/data \
-e GDCN_TOKEN_SECRET=XXXXXXXXXXXX \
-e GDCN_PUBLIC_URL=https://analytics.novelcx.com \
-e LICENSE_AND_PRIVACY_POLICY_ACCEPTED=YES \
gooddata/gooddata-cn-ce:dev_20230517.a881988f
Robert Moucha
05/22/2023, 12:13 PMCtrl-C
in the container's terminal. This action sends a SIGINT signal to the supervisor, which shuts down the whole application stack:
============= All services of GoodData.CN are ready =============
127.0.0.1 - - [22/May/2023:11:54:26 +0000] "GET / HTTP/1.1" 200 5639 "-" "curl/7.74.0"
Nginx: ready
s6-rc: info: service nginx successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
172.31.90.40 - - [22/May/2023:11:54:38 +0000] "GET / HTTP/1.1" 200 2587 "-" "ELB-HealthChecker/2.0"
^C
exiting...
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service nginx: stopping
^^^ Note the ^C
in the output.
Jeffrey Craig
05/22/2023, 12:14 PM
Jeffrey Craig
05/23/2023, 10:00 PM{
"detail": "Invalid combination of auth properties. Specify either none or user + password or token. Datasource: XXXX, user: <yes>, password: <no>, token: <no>",
"status": 400,
"title": "Bad Request",
"traceId": "7df800d37c2eea44"
}
Jeffrey Craig
05/23/2023, 10:59 PM{
"title": "Unauthorized",
"status": 401,
"detail": "401 UNAUTHORIZED \"Authorization failed for given issuer \"https://ncx-prod.auth0.com/authorize/\"\"",
"traceId": "4b70fd5992b55eca"
}
Robert Moucha
05/25/2023, 2:18 PM<https://ncx-prod.auth0.com/authorize/>
oauthIssuerLocation
needs to be set to:
<https://ncx-prod.auth0.com/>
in the OIDC configuration. Do not forget to add the trailing slash; Auth0 requires it.
Robert Moucha
05/25/2023, 2:25 PM<https://ncx-prod.auth0.com/>
and the authorization endpoint is retrieved automatically from the openid-configuration document. So do not append /authorize/
or anything else to the Issuer URL.
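As a rough illustration of why the bare issuer is enough: under OpenID Connect Discovery, the configuration document lives at a well-known path below the issuer, and the authorization endpoint is read from it. The sketch below uses only the Python standard library; the function names are illustrative and say nothing about how GoodData.CN implements the lookup internally.

```python
import json
from urllib.request import urlopen


def discovery_url(issuer: str) -> str:
    """Build the OpenID Connect discovery URL for a bare issuer URL."""
    # The well-known path is appended to the issuer without doubling the slash.
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def authorization_endpoint(issuer: str) -> str:
    """Fetch the discovery document and return its authorization endpoint.

    Performs a network request; shown for illustration only.
    """
    with urlopen(discovery_url(issuer), timeout=10) as response:
        return json.load(response)["authorization_endpoint"]


if __name__ == "__main__":
    # The issuer from this thread, without any /authorize/ suffix.
    print(discovery_url("https://ncx-prod.auth0.com/"))
```

If `/authorize/` were appended to the issuer, the discovery URL would be built under that path and would no longer point at the provider's openid-configuration document.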
See https://www.gooddata.com/developers/cloud-native/doc/2.3/manage-organization/set-up-authentication/#SetUpAuthenticationUsi[…]tIdentityProvider-Auth0 for Auth0-specific comments.
Robert Moucha
05/25/2023, 2:29 PMusername
and password
OR `token` (for databases like BigQuery) to every exported data source in your organization layout.
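Before re-importing a layout, it may be worth checking each data source against the rule from the 400 error above (either no credentials, or username plus password, or a token). This is a minimal sketch: the `dataSources` key and the `id` field are assumptions about the exported layout's shape, and both function names are hypothetical.

```python
from typing import List, Optional


def auth_problem(data_source: dict) -> Optional[str]:
    """Return a description of an invalid auth combination, or None if valid."""
    has_user = bool(data_source.get("username"))
    has_password = bool(data_source.get("password"))
    has_token = bool(data_source.get("token"))

    none_set = not (has_user or has_password or has_token)
    user_and_password = has_user and has_password and not has_token
    token_only = has_token and not (has_user or has_password)
    if none_set or user_and_password or token_only:
        return None
    # "id" is an assumed field name for the data source identifier.
    return (
        f"Datasource {data_source.get('id', '<unknown>')}: "
        f"username={has_user}, password={has_password}, token={has_token}"
    )


def find_problems(layout: dict) -> List[str]:
    """Collect problems for every data source in an exported layout."""
    problems = []
    # "dataSources" is an assumed key for the list of exported data sources.
    for data_source in layout.get("dataSources", []):
        problem = auth_problem(data_source)
        if problem is not None:
            problems.append(problem)
    return problems
```

Running `find_problems` on the parsed layout JSON lists the entries that still need a `password` or `token` filled in, since exports do not carry those secrets.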