# gooddata-cn
d
Hi. I am installing GDCN on EKS with an external ElastiCache cluster (Redis interface) and Aurora RDS (Postgres family), following the instructions at https://www.gooddata.com/docs/cloud-native/3.7/deploy-and-install/cloud-native/environment/aws/. But I cannot get the following pods to run correctly:

$ kubectl get pod -n gooddata-cn | grep 0/
gooddata-cn-afm-exec-api-7d46b6f777-tz7bq   0/1   Running            0                115m
gooddata-cn-afm-exec-api-7d46b6f777-vt99l   0/1   Running            0                115m
gooddata-cn-cache-gc-28648500-zlkwv         0/1   Completed          0                91m
gooddata-cn-cache-gc-28648560-2rkqp         0/1   Pending            0                31m
gooddata-cn-calcique-5f566c5c7c-tkhqv       0/1   Running            0                115m
gooddata-cn-calcique-5f566c5c7c-xkzp8       0/1   Running            0                115m
gooddata-cn-dex-69d7b4d6cb-6wckz            0/1   CrashLoopBackOff   23 (4m57s ago)   101m
gooddata-cn-dex-69d7b4d6cb-9t7lk            0/1   CrashLoopBackOff   23 (4m19s ago)   101m
gooddata-cn-metadata-api-56766b5744-qbngc   0/1   CrashLoopBackOff   24 (4m8s ago)    115m
gooddata-cn-metadata-api-56766b5744-xmffv   0/1   CrashLoopBackOff   24 (4m ago)      115m
gooddata-cn-pulsar-cleanup-4xlf4            0/1   Error              0                97m
gooddata-cn-pulsar-cleanup-s9wz8            0/1   Error              0                106m

Let me first ask four questions to cover what I think are the important issues.

1. For "gooddata-cn/gooddata-cn-metadata-api-56766b5744-qbngc:check-postgres-db" I see the log below. I am not sure whether the connection is good or not: it says "accepting connections", but then "Stream closed EOF". (The same applies to gooddata-cn/gooddata-cn-dex-69d7b4d6cb-9t7lk:check-postgres-db.)

test-md-aurora-mdinstance1-lyu8ap647zmh.c23ofcaev62v.us-east-1.rds.amazonaws.com:5432 - accepting connections
Stream closed EOF for gooddata-cn/gooddata-cn-metadata-api-56766b5744-qbngc (check-postgres-db)

2. For "gooddata-cn/gooddata-cn-metadata-api-56766b5744-qbngc:metadata-api", the last few lines of the log are listed below. Is that caused by the issue above, or is it a new problem?

org.springframework.boot.loader.Launcher.launch(Launcher.java:58)\n\tat org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)\nCaused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'aeadEncryptionService' defined in file [/app/BOOT-INF/classes/com/gooddata/tiger/metadata/service/AeadEncryptionService.class]: Bean instantiation via constructor failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.gooddata.tiger.metadata.service.AeadEncryptionService]: Constructor threw exception; nested exception is java.io.EOFException: End of input at line 1 column 1 path $\n\tat org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:310)\n\tat ...... \t... 27 more\nCaused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.gooddata.tiger.metadata.service.AeadEncryptionService]: Constructor threw exception; nested exception is java.io.EOFException: End of input at line 1 column 1 path $\n\tat org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:226)\n\tat org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:117)\n\tat org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:306)\n\t...
57 more\nCaused by: java.io.EOFException: End of input at line 1 column 1 path $\n\tat com.google.gson.stream.JsonReader.nextNonWhitespace(JsonReader.java:1457)\n\tat com.google.gson.stream.JsonReader.doPeek(JsonReader.java:558)\n\tat com.google.gson.stream.JsonReader.peek(JsonReader.java:433)\n\tat com.google.crypto.tink.internal.JsonParser$JsonElementTypeAdapter.read(JsonParser.java:188)\n\tat com.google.crypto.tink.internal.JsonParser.parse(JsonParser.java:262)\n\tat com.google.crypto.tink.JsonKeysetReader.read(JsonKeysetReader.java:168)\n\tat com.google.crypto.tink.CleartextKeysetHandle.read(CleartextKeysetHandle.java:62)\n\tat com.gooddata.tiger.metadata.service.AeadEncryptionService.<init>(EncryptionService.kt:41)\n\tat java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)\n\tat java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)\n\tat java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)\n\tat kotlin.reflect.jvm.internal.calls.CallerImpl$Constructor.call(CallerImpl.kt:41)\n\tat kotlin.reflect.jvm.internal.KCallableImpl.callDefaultMethod$kotlin_reflection(KCallableImpl.kt:207)\n\tat kotlin.reflect.jvm.internal.KCallableImpl.callBy(KCallableImpl.kt:112)\n\tat org.springframework.beans.BeanUtils$KotlinDelegate.instantiateClass(BeanUtils.java:903)\n\tat org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:198)\n\t... 59 more\n"}
Stream closed EOF for gooddata-cn/gooddata-cn-metadata-api-56766b5744-qbngc (metadata-api)

3. For "gooddata-cn/gooddata-cn-dex-69d7b4d6cb-9t7lk:dex", the log is below. Is this still a connection issue, or did it actually connect but fail to create a new table?

{"time":"2024-06-20T20:57:47.74770439Z","level":"INFO","msg":"Version info","dex_version":"v2.40.0","go":{"version":"go1.22.3","os":"linux","arch":"amd64"}}
{"time":"2024-06-20T20:57:47.747800734Z","level":"INFO","msg":"config issuer","issuer":"https://localhost/dex"}
failed to initialize storage: failed to perform migrations: creating migration table: pq: no pg_hba.conf entry for host "10.20.0.143", user "postgres", database "dex", no encryption
Stream closed EOF for gooddata-cn/gooddata-cn-dex-69d7b4d6cb-9t7lk (dex)

4. For "gooddata-cn/gooddata-cn-calcique-5f566c5c7c-xkzp8:calcique", I see the logs below repeating every 10 seconds. It seems it is "Unable to connect to [clustercfg.test-redis.uoet5d.use1.cache.amazonaws.com:6379]". But I created an Ubuntu pod in the same EKS cluster, installed redis-cli, and I can connect to the Redis cluster using "redis-cli -c -h clustercfg.test-redis.uoet5d.use1.cache.amazonaws.com -p 6379 --tls -a password" and run a couple of "set" and "get" commands. So I am not sure why calcique is having trouble connecting. How can I figure it out?
{"ts":"2024-06-20 210340.556","level":"INFO","logger":"org.zalando.logbook.Logbook","thread":"reactor-http-io_uring-2","traceId":"79dc36876d37b8dc","spanId":"79dc36876d37b8dc","msg":"HTTP response","accept":"*/*","action":"httpResponse","correlationId":"b85e03c3e2020d3a","durationMs":"2","method":"GET","remote":"/169.254.175.250:33974","state":"200","uri":"http://10.20.0.164:9012/actuator/health/liveness","user-agent":"kube-probe/1.29+"} {"ts":"2024-06-20 210342.269","level":"INFO","logger":"org.zalando.logbook.Logbook","thread":"reactor-http-io_uring-1","msg":"HTTP response","accept":"*/*","action":"httpResponse","correlationId":"9915dbc9f650c08d","durationMs":"9999","method":"GET","remote":"/169.254.175.250:33964","state":"200","uri":"http://10.20.0.164:9012/actuator/health/readiness","user-agent":"kube-probe/1.29+"} {"ts":"2024-06-20 210342.269","level":"WARN","logger":"org.springframework.boot.actuate.redis.RedisReactiveHealthIndicator","thread":"boundedElastic-20","traceId":"d760544a017e50f7","spanId":"d760544a017e50f7","msg":"Redis health check failed","exc":"org.springframework.data.redis.RedisConnectionFailureException: Unable to connect to Redis; nested exception is org.springframework.data.redis.connection.PoolException: Could not get a resource from the pool; nested exception is io.lettuce.core.RedisCommandInterruptedException: Command interrupted\n\tat ... 31 more\n"} {"ts":"2024-06-20 210342.274","level":"ERROR","logger":"com.gooddata.tiger.grpc.healthcheck.GrpcHealthCheck","thread":"boundedElastic-26","traceId":"c733ed64e8abdca0","spanId":"c733ed64e8abdca0","msg":"Error during GRPC Healthcheck call","action":"grpcHealthCheck","service":"tiger.MetadataStoreService","exc":"io.grpc.StatusException: UNAVAILABLE: Unable to resolve host gooddata-cn-metadata-api-headless\n\tat io.grpc.Status.asException(Status.java:552)\n\tat io.grpc.kotlin.ClientCalls$rpcImpl$1$1$1.onClose(ClientCalls.kt:296)\n\tat brave.grpc.TracingClientInterceptor$TracingClientCallListener.onClose(TracingClientInterceptor.java:202)\n\tat n"} {"ts":"2024-06-20 210342.312","level":"WARN","logger":"io.lettuce.core.cluster.topology.DefaultClusterTopologyRefresh","thread":"lettuce-io_uringEventLoop-20-2","msg":"Unable to connect to [clustercfg.test-redis.uoet5d.use1.cache.amazonaws.com:6379]: Connection initialization timed out. Command timed out after 20 second(s)","exc":"io.lettuce.core.RedisCommandTimeoutException: Connection initialization timed out. Command timed out after 20 second(s)\n\tat io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:71)\n\tat io.lettuce.core.protocol.RedisHandshakeHandler.lambda$channelRegistered$0(RedisHandshakeHandler.java:62)\n\t Thank you in advance.
Let me add some additional information about the external Redis and Postgres. I followed https://www.gooddata.com/docs/cloud-native/3.7/deploy-and-install/cloud-native/helm-chart-installation/ and placed the following in custom-values.yaml when installing GDCN. My three questions are embedded in the YAML snippet below, enclosed in << >>:

service:
  redis:
    hosts:
      - << redis configuration endpoint? >>
    port: 6379
    clusterMode: << true or false? I followed https://www.gooddata.com/docs/cloud-native/3.7/deploy-and-install/cloud-native/environment/aws/ and it created a Cluster Mode Enabled cluster, but the examples in https://www.gooddata.com/docs/cloud-native/3.7/deploy-and-install/cloud-native/helm-chart-installation/ set it to false. Which one is right? I set it to true, but maybe that is the cause of the problem? >>
    # If your Redis service has authentication enabled, uncomment and declare the password
    password: password
  postgres:
    host: << RDS database writer endpoint? Do we ever use the reader endpoint? >>
    port: 5432
    username: postgres # If you use Azure Database for PostgreSQL, the username must contain the hostname e.g. postgres@gooddata-cn-pg.
    password: password
deployRedisHA: false
deployPostgresHA: false

Thanks.
r
Hi, that's a lot of questions šŸ™‚ Let's sort it out.

1. Three deployments require connections to the database (dex, metadata-api, sql-executor). Each deployment has its own initContainer, responsible for creating the database (if it doesn't exist) and the roles (if they don't exist). The message with "accepting connections" simply indicates the database is accessible at the network level (which is good).

2. The error "Error creating bean with name 'aeadEncryptionService'" indicates you have metadata encryption enabled (that's the default), but you didn't set the "keyset" used for encryption. Please follow the documentation article explaining how to create and pass the encryption keyset to the deployment. If you don't need to encrypt credentials in the database, just set metadataApi.encryptor.enabled=false.
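For reference, in custom-values.yaml that single option nests like this (this is just the YAML form of the setting named above, nothing beyond it):

metadataApi:
  encryptor:
    enabled: false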
3. The error

pq: no pg_hba.conf entry for host "10.20.0.143", user "postgres", database "dex"

appears during dex container startup. Please check in your RDS whether the database dex exists. If it does, review your RDS configuration to verify that you can access this database remotely.

4. The calcique pod errors. Since you were able to connect to Redis with the -c argument of redis-cli, your ElastiCache runs in cluster mode. Cluster mode uses a slightly different protocol and needs to be explicitly turned on in the client (clusterMode). But you also have SSL turned on (your redis-cli command uses --tls). In that case, your service.redis section of the helm values should look like this:
service:
  redis:
    hosts:
      - clustercfg.test-redis.uoet5d.use1.cache.amazonaws.com
    port: 6379
    useSSL: true
    clusterMode: true
    password: password
For RDS, use only the writer endpoint. The applications are unable to use the RDS reader endpoint at all (sorry).

Also note that the gooddata-cn helm chart creates a single-pod deployment called "tools". There's no service running in it, but it has useful binaries preinstalled, including psql and redis-cli, so you can use it to diagnose connection issues.
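For example, you could exec into that pod and test both backends by hand. A rough sketch; the deployment name (here assumed to be gooddata-cn-tools, check with "kubectl get deploy -n gooddata-cn") and the shell are assumptions, and the endpoints are the ones from your messages:

$ kubectl exec -it -n gooddata-cn deploy/gooddata-cn-tools -- bash   # deployment name and shell assumed
# inside the pod: check Postgres connectivity via the writer endpoint ...
psql --host=test-md-aurora-mdinstance1-lyu8ap647zmh.c23ofcaev62v.us-east-1.rds.amazonaws.com --port=5432 --username=postgres --dbname=postgres
# ... and Redis in cluster mode with TLS
redis-cli -c --tls -h clustercfg.test-redis.uoet5d.use1.cache.amazonaws.com -p 6379 -a password ping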
d
Hi Robert, I do have a lot of questions, but you have many answers. Following your suggestions, I made some adjustments to values.yaml and reinstalled GDCN. Eventually most pods run OK; only dex does not work. As the following shows, the dex pods restarted 14 times in 52 minutes.

$ kubectl get pod -n gooddata-cn | grep 0/
gooddata-cn-cache-gc-28650300-tpkls   0/1   Completed          0                163m
gooddata-cn-cache-gc-28650360-mghvw   0/1   Completed          0                103m
gooddata-cn-cache-gc-28650420-qflvp   0/1   Completed          0                43m
gooddata-cn-dex-69d7b4d6cb-l9v56      0/1   CrashLoopBackOff   14 (4m13s ago)   52m
gooddata-cn-dex-69d7b4d6cb-tvsxn      0/1   CrashLoopBackOff   14 (4m32s ago)   52m

As you said, "gooddata-cn/gooddata-cn-dex-69d7b4d6cb-l9v56:check-postgres-db" still shows "accepting connections":

test-md-aurora-mdinstance1-lyu8ap647zmh.c23ofcaev62v.us-east-1.rds.amazonaws.com:5432 - accepting connections
Stream closed EOF for gooddata-cn/gooddata-cn-dex-69d7b4d6cb-l9v56 (check-postgres-db)

But in gooddata-cn/gooddata-cn-dex-69d7b4d6cb-l9v56:dex, the log says:

{"time":"2024-06-22T02:58:02.099748574Z","level":"INFO","msg":"Version info","dex_version":"v2.40.0","go":{"version":"go1.22.3","os":"linux","arch":"amd64"}}
{"time":"2024-06-22T02:58:02.099835574Z","level":"INFO","msg":"config issuer","issuer":"https://localhost/dex"}
failed to initialize storage: failed to perform migrations: creating migration table: pq: no pg_hba.conf entry for host "10.20.0.204", user "postgres", database "dex", no encryption
Stream closed EOF for gooddata-cn/gooddata-cn-dex-69d7b4d6cb-l9v56 (dex)

I double-checked the other pod; it actually points to a different IP:

{"time":"2024-06-22T04:09:43.034055067Z","level":"INFO","msg":"Version info","dex_version":"v2.40.0","go":{"version":"go1.22.3","os":"linux","arch":"amd64"}}
{"time":"2024-06-22T04:09:43.034132047Z","level":"INFO","msg":"config issuer","issuer":"https://localhost/dex"}
failed to initialize storage: failed to perform migrations: creating migration table: pq: no pg_hba.conf entry for host "10.20.0.138", user "postgres", database "dex", no encryption
Stream closed EOF for gooddata-cn/gooddata-cn-dex-69d7b4d6cb-tvsxn (dex)

The IP addresses 10.20.0.138 and 10.20.0.204 are actually the IPs of the nodes hosting the pods. Why do they refer to themselves instead of the external RDS Postgres? Yet they did create a dex database in RDS Postgres. I connected from my testing pod:

$ kubectl exec -it shared-app-ubuntu -n pv-cases -- /bin/bash

Connect to the writer endpoint:

root@shared-app-ubuntu:/# psql --host=test-md-aurora-mdinstance1-lyu8ap647zmh.c23ofcaev62v.us-east-1.rds.amazonaws.com --port=5432 --username=postgres --password --dbname=md
Password:
psql (16.3 (Ubuntu 16.3-0ubuntu0.24.04.1), server 14.4)
SSL connection (protocol: TLSv1.2, cipher: AES128-SHA256, compression: off)
Type "help" for help.
md=> \l
                                                      List of databases
   Name    |  Owner   | Encoding | Locale Provider |   Collate   |    Ctype    | ICU Locale | ICU Rules |   Access privileges
-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------
 dex       | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 execution | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 md        | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 postgres  | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 rdsadmin  | rdsadmin | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           | rdsadmin=CTc/rdsadmin
 template0 | rdsadmin | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           | =c/rdsadmin          +
           |          |          |                 |             |             |            |           | rdsadmin=CTc/rdsadmin
 template1 | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           | =c/postgres          +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres
(7 rows)

I can access dex, and there are no tables:

md=> \c dex
Password:
psql (16.3 (Ubuntu 16.3-0ubuntu0.24.04.1), server 14.4)
SSL connection (protocol: TLSv1.2, cipher: AES128-SHA256, compression: off)
You are now connected to database "dex" as user "postgres".
dex=> \d
Did not find any relations.

I decided to drop the dex DB:

dex=> \c md
Password:
psql (16.3 (Ubuntu 16.3-0ubuntu0.24.04.1), server 14.4)
SSL connection (protocol: TLSv1.2, cipher: AES128-SHA256, compression: off)
You are now connected to database "md" as user "postgres".
md=> drop database dex;
DROP DATABASE
md=> \l
                                                      List of databases
   Name    |  Owner   | Encoding | Locale Provider |   Collate   |    Ctype    | ICU Locale | ICU Rules |   Access privileges
-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------
 execution | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 md        | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 postgres  | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 rdsadmin  | rdsadmin | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           | rdsadmin=CTc/rdsadmin
 template0 | rdsadmin | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           | =c/rdsadmin          +
           |          |          |                 |             |             |            |           | rdsadmin=CTc/rdsadmin
 template1 | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           | =c/postgres          +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres
(6 rows)

I then deleted both dex pods so that new pods were created, and they did recreate the "dex" DB:

md=> \l
                                                      List of databases
   Name    |  Owner   | Encoding | Locale Provider |   Collate   |    Ctype    | ICU Locale | ICU Rules |   Access privileges
-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------
 dex       | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 execution | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 md        | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 postgres  | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           |
 rdsadmin  | rdsadmin | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           | rdsadmin=CTc/rdsadmin
 template0 | rdsadmin | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           | =c/rdsadmin          +
           |          |          |                 |             |             |            |           | rdsadmin=CTc/rdsadmin
 template1 | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |            |           | =c/postgres          +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres
(7 rows)

There are still no tables in the dex DB:

md=> \c dex
Password:
psql (16.3 (Ubuntu 16.3-0ubuntu0.24.04.1), server 14.4)
SSL connection (protocol: TLSv1.2, cipher: AES128-SHA256, compression: off)
You are now connected to database "dex" as user "postgres".
dex=> \d
Did not find any relations.
I also wanted to check the other DBs generated by GDCN:

dex=> \c execution
Password:
psql (16.3 (Ubuntu 16.3-0ubuntu0.24.04.1), server 14.4)
SSL connection (protocol: TLSv1.2, cipher: AES128-SHA256, compression: off)
You are now connected to database "execution" as user "postgres".
execution=> \d
                 List of relations
 Schema |         Name          | Type  |  Owner
--------+-----------------------+-------+----------
 public | databasechangelog     | table | postgres
 public | databasechangeloglock | table | postgres
 public | qt_cache_md           | table | postgres
(3 rows)

execution=> select count(*) from databasechangelog;
 count
-------
     4
(1 row)

execution=> quit
root@shared-app-ubuntu:/#

My test proved that the dex pods did create the dex DB in RDS Postgres. But why do they refer to themselves instead of the external RDS Postgres when creating the migration table?

failed to initialize storage: failed to perform migrations: creating migration table: pq: no pg_hba.conf entry for host "10.20.0.204", user "postgres", database "dex", no encryption

I assume the fix would be in values.yaml, but I believe you can provide quicker answers. Please provide instructions. As always, we appreciate your help. Dongfeng
r
Hi Dongfeng, you probably misunderstood what the error message means:
failed to initialize storage: failed to perform migrations: creating migration table: pq: no pg_hba.conf entry for host "10.20.0.204", user "postgres", database "dex", no encryption
The first part (failed to initialize storage: failed to perform migrations: creating migration table) says what Dex tried to do: initialize the db schema, create tables, and so on. "pq" is the name of the Golang driver used to connect to the database (PostgreSQL), and the last part (no pg_hba.conf entry for host "10.20.0.204", user "postgres", database "dex", no encryption) is the exact error returned by the PostgreSQL server (RDS/Aurora). The IP address is what the DB sees in the connection attempt, so it is actually the client IP address from the DB server's perspective. It's perfectly fine that you see a k8s worker node IP, and it's also fine that these IP addresses differ between the two dex pods, because each pod will likely run on a different node.

Your manual connection test using psql exposed the most probable cause of the problem:
• successful psql connection: SSL connection (protocol: TLSv1.2, cipher: AES128-SHA256, compression: off)
• dex error message: no pg_hba.conf entry for host "10.20.0.204", user "postgres", database "dex", *no encryption*

So the dex pq driver uses a plaintext (non-SSL) connection, which is probably disabled on your RDS. Fortunately, our helm chart allows setting the SSL mode for the connection to the dex db. Please adjust your custom values with the following variable:
dex:
  config:
    database:
      sslMode: require
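Putting that together with the connection settings already discussed in this thread, the relevant part of custom-values.yaml would look roughly like this. This is only a sketch that reuses the keys shown above; the hostnames and the password are the placeholders from this thread:

service:
  postgres:
    host: test-md-aurora-mdinstance1-lyu8ap647zmh.c23ofcaev62v.us-east-1.rds.amazonaws.com   # RDS writer endpoint
    port: 5432
    username: postgres
    password: password
  redis:
    hosts:
      - clustercfg.test-redis.uoet5d.use1.cache.amazonaws.com
    port: 6379
    useSSL: true
    clusterMode: true
    password: password
dex:
  config:
    database:
      sslMode: require
deployRedisHA: false
deployPostgresHA: false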
Reapply the gooddata-cn chart using the helm upgrade ... command and check the results. Please let me know if it helps.
d
Hi Robert, thank you very much for the response. I would never have thought that message was an error "returned by the PostgreSQL server (RDS/Aurora)". Anyway, with your suggestion, all pods are now running correctly and ready.

Then I moved on to create an organization, create a bootstrap token, and create a user. After that I mapped the user to the organization. Everything looks good, and I got proper responses. I was ready to log in. The address https://k8s-ingressn-ingressn-39bcf5c35b-86ef3f7f9ae5f493.elb.us-east-1.amazonaws.com/, which is the DNS name of the load balancer, redirected to dex, where I could enter the username and password. Once I clicked the "login" button, I first got a "Bad Gateway" displayed on the screen, with the following URL in the address bar:

https://k8s-ingressn-ingressn-39bcf5c35b-86ef3f7f9ae5f493.elb.us-east-1.amazonaws.com/login/oauth2/code/k8s-ingressn-ingressn-39bcf5c35b-86ef3f7f9ae5f493.elb.us-east-1.amazonaws.com?code=cxoakhxm6shb3t36e5o43apw3&state=JpzPebIxMzUHN-ejyXJR3lOmFxV8Q3lp0dvstkcZsgNG

The dex log says "login successful":

{"time":"2024-06-25T00:45:43.858192574Z","level":"INFO","msg":"login successful","connector_id":"local","username":"Boss LV","preferred_username":"","email":"boss@livevox.com","groups":null}

The gooddata-cn-auth-service log shows:

{"ts":"2024-06-25 00:45:43.955","level":"INFO","logger":"com.gooddata.oauth2.server.CustomDelegatingReactiveAuthenticationManager","thread":"DefaultDispatcher-worker-2","traceId":"24367a4d95654400","spanId":"24367a4d95654400","msg":"User attempts to authenticate","action":"login","orgId":"my-org","state":"started"}
{"ts":"2024-06-25 00:45:44.098","level":"INFO","logger":"com.gooddata.oauth2.server.JitProvisioningAuthenticationSuccessHandler","thread":"DefaultDispatcher-worker-4","traceId":"24367a4d95654400","spanId":"24367a4d95654400","msg":"JIT provisioning disabled, skipping","action":"JIT","orgId":"","state":"finished"}
{"ts":"2024-06-25 00:45:44.120","level":"INFO","logger":"com.gooddata.oauth2.server.LoggingRedirectServerAuthenticationSuccessHandler","thread":"DefaultDispatcher-worker-2","traceId":"24367a4d95654400","spanId":"24367a4d95654400","msg":"User Authenticated","action":"login","authenticationId":"CiRlNjYyYzljYS0yYWY4LTQxZDktOGNjMy00Mzk4ZTIzNmRmNzMSBWxvY2Fs","authenticationMethod":"OIDC","orgId":"my-org","state":"finished","userId":"boss.lv"}
{"ts":"2024-06-25 00:45:44.123","level":"INFO","logger":"org.zalando.logbook.Logbook","thread":"DefaultDispatcher-worker-2","traceId":"24367a4d95654400","spanId":"24367a4d95654400","msg":"HTTP response","accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7","action":"httpResponse","correlationId":"c3dc27fd83b20ab7","durationMs":"192","method":"GET","remote":"10.20.0.120:41274","state":"302","uri":"https://k8s-ingressn-ingressn-39bcf5c35b-86ef3f7f9ae5f493.elb.us-east-1.amazonaws.com/login/oauth2/code/k8s-ingressn-ingressn-39bcf5c35b-86ef3f7f9ae5f493.elb.us-east-1.amazonaws.com?code=cxoakhxm6shb3t36e5o43apw3&state=JpzPebIxMzUHN-ejyXJR3lOmFxV8Q3lp0dvstkcZsgNG","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 Edg/126.0.0.0"}

The URL in the log seems to match the one in the browser's address bar. So what is going on? I am not sure where to start debugging. Please help.

Also, you mentioned in your first response that "The gooddata-cn helm chart creates a single-pod deployment called 'tools'. You can use it for connection issues diagnostics." I was able to connect to the pod, but what exactly can I do with it in terms of GDCN? Is there any documentation, or are there samples, about its use? Thanks, Dongfeng
r
Hi, I'm not sure where the Bad Gateway comes from. Here's a list of things you should check (see the kubectl commands below):

1. All pods are running.
2. You should see two Ingresses in the gooddata-cn namespace: one for dex (I expect you set dex.ingress.authHost to the LB hostname), and a second ingress whose name starts with the managed- prefix (e.g. managed-my-org) and whose hostname is the same as the one you specified in the Organization resource (an LB hostname as well, probably).

If you still have issues, please follow the guidelines in my Gist https://gist.github.com/mouchar/c9e53714ef8cd95a08fdb5d234bf1898 to collect a support-bundle package and DM it to me. I will review the logs to see what's wrong.
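Both checks are quick to run from the command line with plain kubectl, nothing chart-specific:

$ kubectl get pods -n gooddata-cn      # check 1: everything should be Running/Completed and READY
$ kubectl get ingress -n gooddata-cn   # check 2: expect the dex ingress plus one managed-<org> ingress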
d
Hi Robert, I finally checked the log of the nginx ingress controller, and it contained the error message "upstream sent too big header while reading response header from upstream". So I increased the controller's buffer size, and everything is working perfectly now. I can log in, create workspaces, etc. Thank you very much for all the help.
r
Ah, I'm glad to hear that you managed to solve it. We recommend setting the buffer size to at least 16k in our docs: https://www.gooddata.com/docs/cloud-native/3.11/deploy-and-install/cloud-native/helm-chart-installation/#install-nginx-ingress-controller
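For anyone landing here later: if the controller is installed with the community ingress-nginx Helm chart, that buffer can be set through the controller ConfigMap values, roughly like this (a sketch assuming the ingress-nginx chart's controller.config mechanism; these are ingress-nginx keys, not gooddata-cn values):

# values for the ingress-nginx Helm chart (not part of gooddata-cn's custom-values.yaml)
controller:
  config:
    proxy-buffer-size: "16k"   # avoids "upstream sent too big header" on the OIDC callback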