Tomáš Kačur
04/29/2024, 10:01 AM
gooddata/gooddata-cn-ce:3.x orchestrated in a k8s cluster, exposing the gdcn API service (this way we are trying to simplify the deployment and reduce resource consumption). It works if we start a fresh k8s pod with a fresh persistent volume (PVC provisioned as a GKE GCE persistent disk). However, it starts to fail after we restart the deployment: the pod never reaches the "running" state and keeps on restarting. I suspect the issue is in the state that is persisted to the disk, but I couldn't find where the problem could be. I can see some errors during startup in bookkeeper/pulsar:
2024-04-29T09:58:05,754+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.client.ReadLastConfirmedOp - While readLastConfirmed ledger: 31 did not hear success responses from all quorums, QuorumCoverage(e:1,w:1,a:1) = [-8]
2024-04-29T09:58:05,754+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] DEBUG org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [pulsar/standalone/localhost:8080/persistent/healthcheck] Opened ledger 31: Error while recovering ledger
2024-04-29T09:58:05,754+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [pulsar/standalone/localhost:8080/persistent/healthcheck] Failed to open ledger 31: Error while recovering ledger
2024-04-29T09:58:05,755+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [pulsar/standalone/localhost:8080/persistent/healthcheck] Failed to initialize managed ledger: Error while recovering ledger error code: -10
2024-04-29T09:58:05,755+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [pulsar/standalone/localhost:8080/persistent/healthcheck] Closing managed ledger
2024-04-29T09:58:05,755+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] WARN org.apache.pulsar.broker.service.BrokerService - Failed to create topic persistent://pulsar/standalone/localhost:8080/healthcheck
This scenario works on an older version such as 2.5, but after moving to version 3.x (e.g. 3.7.0) it doesn't work. I'm attaching the k8s deployment snippet and logs of the pod (with PULSAR_LOGLEVEL=DEBUG).
Can you please suggest how to debug it and what we could do to fix it? Maybe adjust the initialization script? We are OK with a temporary service outage (a few minutes).
Tomáš Kačur
04/29/2024, 10:03 AM
Tomáš Kačur
04/29/2024, 10:04 AM
PULSAR_LOGLEVEL=DEBUG
Tomáš Kačur
04/29/2024, 10:29 AM
successful first deploy with PULSAR_LOGLEVEL=WARN
Tomáš Kačur
04/29/2024, 10:31 AM
failing restarted deployment - keeps on restarting and never reaches running state
Radek Novacek
04/29/2024, 2:19 PM
Tomáš Kačur
04/29/2024, 3:05 PM
securityContext.fsGroupChangePolicy to Always with fsGroup for pulsar pods (zookeeper and bookkeeper), so I did the same for "my docker k8s deployment", but it failed since there are more users that have different guids (e.g. for postgresql and redis), so it even failed on permissions. It was just a blind/naive shot to fix it, but I thought I would let you know.
Robert Moucha
04/30/2024, 7:13 AM
fsGroupChangePolicy needs to be set to Always only if you upgrade an existing pulsar chart from the 2.x to the 3.x image version (the new images run the app as a non-root user, so the persistent data had to change ownership).
For running the CE image as a k8s Pod, this setting should not be used: there's one volume that contains data belonging to multiple users (root, postgres, ...), so changing group ownership is not desired.
Robert Moucha
04/30/2024, 7:20 AM
Tomáš Kačur
04/30/2024, 7:39 AM
Tomáš Kačur
04/30/2024, 7:46 AM
fsGroupChangePolicy - I just tried it and thought it would help here; however, I see it's not the way here, so I don't specify it anymore.
Tomáš Kačur
04/30/2024, 7:50 AM
Robert Moucha
05/02/2024, 6:09 AM
Robert Moucha
05/02/2024, 12:44 PM
- name: PULSAR_STANDALONE_USE_ZOOKEEPER
  value: '1'
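For orientation, a minimal sketch of where an env entry like this could sit in a Deployment that runs the CE image with a persistent volume. The Deployment name, labels, PVC name and the /data mount path are assumptions made for illustration and are not taken from the attached deployment snippet; the image tag is the 3.7.0 example mentioned earlier in the thread. Other settings the container may need are left out.

```yaml
# Sketch only: names, labels, PVC name and mount path are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gooddata-cn-ce                # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gooddata-cn-ce
  template:
    metadata:
      labels:
        app: gooddata-cn-ce
    spec:
      containers:
        - name: gooddata-cn-ce
          image: gooddata/gooddata-cn-ce:3.7.0   # 3.x tag mentioned in the thread
          env:
            # The entry from the message above, placed in the container env list
            - name: PULSAR_STANDALONE_USE_ZOOKEEPER
              value: '1'
          volumeMounts:
            - name: data
              mountPath: /data                   # assumed data directory
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: gooddata-cn-ce-data       # hypothetical PVC name
```

The value is quoted because Kubernetes expects env values to be strings; an unquoted 1 would be parsed as a number and rejected.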
Robert Moucha
05/02/2024, 12:58 PM
Tomáš Kačur
05/06/2024, 11:32 AM
Robert Moucha
05/06/2024, 11:40 AM
Robert Moucha
05/06/2024, 11:42 AM