# gooddata-cn
n
Hi, we are facing one more issue with GoodData version 3.24.1. After upgrading the Kubernetes node, 2 out of 3 etcd pods keep restarting. The log is attached below. Please let us know how to address this issue.
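(For anyone reproducing the diagnosis, a minimal sketch of how the pod status and the attached log can be collected; the namespace and pod name are placeholders, and the label selector is an assumption based on the commands used later in this thread.)

```sh
# List the etcd pods and their restart counts
kubectl -n YOUR-NAMESPACE get pods -l app.kubernetes.io/component=etcd

# Logs of the previously crashed container of one restarting pod
kubectl -n YOUR-NAMESPACE logs gooddata-cn-etcd-1 --previous
```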
m
Hi Noushad, hopefully my colleague @Robert Moucha has some insight and can help with this 🙂
👍 1
r
This may occasionally happen; older etcd chart versions are susceptible to this issue. Etcd is designed to keep data as long as a majority of pods (2 of 3, in your case) are running. It cannot survive the loss of 2 or more pods. If that has already happened, please do the following:
1. Make sure the env variable ETCD_INITIAL_CLUSTER_STATE is set to `new` in the gooddata-cn-etcd StatefulSet. If it is set to `existing`, patch the StatefulSet and set it to `new` (see the sketch after this list).
2. Delete all 3 PVCs belonging to etcd: `kubectl -n YOUR-NAMESPACE delete pvc -l app.kubernetes.io/component=etcd --wait=false`. The argument `--wait=true` is important.
3. Delete all etcd pods at once: `kubectl -n YOUR-NAMESPACE delete pod -l app.kubernetes.io/component=etcd`. Pods will be deleted along with their PVCs and then recreated, and the resulting cluster will be operational.
4. Patch the etcd StatefulSet and set ETCD_INITIAL_CLUSTER_STATE back to `existing`.
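For reference, a minimal sketch of the whole sequence, assuming the StatefulSet is named gooddata-cn-etcd and the pods carry the app.kubernetes.io/component=etcd label used above (adjust the namespace, names, and env handling to your installation; the chart may manage this variable differently):

```sh
NS=YOUR-NAMESPACE

# Step 1: make sure the cluster bootstraps as "new"
kubectl -n "$NS" set env statefulset/gooddata-cn-etcd ETCD_INITIAL_CLUSTER_STATE=new

# Step 2: delete the etcd PVCs without waiting
# (they stay Terminating until the pods that use them are gone)
kubectl -n "$NS" delete pvc -l app.kubernetes.io/component=etcd --wait=false

# Step 3: delete all etcd pods at once; the StatefulSet recreates pods and PVCs
kubectl -n "$NS" delete pod -l app.kubernetes.io/component=etcd

# Wait until all replicas are back up
kubectl -n "$NS" rollout status statefulset/gooddata-cn-etcd

# Step 4: switch the bootstrap state back to "existing"
kubectl -n "$NS" set env statefulset/gooddata-cn-etcd ETCD_INITIAL_CLUSTER_STATE=existing
```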
🙏 1
The new gooddata-cn Helm chart (since version 3.37.0) uses a newer etcd chart with better cluster-membership handling, and it is more stable than the old one. Consider upgrading (sketch below).
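(A sketch only; the release name, namespace, repo alias, and values file are assumptions — check the GoodData.CN upgrade documentation for your installation before running anything.)

```sh
helm repo update
helm -n YOUR-NAMESPACE upgrade gooddata-cn gooddata/gooddata-cn \
  --version 3.37.0 -f your-values.yaml
```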
n
Thank you @Robert Moucha 🙂 we will try it and update you
In step 2, are we supposed to use `--wait=false` or `--wait=true`?
r
Sorry, `--wait=false` is correct. The PVCs cannot actually be removed while the pods still use them, so the delete command should return immediately; the PVCs stay in Terminating and get cleaned up once the pods are deleted in step 3.
👍 1