# gooddata-cn
We've observed that the bitnami etcd chart is deployed as part of our GD.CN 3.1.0 deployment and creates a statefulset with 3 pods. It seems that one of the 3 pods is consistently down in each of our environments with the following error:
```
Updating member in existing cluster
2024-01-19T15:36:31.311396926Z {"level":"warn","ts":"2024-01-19T15:36:31.311232Z","logger":"etcd-client","caller":"v3@v3.5.10/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000322c40/gooddata-cn-etcd-0.gooddata-cn-etcd-headless.gooddata-cn.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: member not found"}
2024-01-19T15:36:31.311425474Z Error: etcdserver: member not found
```
The pod enters the CrashLoopBackOff state and repeatedly restarts. Any ideas on what we can do to resolve this? Is the third-party etcd deployment even necessary and, if not, can we safely remove it? Thanks for any suggestions.
This is our internal procedure for this case. Remove the broken ETCD pod from the list of members in the ETCD cluster (the removal has to be done from a healthy pod, e.g. `etcd-0`):
```shell
BROKEN_POD_MEMBER_ID=$(kubectl exec -it -n quiver etcd-0 -- etcdctl member list | grep etcd-1 | cut -d, -f1)
kubectl exec -it -n quiver etcd-0 -- etcdctl member remove "$BROKEN_POD_MEMBER_ID"
```
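For clarity, the `grep | cut` pipeline above simply takes the first comma-separated field (the member ID) of the line matching the broken pod. A minimal local sketch on sample `etcdctl member list` output — the member IDs, pod names, and URLs below are illustrative, not from a real cluster:

```shell
# Abbreviated sample of `etcdctl member list` output (illustrative values only):
MEMBER_LIST='568dabbac362697e, started, etcd-2, http://etcd-2:2380, http://etcd-2:2379, false
c4affa16810d4ac9, started, etcd-1, http://etcd-1:2380, http://etcd-1:2379, false'

# Same extraction as in the procedure: match the broken pod's line,
# then take the first comma-separated field (the member ID).
BROKEN_POD_MEMBER_ID=$(printf '%s\n' "$MEMBER_LIST" | grep etcd-1 | cut -d, -f1)
echo "$BROKEN_POD_MEMBER_ID"   # prints c4affa16810d4ac9
```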
Delete the affected ETCD pod's persistent volume claim (asynchronously) so the new pod initializes a fresh configuration:
```shell
kubectl delete -n quiver pvc data-etcd-1 --wait=false
```
Trigger an ETCD statefulset rolling update:
```shell
kubectl describe sts -n quiver etcd | grep CLUSTER_STATE
# !!! WARNING: Result STATE must be "existing". If not, please ping us in this thread !!!
# If state == "existing", then:
kubectl rollout restart -n quiver sts etcd
```
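The "state must be existing" guard above can be made explicit. A minimal sketch — the `safe_to_restart` helper and the hard-coded `STATE` value are illustrative; in a live cluster the state would be parsed from the `kubectl describe sts` output above:

```shell
# Illustrative helper: only proceed when the cluster state is "existing".
safe_to_restart() {
  [ "$1" = "existing" ]
}

# In a real cluster this value would come from the CLUSTER_STATE line of
# `kubectl describe sts`; hard-coded here for illustration.
STATE="existing"

if safe_to_restart "$STATE"; then
  echo "state is existing: safe to run 'kubectl rollout restart'"
else
  echo "state is NOT existing: do not restart, ask for help first"
fi
```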
Now all the ETCD pods should be healthy again. If not, please ping us in this thread!
Don't forget to update the kubectl command-line parameters to match your environment, particularly the namespace (`-n`); the etcd pod names may also differ.
Unfortunately, we're not able to run these commands as written, since our cluster is administered through Rancher and we don't have kubectl and etcdctl available in the same shell. Nevertheless, I tried to piece together what these commands are doing, and I see some issues. First, our failing pod doesn't appear to have a member ID, so we're not able to remove it by ID (the failing pod is gooddata-cn-etcd-1, which doesn't appear in the member list):
```
etcdctl member list
568dabbac362697e, started, gooddata-cn-etcd-2, http://gooddata-cn-etcd-2.gooddata-cn-etcd-headless.gooddata-cn.svc.cluster.local:2380, http://gooddata-cn-etcd-2.gooddata-cn-etcd-headless.gooddata-cn.svc.cluster.local:2379,http://gooddata-cn-etcd.gooddata-cn.svc.cluster.local:2379, false
c4affa16810d4ac9, started, gooddata-cn-etcd-0, http://gooddata-cn-etcd-0.gooddata-cn-etcd-headless.gooddata-cn.svc.cluster.local:2380, http://gooddata-cn-etcd-0.gooddata-cn-etcd-headless.gooddata-cn.svc.cluster.local:2379,http://gooddata-cn-etcd.gooddata-cn.svc.cluster.local:2379, false
```
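Since gooddata-cn-etcd-1 is missing from the member list entirely, the `member remove` step has nothing to act on. A small sketch of how that case can be detected with grep — the sample output is abbreviated and the values are illustrative:

```shell
# Abbreviated member list showing only the two healthy pods (illustrative):
MEMBER_LIST='568dabbac362697e, started, gooddata-cn-etcd-2, http://gooddata-cn-etcd-2:2380, http://gooddata-cn-etcd-2:2379, false
c4affa16810d4ac9, started, gooddata-cn-etcd-0, http://gooddata-cn-etcd-0:2380, http://gooddata-cn-etcd-0:2379, false'

# Check whether the failing pod still has a registered member entry.
if printf '%s\n' "$MEMBER_LIST" | grep -q gooddata-cn-etcd-1; then
  MSG="present"
else
  MSG="absent"
fi
echo "gooddata-cn-etcd-1 member entry: $MSG"   # prints "gooddata-cn-etcd-1 member entry: absent"
```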
Second, as mentioned in the procedure, the CLUSTER_STATE should be "existing", but it appears to be "new" (or non-existent: since the variable is ETCD_INITIAL_CLUSTER_STATE, we don't seem to have a CLUSTER_STATE field). Our biggest question is whether this etcd issue is merely a nuisance or whether it may be causing the timeout we're seeing in Analytical Designer (described in a different thread). That timeout is a real blocker for us, since we're currently unable to generate any insights for this organization.
Attaching the results of kubectl describe sts.
Attaching the results of kubectl describe pod on the failing pod; it seems to have an issue attaching the volume.