Pete Lorenz
10/30/2023, 3:55 PM
"level":"ERROR","logger":"com.gooddata.tiger.grpc.healthcheck.GrpcHealthCheck","thread":"boundedElastic-2","traceId":"9e2839743de46645","spanId":"9e2839743de46645","msg":"Error during GRPC Healthcheck call","action":"grpcHealthCheck","exc":"io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host gooddata-cn-result-cache-headless
And warnings that the host is not resolvable:
"level":"WARN","logger":"io.grpc.internal.ManagedChannelImpl","thread":"grpc-default-executor-4","msg":"[Channel<3>: (gooddata-cn-result-cache-headless:6567)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host gooddata-cn-result-cache-headless, cause=java.lang.RuntimeException: java.net.UnknownHostException: gooddata-cn-result-cache-headless: Name or service not known\n\tat io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:223)
There are similar errors regarding gooddata-cn-metadata-api-headless:
"level":"ERROR","logger":"com.gooddata.tiger.grpc.healthcheck.GrpcHealthCheck","thread":"boundedElastic-1","traceId":"756f885e0c24d41d","spanId":"756f885e0c24d41d","msg":"Error during GRPC Healthcheck call","action":"grpcHealthCheck","exc":"io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host gooddata-cn-metadata-api-headless
The afm-exec-api pods report that they cannot resolve gooddata-cn-calcique-headless. Attaching the pod logs for the relevant services. I've tried restarting the pods, but the issue remains. Any suggestions as to what might be the cause or how to resolve it?
Marley Bross
10/31/2023, 8:52 PM
Pete Lorenz
10/31/2023, 9:07 PM
Boris
11/01/2023, 9:06 AM
Robert Moucha
11/01/2023, 4:24 PM
Unable to resolve host gooddata-cn-result-cache-headless means that this service has no A records (no IP addresses). This happens when none of the pods belonging to the deployment gooddata-cn-result-cache are Ready.
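The same lookup can be reproduced outside the JVM. This is a minimal sketch, assuming it runs from a pod in the same namespace so that the short service name goes through the cluster DNS search path. The function name `resolve_addresses` and its injectable `resolver` parameter are illustrative only and not part of GoodData.CN; port 6567 is taken from the channel name in the warning quoted earlier.

```python
import socket


def resolve_addresses(host, port=6567, resolver=socket.getaddrinfo):
    """Return the sorted IP addresses a name resolves to, or [] if it does not resolve.

    An empty list corresponds to the UnknownHostException in the logs:
    the lookup for the headless service name returns no addresses.
    """
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_addresses("gooddata-cn-result-cache-headless")` from inside the cluster should return one address per Ready pod, and an empty list while no pod is Ready.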
You sent a log from one of the result-cache pods (gooddata-cn-result-cache-7f6bdbcf6f-h5s4w_result-cache.log), but it shows no errors. According to the timestamps, the pod was recently restarted, so I assume it crashed some time ago. If both pods are repeatedly crashing, that would explain why the headless service doesn't serve any addresses.
Please check the following:
1. How often the pods are restarting.
kubectl -n gooddata-cn get pod --selector app.kubernetes.io/component=resultCache
NAME                                       READY   STATUS    RESTARTS   AGE
gooddata-cn-result-cache-584b5d67b-287gw   1/1     Running   0          31h
gooddata-cn-result-cache-584b5d67b-w6t75   1/1     Running   0          31h
If the RESTARTS column is non-zero, it also shows when the last restart occurred.
2. Why the pod restarted. kubectl describe pod --selector app.kubernetes.io/component=resultCache will show details for both pods. If the pod crashed, you can see valuable details in the Events section, and also in the Containers/result-cache/Last State section. For example:
Containers:
  result-cache:
    Container ID:   containerd://1dd40a3efc53369eea5a45cb30f5f14538f827b4a263fb3e6fc606e332192e5b
    Image:          xxxx/sql-executor:VVVV
    Image ID:       xxxx/sql-executor@sha256:92901e08171c11f5a11f407a9d18a3b6a2c2703e52bb32bfcee6c9c26ea7ffcd
    Ports:          6567/TCP, 9040/TCP, 9041/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Wed, 01 Nov 2023 01:38:39 +0100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 01 Nov 2023 01:40:29 +0100
      Finished:     Wed, 01 Nov 2023 16:23:24 +0100
    Ready:          True
    Restart Count:  1
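Both checks can be scripted. This is a minimal sketch, assuming the default `kubectl get pod` column layout shown in this thread; `parse_pod_table` and `describe_exit_code` are hypothetical helper names, and the second sample row is made up for illustration. The second helper relies on the convention that an exit code above 128 means the process was terminated by signal number (code - 128), so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends.

```python
import re
import signal

# Columns: NAME  READY  STATUS  RESTARTS [(last restart)]  AGE
POD_ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+)/(?P<total>\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last_restart>[^)]*)\))?\s+(?P<age>\S+)$"
)


def parse_pod_table(text):
    """Parse `kubectl get pod` output into dicts; the header row does not match and is skipped."""
    pods = []
    for line in text.strip().splitlines():
        match = POD_ROW.match(line.strip())
        if match is None:  # header or unrecognized line
            continue
        row = match.groupdict()
        for key in ("ready", "total", "restarts"):
            row[key] = int(row[key])
        pods.append(row)
    return pods


def describe_exit_code(code):
    """Translate a container exit code into a short description."""
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:  # signal number unknown on this platform
            return f"killed by signal {code - 128}"
    return "exited normally" if code == 0 else f"exited with error {code}"


# First row is from the example above; the second row is illustrative only.
sample = """
NAME READY STATUS RESTARTS AGE
gooddata-cn-result-cache-584b5d67b-287gw 1/1 Running 0 31h
example-result-cache-pod 0/1 CrashLoopBackOff 13 (83s ago) 51m
"""
for pod in parse_pod_table(sample):
    if pod["restarts"] > 0:
        print(pod["name"], pod["restarts"], pod["last_restart"])
print(describe_exit_code(137))
```

Only pods with a non-zero restart count are printed, together with the time of the last restart when kubectl reports it.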
Robert Moucha
11/01/2023, 4:26 PM
OOMKilled - meaning the JVM process ran out of the memory reserved for it by Kubernetes. But there may be other reasons as well.
Robert Moucha
11/01/2023, 4:29 PM
To see the log of the previous (crashed) container, add -p to the kubectl logs command, like:
kubectl -n gooddata-cn logs -p gooddata-cn-result-cache-7f6bdbcf6f-h5s4w
Robert Moucha
11/01/2023, 4:31 PM
Pete Lorenz
11/01/2023, 4:44 PM
> kubectl -n gooddata-cn get pod --selector app.kubernetes.io/component=resultCache
NAME                                        READY   STATUS             RESTARTS         AGE
gooddata-cn-result-cache-7f6bdbcf6f-9hphl   0/1     CrashLoopBackOff   13 (83s ago)     51m
gooddata-cn-result-cache-7f6bdbcf6f-bh9mg   0/1     CrashLoopBackOff   72 (3m54s ago)   6h17m
The CrashLoopBackOff is new, however. We were not seeing this yesterday. The cause seems to be a startup probe failure. Attaching the result of "kubectl describe pod --selector app.kubernetes.io/component=resultCache -n gooddata-cn".
Pete Lorenz
11/01/2023, 4:49 PM
Pete Lorenz
11/01/2023, 4:50 PM
Pete Lorenz
11/01/2023, 4:55 PM
kubectl get endpoints -n gooddata-cn
NAME                                    ENDPOINTS                            AGE
gooddata-cn-afm-exec-api                <none>                               54d
gooddata-cn-analytical-designer         <none>                               54d
gooddata-cn-api-gateway                 <none>                               54d
gooddata-cn-api-gateway-headless        <none>                               54d
gooddata-cn-apidocs                     <none>                               54d
gooddata-cn-auth-service                <none>                               54d
gooddata-cn-auth-service-headless       <none>                               54d
gooddata-cn-calcique-headless           <none>                               54d
gooddata-cn-dashboards                  <none>                               54d
gooddata-cn-export-controller           <none>                               54d
gooddata-cn-home-ui                     <none>                               54d
gooddata-cn-ldm-modeler                 <none>                               54d
gooddata-cn-measure-editor              <none>                               54d
gooddata-cn-metadata-api                <none>                               54d
gooddata-cn-metadata-api-headless       <none>                               54d
gooddata-cn-result-cache-headless       <none>                               54d
gooddata-cn-scan-model                  <none>                               54d
gooddata-cn-sql-executor-headless       <none>                               54d
gooddata-cn-tabular-exporter-headless   <none>                               54d
gooddata-cn-web-components              <none>                               54d
ingress-nginx-controller                10.163.145.73:443,10.163.145.73:80   54d
ingress-nginx-controller-admission      10.163.145.73:8443                   54d
Perhaps this is a symptom of the other issues, but it's concerning to us that even the services that are running without errors do not have endpoints.
Robert Moucha
11/01/2023, 5:00 PM
Robert Moucha
11/01/2023, 5:02 PM
Pete Lorenz
11/01/2023, 5:02 PM
Robert Moucha
11/01/2023, 5:08 PM
app.kubernetes.io/component
app.kubernetes.io/instance
app.kubernetes.io/name
Pods must carry all three of these labels, with the same values the service expects. Otherwise, the service will not have any endpoints.
Pete Lorenz
11/01/2023, 5:10 PM
kubectl describe deployment -n gooddata-cn gooddata-cn-afm-exec-api
Name:               gooddata-cn-afm-exec-api
Namespace:          gooddata-cn
CreationTimestamp:  Thu, 07 Sep 2023 21:52:12 +0000
Labels:             app.kubernetes.io/component=afmExecApi
                    app.kubernetes.io/instance=gooddata-cn
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=gooddata-cn
                    app.kubernetes.io/owner=gooddata-cn
                    app.kubernetes.io/version=2.5.1
                    helm.sh/chart=gooddata-cn-2.5.1
                    objectset.rio.cattle.io/hash=6183899f57b94d51df0180e32d29a0b356e8a441
Robert Moucha
11/01/2023, 5:10 PM
# service selector
kubectl get service -n gooddata-cn gooddata-cn-result-cache-headless -o jsonpath='{.spec.selector}'
# deployment (pod template)
kubectl get deployments.apps -n gooddata-cn gooddata-cn-result-cache -o jsonpath='{.spec.template.metadata.labels}'
Robert Moucha
11/01/2023, 5:11 PM
Pete Lorenz
11/01/2023, 5:11 PM
kubectl get service -n gooddata-cn gooddata-cn-result-cache-headless -o jsonpath='{.spec.selector}'
{"app.kubernetes.io/component":"resultCache","app.kubernetes.io/instance":"gooddata-cn","app.kubernetes.io/name":"gooddata-cn","app.kubernetes.io/owner":"gooddata-cn"}
Robert Moucha
11/01/2023, 5:11 PM
Pete Lorenz
11/01/2023, 5:12 PM
kubectl get service -n gooddata-cn gooddata-cn-result-cache-headless -o jsonpath='{.spec.selector}'
{"app.kubernetes.io/component":"resultCache","app.kubernetes.io/instance":"gooddata-cn","app.kubernetes.io/name":"gooddata-cn","app.kubernetes.io/owner":"gooddata-cn"}
Robert Moucha
11/01/2023, 5:12 PM
kubectl get deployments.apps -n gooddata-cn gooddata-cn-result-cache -o jsonpath='{.spec.template.metadata.labels}'
Pete Lorenz
11/01/2023, 5:13 PM
{{/*
Common labels
*/}}
{{- define "gooddata-cn.labels" -}}
helm.sh/chart: {{ include "gooddata-cn.chart" . }}
app.kubernetes.io/owner: {{ include "gooddata-cn.name" . }}
{{ include "gooddata-cn.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end -}}
Pete Lorenz
11/01/2023, 5:13 PM
kubectl get deployments.apps -n gooddata-cn gooddata-cn-result-cache -o jsonpath='{.spec.template.metadata.labels}'
{"app.kubernetes.io/component":"resultCache","app.kubernetes.io/instance":"gooddata-cn","app.kubernetes.io/name":"gooddata-cn"}
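A Service selects a pod only if every key/value pair in its selector is present in the pod's labels, so the two outputs pasted in this thread can be compared mechanically instead of by eye. This is a minimal sketch that uses the service selector and the pod-template labels quoted above as input; `missing_selector_labels` is a hypothetical helper name, and the result only shows which labels differ, not why they differ.

```python
import json


def missing_selector_labels(selector, pod_labels):
    """Return the selector entries that the pod labels do not satisfy.

    An empty dict means the service selector matches the pod template.
    """
    return {
        key: value
        for key, value in selector.items()
        if pod_labels.get(key) != value
    }


# Output of: kubectl get service ... -o jsonpath='{.spec.selector}'
selector = json.loads(
    '{"app.kubernetes.io/component":"resultCache",'
    '"app.kubernetes.io/instance":"gooddata-cn",'
    '"app.kubernetes.io/name":"gooddata-cn",'
    '"app.kubernetes.io/owner":"gooddata-cn"}'
)
# Output of: kubectl get deployments.apps ... -o jsonpath='{.spec.template.metadata.labels}'
pod_labels = json.loads(
    '{"app.kubernetes.io/component":"resultCache",'
    '"app.kubernetes.io/instance":"gooddata-cn",'
    '"app.kubernetes.io/name":"gooddata-cn"}'
)

print(missing_selector_labels(selector, pod_labels))
# prints {'app.kubernetes.io/owner': 'gooddata-cn'}
```

On the pasted outputs, the selector requires app.kubernetes.io/owner while the pod template does not carry it.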
Robert Moucha
11/01/2023, 5:14 PM
Robert Moucha
11/01/2023, 5:18 PM
Pete Lorenz
11/01/2023, 5:20 PM
Robert Moucha
11/01/2023, 5:21 PM
Pete Lorenz
11/01/2023, 5:22 PM
Robert Moucha
11/01/2023, 5:24 PM
Pete Lorenz
11/01/2023, 5:24 PM
Robert Moucha
11/01/2023, 5:25 PM
Pete Lorenz
11/01/2023, 5:25 PM
Pete Lorenz
11/02/2023, 2:28 PM