# gooddata-cn
d
Hi, I'm trying to install Kubernetes and GD on a single machine with a fresh installation of Ubuntu Server. I'm using Multipass to run the workers, but if there is a better alternative for running on a single machine, or a way to run GD with no additional servers, please let me know. Is there any beginner-friendly guide that walks me through the whole installation process and the configuration needed for both the OS and Kubernetes? I tried following some guides, but I'm constantly running into problems the guides don't cover... I need something that starts with a fresh OS installation, goes step by step, and actually works in the end. Thanks.
m
Hi Daniel, I will refer you to @Milan Sladký here. He is our deployment specialist.
m
Hi Daniel, we do not provide a guide on how to set up the OS and Kubernetes, as it is a fairly complicated and complex process. However, if you want to get Kubernetes running easily on a single machine, you can go with https://k3d.io/. You just need Docker.
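For illustration, a minimal k3d setup along these lines might look as follows; the cluster name and agent count are just examples, not an official GoodData recommendation:
Copy code
# Assumes Docker and k3d are already installed (see https://k3d.io)
# Create a small cluster named "gooddata" with one server and three agents
k3d cluster create gooddata --agents 3

# Verify that kubectl talks to the new cluster and the nodes are Ready
kubectl get nodes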
d
Hi, I managed to get k3d running, but now when I install GD.CN I get an error without enough details to understand why it's failing.
Copy code
chylek@ubuntu:~$ helm install --version 1.4.0 --namespace gooddata-cn --wait --debug -f customized-values-gooddata-cn.yaml gooddata-cn gooddata/gooddata-cn
install.go:178: [debug] Original chart version: "1.4.0"
install.go:199: [debug] CHART PATH: /home/chylek/.cache/helm/repository/gooddata-cn-1.4.0.tgz

client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
install.go:165: [debug] Clearing discovery cache
wait.go:48: [debug] beginning wait for 2 resources with timeout of 1m0s
W1028 22:30:06.465847   19267 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
client.go:299: [debug] Starting delete for "gooddata-cn-create-namespace" Job
client.go:328: [debug] jobs.batch "gooddata-cn-create-namespace" not found
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job gooddata-cn-create-namespace with timeout of 5m0s
client.go:556: [debug] Add/Modify event for gooddata-cn-create-namespace: ADDED
client.go:595: [debug] gooddata-cn-create-namespace: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for gooddata-cn-create-namespace: MODIFIED
client.go:595: [debug] gooddata-cn-create-namespace: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition
helm.go:88: [debug] failed pre-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.2.1/command.go:902
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
	runtime/proc.go:225
runtime.goexit
	runtime/asm_amd64.s:1371
chylek@ubuntu:~$ kubectl get nodes
NAME                    STATUS   ROLES                  AGE   VERSION
k3d-gooddata-agent-0    Ready    <none>                 59m   v1.21.5+k3s2
k3d-gooddata-agent-2    Ready    <none>                 59m   v1.21.5+k3s2
k3d-gooddata-agent-1    Ready    <none>                 59m   v1.21.5+k3s2
k3d-gooddata-server-0   Ready    control-plane,master   59m   v1.21.5+k3s2
It's very likely I missed some part of the installation process, but even with --debug it doesn't say what it's waiting for, so I don't know what's missing.
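One way to see what the pre-install hook is actually stuck on is to inspect the hook Job and its pod while the install is still waiting; a generic kubectl sketch, using the job name from the log above:
Copy code
# List the hook job and its pods in the release namespace
kubectl -n gooddata-cn get jobs,pods

# The Events section usually explains why the job never completed
kubectl -n gooddata-cn describe job gooddata-cn-create-namespace

# Logs of the pod backing the job (pod name will differ)
kubectl -n gooddata-cn logs job/gooddata-cn-create-namespace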
a
@Milan Sladký Could you please consult with Daniel?
m
d
I installed Pulsar with these values, with the storageClass set according to:
Copy code
~$ kubectl get storageclass
NAME                   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  4d10h
I tried reinstalling it just in case:
Copy code
chylek@ubuntu:~$ helm upgrade --install --namespace pulsar --version 2.7.2     -f customized-values-pulsar.yaml --set initialize=true     pulsar apache/pulsar
W1102 08:22:35.920845   21249 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1102 08:22:35.922540   21249 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1102 08:22:35.928066   21249 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1102 08:22:35.929502   21249 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1102 08:22:35.931054   21249 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1102 08:22:35.933140   21249 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1102 08:22:35.934550   21249 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1102 08:22:35.935975   21249 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1102 08:22:35.938279   21249 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1102 08:22:35.961397   21249 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
W1102 08:22:35.962992   21249 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
W1102 08:22:35.964974   21249 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
Release "pulsar" has been upgraded. Happy Helming!
NAME: pulsar
LAST DEPLOYED: Tue Nov  2 08:22:35 2021
NAMESPACE: pulsar
STATUS: deployed
REVISION: 2
TEST SUITE: None
If I revert to a snapshot before installing GD, helm says pulsar was deployed:
Copy code
chylek@ubuntu:~$ helm list --all-namespaces
NAME         	NAMESPACE    	REVISION	UPDATED                                	STATUS  	CHART              	APP VERSION
ingress-nginx	ingress-nginx	1       	2021-10-28 22:24:57.604897327 +0000 UTC	deployed	ingress-nginx-4.0.6	1.0.4      
pulsar       	pulsar       	1       	2021-10-28 22:27:47.400520567 +0000 UTC	deployed	pulsar-2.7.2       	2.7.2      
traefik      	kube-system  	1       	2021-10-28 21:44:02.503732749 +0000 UTC	deployed	traefik-9.18.2     	2.4.8      
traefik-crd  	kube-system  	1       	2021-10-28 21:44:01.540122164 +0000 UTC	deployed	traefik-crd-9.18.2
m
OK, can you please send the output of kubectl get pods -A here?
d
Copy code
chylek@ubuntu:~$ kubectl get pods -A
NAMESPACE       NAME                                        READY   STATUS      RESTARTS   AGE
kube-system     helm-install-traefik-crd-6m762              0/1     Completed   0          4d14h
kube-system     helm-install-traefik-ltdsg                  0/1     Completed   1          4d14h
ingress-nginx   svclb-ingress-nginx-controller-99jmv        0/2     Pending     0          4d14h
ingress-nginx   svclb-ingress-nginx-controller-pphnj        0/2     Pending     0          4d14h
ingress-nginx   svclb-ingress-nginx-controller-vdr6q        0/2     Pending     0          4d14h
ingress-nginx   svclb-ingress-nginx-controller-r9lmt        0/2     Pending     0          4d14h
pulsar          pulsar-bookie-init-hkjm4                    0/1     Completed   0          4d14h
pulsar          pulsar-pulsar-init-sqmwt                    0/1     Completed   0          4d14h
kube-system     svclb-traefik-vqd9b                         2/2     Running     4          4d14h
kube-system     local-path-provisioner-5ff76fc89d-grqwm     1/1     Running     6          4d14h
kube-system     svclb-traefik-vddkf                         2/2     Running     4          4d14h
kube-system     svclb-traefik-f6bm7                         2/2     Running     4          4d14h
kube-system     metrics-server-86cbb8457f-bsgc2             1/1     Running     2          4d14h
kube-system     coredns-7448499f4d-6p875                    1/1     Running     2          4d14h
kube-system     svclb-traefik-9jl5x                         2/2     Running     4          4d14h
ingress-nginx   ingress-nginx-controller-5c8d66c76d-zdld5   1/1     Running     1          4d14h
pulsar          pulsar-zookeeper-2                          1/1     Running     1          3h37m
kube-system     traefik-97b44b794-rjwl6                     1/1     Running     2          4d14h
pulsar          pulsar-recovery-0                           1/1     Running     1          4d14h
ingress-nginx   ingress-nginx-controller-5c8d66c76d-4qckr   1/1     Running     1          4d14h
pulsar          pulsar-zookeeper-1                          1/1     Running     1          3h37m
pulsar          pulsar-zookeeper-0                          1/1     Running     1          4d14h
pulsar          pulsar-bookie-1                             1/1     Running     1          4d14h
pulsar          pulsar-bookie-0                             1/1     Running     1          4d14h
pulsar          pulsar-broker-0                             1/1     Running     1          4d14h
pulsar          pulsar-broker-1                             1/1     Running     1          4d14h
pulsar          pulsar-bookie-2                             1/1     Running     1          4d14h
This is again from the snapshot before installing GD, I will try installing it again and post if anything has changed.
Latest installation attempt:
Copy code
chylek@ubuntu:~$ helm install --version 1.4.0 --namespace gooddata-cn --wait \
>  --debug  -f customized-values-gooddata-cn.yaml gooddata-cn gooddata/gooddata-cn
install.go:178: [debug] Original chart version: "1.4.0"
install.go:199: [debug] CHART PATH: /home/chylek/.cache/helm/repository/gooddata-cn-1.4.0.tgz

client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
install.go:165: [debug] Clearing discovery cache
wait.go:48: [debug] beginning wait for 2 resources with timeout of 1m0s
W1102 12:34:33.481433   19089 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
client.go:299: [debug] Starting delete for "gooddata-cn-create-namespace" Job
client.go:328: [debug] jobs.batch "gooddata-cn-create-namespace" not found
client.go:128: [debug] creating 1 resource(s)
client.go:528: [debug] Watching for changes to Job gooddata-cn-create-namespace with timeout of 5m0s
client.go:556: [debug] Add/Modify event for gooddata-cn-create-namespace: ADDED
client.go:595: [debug] gooddata-cn-create-namespace: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for gooddata-cn-create-namespace: MODIFIED
client.go:595: [debug] gooddata-cn-create-namespace: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:556: [debug] Add/Modify event for gooddata-cn-create-namespace: MODIFIED
client.go:299: [debug] Starting delete for "gooddata-cn-create-namespace" Job
client.go:128: [debug] creating 62 resource(s)
W1102 12:34:43.123427   19089 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
wait.go:48: [debug] beginning wait for 62 resources with timeout of 5m0s
ready.go:277: [debug] Deployment is not ready: gooddata-cn/gooddata-cn-db-pgpool. 0 out of 2 expected pods are ready

(the last line repeated many times)

I1102 12:38:13.249341   19089 request.go:665] Waited for 10.126477116s due to client-side throttling, not priority and fairness, request: GET:https://0.0.0.0:33355/api/v1/namespaces/gooddata-cn/services/gooddata-cn-metadata-api
ready.go:277: [debug] Deployment is not ready: gooddata-cn/gooddata-cn-db-pgpool. 0 out of 2 expected pods are ready

(more repeats)

Error: INSTALLATION FAILED: timed out waiting for the condition
helm.go:88: [debug] timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.2.1/command.go:902
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
	runtime/proc.go:225
runtime.goexit
	runtime/asm_amd64.s:1371
Copy code
chylek@ubuntu:~$ kubectl get pods -A
NAMESPACE       NAME                                                   READY   STATUS                  RESTARTS   AGE
kube-system     helm-install-traefik-crd-6m762                         0/1     Completed               0          4d15h
kube-system     helm-install-traefik-ltdsg                             0/1     Completed               1          4d15h
ingress-nginx   svclb-ingress-nginx-controller-99jmv                   0/2     Pending                 0          4d14h
ingress-nginx   svclb-ingress-nginx-controller-pphnj                   0/2     Pending                 0          4d14h
ingress-nginx   svclb-ingress-nginx-controller-vdr6q                   0/2     Pending                 0          4d14h
ingress-nginx   svclb-ingress-nginx-controller-r9lmt                   0/2     Pending                 0          4d14h
pulsar          pulsar-bookie-init-hkjm4                               0/1     Completed               0          4d14h
pulsar          pulsar-pulsar-init-sqmwt                               0/1     Completed               0          4d14h
kube-system     svclb-traefik-vqd9b                                    2/2     Running                 4          4d15h
kube-system     local-path-provisioner-5ff76fc89d-grqwm                1/1     Running                 6          4d15h
kube-system     svclb-traefik-vddkf                                    2/2     Running                 4          4d15h
kube-system     svclb-traefik-f6bm7                                    2/2     Running                 4          4d15h
kube-system     metrics-server-86cbb8457f-bsgc2                        1/1     Running                 2          4d15h
kube-system     coredns-7448499f4d-6p875                               1/1     Running                 2          4d15h
kube-system     svclb-traefik-9jl5x                                    2/2     Running                 4          4d15h
ingress-nginx   ingress-nginx-controller-5c8d66c76d-zdld5              1/1     Running                 1          4d14h
pulsar          pulsar-zookeeper-2                                     1/1     Running                 1          3h54m
kube-system     traefik-97b44b794-rjwl6                                1/1     Running                 2          4d15h
pulsar          pulsar-recovery-0                                      1/1     Running                 1          4d14h
ingress-nginx   ingress-nginx-controller-5c8d66c76d-4qckr              1/1     Running                 1          4d14h
pulsar          pulsar-zookeeper-1                                     1/1     Running                 1          3h55m
pulsar          pulsar-zookeeper-0                                     1/1     Running                 1          4d14h
pulsar          pulsar-bookie-1                                        1/1     Running                 1          4d14h
pulsar          pulsar-bookie-0                                        1/1     Running                 1          4d14h
pulsar          pulsar-broker-0                                        1/1     Running                 1          4d14h
pulsar          pulsar-broker-1                                        1/1     Running                 1          4d14h
pulsar          pulsar-bookie-2                                        1/1     Running                 1          4d14h
gooddata-cn     gooddata-cn-result-cache-85558b84fb-rjws8              0/1     ContainerCreating       0          15m
gooddata-cn     gooddata-cn-metadata-api-7b94f9778d-2w66d              0/1     Init:ImagePullBackOff   0          15m
gooddata-cn     gooddata-cn-metadata-api-7b94f9778d-gn8w8              0/1     Init:ImagePullBackOff   0          15m
gooddata-cn     gooddata-cn-sql-executor-69fd9f559f-42g6n              0/1     Init:ImagePullBackOff   0          15m
gooddata-cn     gooddata-cn-scan-model-6dc5cfb9dc-b8tk7                0/1     ImagePullBackOff        0          15m
gooddata-cn     gooddata-cn-dex-5c985fbf98-hxt9c                       0/1     Init:ImagePullBackOff   0          15m
gooddata-cn     gooddata-cn-scan-model-6dc5cfb9dc-v4pph                0/1     ImagePullBackOff        0          15m
gooddata-cn     gooddata-cn-auth-service-6f487cfbff-qgngc              0/1     ImagePullBackOff        0          15m
gooddata-cn     gooddata-cn-sql-executor-69fd9f559f-nxqms              0/1     Init:ImagePullBackOff   0          15m
gooddata-cn     gooddata-cn-dex-5c985fbf98-kb6m6                       0/1     Init:ImagePullBackOff   0          15m
gooddata-cn     gooddata-cn-afm-exec-api-f4dc5dbfd-78xkh               0/1     ImagePullBackOff        0          15m
gooddata-cn     gooddata-cn-measure-editor-7c4f79d5d-47bpj             1/1     Running                 0          15m
gooddata-cn     gooddata-cn-measure-editor-7c4f79d5d-qk4tn             1/1     Running                 0          15m
gooddata-cn     gooddata-cn-auth-service-6f487cfbff-j4mfs              0/1     ImagePullBackOff        0          15m
gooddata-cn     gooddata-cn-redis-ha-server-0                          3/3     Running                 0          15m
gooddata-cn     gooddata-cn-afm-exec-api-f4dc5dbfd-8gmxr               0/1     ImagePullBackOff        0          15m
gooddata-cn     gooddata-cn-apidocs-67686f7694-st76j                   1/1     Running                 0          15m
gooddata-cn     gooddata-cn-ldm-modeler-5b778b998d-ckh8m               1/1     Running                 0          15m
gooddata-cn     gooddata-cn-dashboards-57d48bbc84-292b6                1/1     Running                 0          15m
gooddata-cn     gooddata-cn-dashboards-57d48bbc84-zscdl                1/1     Running                 0          15m
gooddata-cn     gooddata-cn-ldm-modeler-5b778b998d-h4dhv               1/1     Running                 0          15m
gooddata-cn     gooddata-cn-home-ui-759fcf7d4-5g868                    1/1     Running                 0          15m
gooddata-cn     gooddata-cn-home-ui-759fcf7d4-g2l92                    1/1     Running                 0          15m
gooddata-cn     gooddata-cn-aqe-5d9b68f586-kk2t7                       1/1     Running                 0          15m
gooddata-cn     gooddata-cn-analytical-designer-54646887df-s88n8       1/1     Running                 0          15m
gooddata-cn     gooddata-cn-apidocs-67686f7694-dkqlb                   1/1     Running                 0          15m
gooddata-cn     gooddata-cn-result-cache-85558b84fb-hpczg              0/1     ImagePullBackOff        0          15m
gooddata-cn     gooddata-cn-redis-ha-server-1                          3/3     Running                 0          9m39s
gooddata-cn     gooddata-cn-db-postgresql-1                            0/2     PodInitializing         0          15m
gooddata-cn     gooddata-cn-db-postgresql-0                            0/2     PodInitializing         0          15m
gooddata-cn     gooddata-cn-organization-controller-67c7d99d55-8znr9   1/1     Running                 0          15m
gooddata-cn     gooddata-cn-organization-controller-67c7d99d55-ktn8l   1/1     Running                 0          15m
gooddata-cn     gooddata-cn-analytical-designer-54646887df-5d9pl       1/1     Running                 0          15m
gooddata-cn     gooddata-cn-redis-ha-server-2                          3/3     Running                 0          5m49s
gooddata-cn     gooddata-cn-aqe-5d9b68f586-brtnc                       1/1     Running                 0          15m
gooddata-cn     gooddata-cn-tools-7dd9c565d9-7srq2                     1/1     Running                 0          15m
gooddata-cn     gooddata-cn-db-pgpool-84fc646558-jzshj                 0/1     Running                 4          15m
gooddata-cn     gooddata-cn-db-pgpool-84fc646558-dnsnj                 1/1     Running                 4          15m
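The ImagePullBackOff pods above are what keeps the release from becoming ready; the underlying pull error can be read from the pod events with plain kubectl (the pod name is just one example taken from the listing):
Copy code
# The Events section at the end shows the exact image pull failure
kubectl -n gooddata-cn describe pod gooddata-cn-metadata-api-7b94f9778d-2w66d

# Or list all recent events in the namespace, oldest first
kubectl -n gooddata-cn get events --sort-by=.metadata.creationTimestamp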
r
@Daniel Chýlek I prepared a simple script that should install gooddata.cn in k3d: https://github.com/mouchar/gooddata-cn-tools/tree/master/k3d
👏 3
Please let me know if it works for you. It's been developed and tested on Ubuntu, but it should work on other Linux distros as well. Not sure about macOS or Windows/WSL.
d
Thank you. It appears that in k3d 5.0 there is no --k3s-server-arg. Changing it to --k3s-arg gave me another error:
Copy code
FATA[0000] K3sExtraArg '--no-deploy=traefik' lacks a node filter, but there's more than one node
I ended up removing the argument altogether, which is probably a bad idea. (*) The script ended with
Copy code
Running: helm -n gooddata upgrade --install gooddata-cn --wait --timeout 7m --values /tmp/values-gooddata-cn.yaml --version 1.4.0 gooddata/gooddata-cn
Release "gooddata-cn" does not exist. Installing it now.
W1102 19:33:54.366190   93421 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W1102 19:34:04.959222   93421 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
Error: context deadline exceeded
It seems most services are running, but both :80 and :443 respond with a 404 error.
Copy code
chylek@ubuntu:~$ kubectl get pods -A
NAMESPACE      NAME                                                  READY   STATUS              RESTARTS   AGE
kube-system    helm-install-traefik-crd-wjwd4                        0/1     Completed           0          72m
kube-system    helm-install-cert-manager-9c8zw                       0/1     Completed           0          72m
kube-system    helm-install-traefik-2lln4                            0/1     Completed           0          72m
kube-system    svclb-ingress-nginx-controller-r8tq2                  0/2     Pending             0          68m
kube-system    svclb-ingress-nginx-controller-qb9hh                  0/2     Pending             0          68m
kube-system    svclb-ingress-nginx-controller-z2k4r                  0/2     Pending             0          68m
kube-system    helm-install-ingress-nginx-qggz7                      0/1     Completed           0          72m
pulsar         pulsar-bookie-init-jt59f                              0/1     Completed           0          33m
pulsar         pulsar-pulsar-init-scbsw                              0/1     Completed           0          33m
gooddata       gooddata-cn-apidocs-594d6dbf44-h2dp6                  1/1     Running             1          26m
kube-system    svclb-traefik-9tw5p                                   2/2     Running             2          69m
kube-system    ingress-nginx-controller-6d64b8fb47-wm2jq             1/1     Running             1          68m
kube-system    metrics-server-86cbb8457f-k7cmk                       1/1     Running             2          72m
pulsar         pulsar-zookeeper-2                                    1/1     Running             1          27m
gooddata       gooddata-cn-redis-ha-server-0                         3/3     Running             3          26m
gooddata       gooddata-cn-auth-service-8654cbff5d-t54xj             1/1     Running             1          26m
gooddata       gooddata-cn-db-postgresql-0                           2/2     Running             2          26m
gooddata       gooddata-cn-tools-67fbf64b9f-qltt5                    1/1     Running             1          25m
gooddata       gooddata-cn-redis-ha-server-2                         3/3     Running             3          22m
gooddata       gooddata-cn-redis-ha-server-1                         3/3     Running             3          23m
pulsar         pulsar-zookeeper-1                                    1/1     Running             1          28m
gooddata       gooddata-cn-afm-exec-api-7bf559b487-4x2rv             1/1     Running             1          26m
gooddata       gooddata-cn-dex-647658ccd-rzbs5                       1/1     Running             1          26m
gooddata       gooddata-cn-dex-647658ccd-25bmm                       1/1     Running             1          26m
pulsar         pulsar-zookeeper-0                                    1/1     Running             1          33m
gooddata       gooddata-cn-analytical-designer-794497c4f-4xrbj       1/1     Running             1          25m
pulsar         pulsar-recovery-0                                     1/1     Running             1          33m
cert-manager   cert-manager-cainjector-86bc6dc648-tvjjx              1/1     Running             3          69m
gooddata       gooddata-cn-dashboards-84c85df6db-zkfvq               1/1     Running             1          26m
kube-system    coredns-7448499f4d-c6ssc                              1/1     Running             1          72m
gooddata       gooddata-cn-ldm-modeler-59ddbdc696-d6hkf              1/1     Running             1          26m
cert-manager   cert-manager-bf6c77cbc-svcgn                          1/1     Running             1          69m
gooddata       gooddata-cn-apidocs-594d6dbf44-btvlj                  1/1     Running             1          26m
kube-system    svclb-traefik-2lmks                                   2/2     Running             2          69m
gooddata       gooddata-cn-aqe-84998b8596-bxlnk                      1/1     Running             1          26m
gooddata       gooddata-cn-measure-editor-595d576df4-gtccc           1/1     Running             1          26m
gooddata       gooddata-cn-home-ui-68d6766f84-85hgn                  1/1     Running             1          26m
kube-system    local-path-provisioner-5ff76fc89d-d6k7v               1/1     Running             3          72m
gooddata       gooddata-cn-scan-model-58878777bd-x2hxs               1/1     Running             1          26m
gooddata       gooddata-cn-organization-controller-fcd74d885-d7b87   1/1     Running             1          26m
cert-manager   cert-manager-webhook-78b6f5dfcc-xs66l                 1/1     Running             1          69m
kube-system    svclb-traefik-slxtx                                   2/2     Running             2          69m
kube-system    traefik-97b44b794-bmzgd                               1/1     Running             1          69m
gooddata       gooddata-cn-result-cache-fc48f7bf8-lkfcw              1/1     Running             1          26m
gooddata       gooddata-cn-db-postgresql-1                           2/2     Running             2          26m
gooddata       gooddata-cn-db-pgpool-c8c4b9878-d89pd                 1/1     Running             3          26m
gooddata       gooddata-cn-db-pgpool-c8c4b9878-79jc9                 1/1     Running             2          26m
gooddata       gooddata-cn-sql-executor-7d58bc465c-cwd5f             1/1     Running             1          26m
gooddata       gooddata-cn-metadata-api-6f7fdf5557-9zdj2             1/1     Running             1          26m
pulsar         pulsar-bookie-2                                       0/1     CrashLoopBackOff    6          33m
pulsar         pulsar-bookie-1                                       0/1     CrashLoopBackOff    6          33m
pulsar         pulsar-bookie-0                                       0/1     CrashLoopBackOff    6          33m
pulsar         pulsar-broker-1                                       1/1     Running             3          33m
pulsar         pulsar-broker-0                                       1/1     Running             3          33m
gooddata       gooddata-cn-cache-gc-27264720-v84m2                   0/1     ContainerCreating   0          6s
(*) EDIT: I'm looking into it now that I have more time, trying to figure out the correct new format. EDIT2: I cannot find good documentation on the new format; the official usage docs aren't even consistent... I tried a few things and they didn't work, so I'm downgrading k3d to 4.x and trying again.
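For reference, the k3d v5 syntax appears to require a node filter on --k3s-arg; keeping the script's original k3s flag, the equivalent is likely something close to the following (unverified here, based on the error message above):
Copy code
# k3d v4 style used by the script:
#   --k3s-server-arg '--no-deploy=traefik'
# likely k3d v5 equivalent, with an explicit node filter
# (newer k3s versions prefer --disable=traefik over --no-deploy=traefik):
k3d cluster create default --agents 3 --k3s-arg '--no-deploy=traefik@server:*'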
After downgrading and reinstalling from scratch, now even Pulsar is failing with the timeout message 😕 I guess I'll run the script again; maybe something is just taking too long. Looks like my computer ran out of disk space, too bad there's not a better error message for that :D
Unfortunately downgrading k3d to 4.x also didn't help, still getting a context deadline error:
Copy code
Running: helm -n pulsar upgrade --wait --timeout 7m --install pulsar --values /tmp/values-pulsar-k3d.yaml --set initialize=true --version 2.7.2 https://github.com/apache/pulsar-helm-chart/releases/download/pulsar-2.7.2/pulsar-2.7.2.tgz
Release "pulsar" does not exist. Installing it now.
W1103 07:57:27.103143  188603 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1103 07:57:27.108070  188603 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1103 07:57:27.111029  188603 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1103 07:57:27.128958  188603 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
W1103 07:57:27.197557  188603 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1103 07:57:27.201892  188603 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1103 07:57:27.203011  188603 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1103 07:57:27.247392  188603 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
NAME: pulsar
LAST DEPLOYED: Wed Nov  3 07:57:25 2021
NAMESPACE: pulsar
STATUS: deployed
REVISION: 1
TEST SUITE: None
Running: helm -n gooddata upgrade --install gooddata-cn --wait --timeout 7m --values /tmp/values-gooddata-cn.yaml --version 1.4.0 gooddata/gooddata-cn
Release "gooddata-cn" does not exist. Installing it now.
W1103 08:01:33.318907  204576 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W1103 08:01:46.093635  204576 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
Error: rate: Wait(n=1) would exceed context deadline
Copy code
chylek@ubuntu:~/gd/k3d$ kubectl get pods -A
NAMESPACE      NAME                                                  READY   STATUS             RESTARTS   AGE
kube-system    local-path-provisioner-5ff76fc89d-n4z5g               1/1     Running            0          28m
kube-system    metrics-server-86cbb8457f-prjqm                       1/1     Running            0          28m
kube-system    coredns-7448499f4d-zxdrw                              1/1     Running            0          28m
kube-system    helm-install-cert-manager-bgh48                       0/1     Completed          0          28m
cert-manager   cert-manager-webhook-78b6f5dfcc-crwv2                 1/1     Running            0          27m
cert-manager   cert-manager-bf6c77cbc-wg4fx                          1/1     Running            0          27m
kube-system    helm-install-ingress-nginx-7wbwh                      0/1     Completed          0          28m
kube-system    svclb-ingress-nginx-controller-57544                  2/2     Running            0          27m
kube-system    svclb-ingress-nginx-controller-mbjvc                  2/2     Running            0          27m
kube-system    svclb-ingress-nginx-controller-jbgkn                  2/2     Running            0          27m
kube-system    ingress-nginx-controller-6d64b8fb47-t9bs7             1/1     Running            0          27m
pulsar         pulsar-zookeeper-0                                    1/1     Running            0          27m
pulsar         pulsar-zookeeper-1                                    1/1     Running            0          25m
pulsar         pulsar-zookeeper-2                                    1/1     Running            0          24m
pulsar         pulsar-bookie-init-xfww2                              0/1     Completed          0          27m
pulsar         pulsar-recovery-0                                     1/1     Running            0          27m
pulsar         pulsar-pulsar-init-9p695                              0/1     Completed          0          27m
pulsar         pulsar-bookie-1                                       1/1     Running            0          27m
pulsar         pulsar-bookie-0                                       1/1     Running            0          27m
pulsar         pulsar-bookie-2                                       1/1     Running            0          27m
pulsar         pulsar-broker-0                                       1/1     Running            0          27m
pulsar         pulsar-broker-1                                       1/1     Running            0          27m
gooddata       gooddata-cn-measure-editor-595d576df4-rkwhf           1/1     Running            0          22m
gooddata       gooddata-cn-ldm-modeler-59ddbdc696-knmpn              1/1     Running            0          22m
gooddata       gooddata-cn-apidocs-594d6dbf44-4zjdj                  1/1     Running            0          22m
cert-manager   cert-manager-cainjector-86bc6dc648-pqx9g              1/1     Running            1          27m
gooddata       gooddata-cn-tools-67fbf64b9f-gxjzn                    1/1     Running            0          22m
gooddata       gooddata-cn-metadata-api-6f7fdf5557-6gr7n             0/1     Init:0/1           0          22m
gooddata       gooddata-cn-home-ui-68d6766f84-lct57                  1/1     Running            0          22m
gooddata       gooddata-cn-dashboards-84c85df6db-5fw96               1/1     Running            0          22m
gooddata       gooddata-cn-aqe-84998b8596-7hfc6                      1/1     Running            0          22m
gooddata       gooddata-cn-analytical-designer-794497c4f-p5sgn       1/1     Running            0          22m
gooddata       gooddata-cn-apidocs-594d6dbf44-24fh8                  1/1     Running            0          22m
gooddata       gooddata-cn-organization-controller-fcd74d885-rgwmg   1/1     Running            0          22m
gooddata       gooddata-cn-sql-executor-7d58bc465c-hpswt             0/1     Init:0/1           0          22m
gooddata       gooddata-cn-dex-647658ccd-rcch2                       0/1     Init:0/1           0          22m
gooddata       gooddata-cn-dex-647658ccd-vxprl                       0/1     Init:0/1           0          22m
gooddata       gooddata-cn-result-cache-fc48f7bf8-7d7h4              1/1     Running            0          22m
gooddata       gooddata-cn-auth-service-8654cbff5d-ct57l             1/1     Running            0          22m
gooddata       gooddata-cn-afm-exec-api-7bf559b487-r2lgx             1/1     Running            0          22m
gooddata       gooddata-cn-scan-model-58878777bd-ww2v2               1/1     Running            0          22m
gooddata       gooddata-cn-db-postgresql-0                           1/2     CrashLoopBackOff   8          22m
gooddata       gooddata-cn-db-pgpool-c8c4b9878-nb2h5                 0/1     CrashLoopBackOff   9          22m
gooddata       gooddata-cn-db-pgpool-c8c4b9878-gkkdl                 0/1     CrashLoopBackOff   9          22m
gooddata       gooddata-cn-redis-ha-server-0                         0/3     Init:Error         9          22m
gooddata       gooddata-cn-db-postgresql-1                           1/2     CrashLoopBackOff   9          22m
r
Can you please share information about the HW where you're running the script? I'm mainly interested in the number of CPU cores and the memory size.
The CrashLoopBackOff and Init:Error from pods with volumes suggest there is some issue with data persistence.
d
I'm running Ubuntu in VirtualBox, with 16 GB RAM, 6 cores, and 150 GB disk. I can give it up to 48 GB RAM if needed, but I'm maxed out on cores and the disk usage is reported at 42 GB so there should not be an issue there.
r
Well, this should be enough. Maybe there were some leftovers from the previous attempt with k3d 5.x. Please clean up the environment:
Copy code
docker network disconnect k3d-default k3d-registry
k3d cluster delete default
docker rm -f k3d-registry
docker volume rm registry-data
docker system prune -a -f --volumes
And then run the script again. It will preserve the CA cert, but the rest will be recreated. I can try it locally with VBox; I haven't tested it that way yet.
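If it helps, a quick sanity check that the environment really is clean before re-running the script (generic k3d and Docker commands):
Copy code
# No clusters should be listed
k3d cluster list

# No leftover k3d containers, volumes, or networks
docker ps -a --filter name=k3d
docker volume ls
docker network ls --filter name=k3d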
What Ubuntu version are you using? And how do you start it - directly in VBox, or using multipass?
d
I'll try, thanks. I start it directly from VBox, using Ubuntu 20.04.3 LTS. The host machine is an Intel Mac.
Various things keep timing out; once it was Pulsar, once it was cert-manager... is there any info on what exactly is timing out?
Copy code
Waiting for cert-manager to come up
deployment.apps/cert-manager condition met
timed out waiting for the condition on deployments/cert-manager-webhook
timed out waiting for the condition on deployments/cert-manager-cainjector
r
The script tests all the Deployments that are part of the cert-manager helm chart. It's really surprising that it times out at such an early stage.
This is what the script does:
Copy code
kubectl -n cert-manager wait deployment --for=condition=available --selector=app.kubernetes.io/instance=cert-manager
It's basically a script-friendly version of kubectl -n cert-manager get deployment. The default timeout is 30s, which should be enough for such a simple application as cert-manager.
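On a slow VM it may simply need more time; kubectl wait accepts a longer timeout, so a possible local tweak of that check would be (300s is an arbitrary example):
Copy code
kubectl -n cert-manager wait deployment \
  --for=condition=available \
  --selector=app.kubernetes.io/instance=cert-manager \
  --timeout=300s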
d
If I run kubectl get pods -A, it shows that cert-manager-webhook and cert-manager-cainjector are both running, so I don't know if they took longer than expected, or if whatever is checking the timeout errored out and couldn't get the correct state of the pods...
r
In the meantime, I'm trying to simulate your env (VBox with Ubuntu, 6 vCPU, 10 GB RAM), and the VM overhead is way too high, so everything runs much slower than I expected.
d
Perhaps it would run better if I installed it in an Ubuntu Docker container instead of a VM?
r
Do you have Docker installed directly on your MacBook? Apple's virtualization might be more efficient than VirtualBox.
I do not recommend running Ubuntu in Docker, where you would start Docker with Kubernetes that would then run yet another Docker layer. docker-in-docker-in-docker 😕
If you were running Docker Desktop directly on macOS, you could allocate sufficient resources to the Docker VM (16 GB RAM, 6 cores, the same as you gave VirtualBox) and run the script directly in the terminal.
d
I wanted to avoid running K8s directly on my machine, I'd like to be able to wipe it out completely and start over if something goes wrong without having to hunt down all the places it touched.
r
It will NOT run directly on your machine; k3d runs within Docker.
The only artifact (besides the four docker containers) that will remain on your host is the $HOME/.kube/config file. Nothing else.
d
ok
r
sorry, five containers:
Copy code
ubuntu@ubuntu2004:~/gooddata-cn-tools/k3d$ docker ps
CONTAINER ID   IMAGE                      COMMAND                  CREATED       STATUS       
b25f05bd623f   rancher/k3d-proxy:4.4.8    "/bin/sh -c nginx-pr…"   2 hours ago   Up 2 hours   
4d85b08a51bb   rancher/k3s:v1.21.3-k3s1   "/bin/k3s agent"         2 hours ago   Up 2 hours   
5cba8cd5bc3d   rancher/k3s:v1.21.3-k3s1   "/bin/k3s agent"         2 hours ago   Up 2 hours   
f41cb0f08b10   rancher/k3s:v1.21.3-k3s1   "/bin/k3s server --n…"   2 hours ago   Up 2 hours   
12af2571cacc   registry:2                 "/entrypoint.sh /etc…"   2 hours ago   Up 29 minutes
And you may always simply wipe the whole cluster with k3d cluster delete default, as described above. The registry remains intentionally, because it holds cached images for faster future deployments.
d
Port 5000 is not a good default for the registry on a Mac; that port is already used by the OS (same on Windows, apparently).
r
It should not be an issue to change.
The worse thing is that k3d 4.4.8 has a bug (actually in k3s) that breaks some non-root containers (like postgres or redis) because of a volume permission issue. The fix is fairly simple.
I will also remap port 5000 to somewhere else. Will 5050 work for you?
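For context, remapping only changes the host-side port of the cache registry container; with a plain Docker registry it would look roughly like this (the container name is illustrative, the script may create it differently):
Copy code
# Publish the registry on host port 5050 while it still listens on 5000 inside the container
docker run -d --name k3d-registry -p 5050:5000 registry:2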
d
I have disabled AirPlay, which was using the port, just to avoid any complications, but for the future 5050 sounds good. Please let me know about the volume permission issue, since I installed 4.4.8; unfortunately, I'm still running into this context error:
Copy code
Release "gooddata-cn" does not exist. Installing it now.
W1103 16:30:31.797746   26278 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W1103 16:30:40.318584   26278 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
Error: rate: Wait(n=1) would exceed context deadline
These are my Docker Desktop settings:
r
I updated the script in the repo. Now I'm running a test in my VBox.
Pulsar timed out, but it was deployed properly. Now it installs gooddata-cn; so far no errors, and the volumes were configured properly.
So --timeout 7m for the Pulsar deployment is not sufficient for VirtualBox.
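A simple mitigation on slow VirtualBox setups would be to raise that timeout in the script's helm call, e.g. reusing the exact command from the log above with a longer value (15m is just an example):
Copy code
helm -n pulsar upgrade --install pulsar \
  --wait --timeout 15m \
  --values /tmp/values-pulsar-k3d.yaml \
  --set initialize=true --version 2.7.2 \
  https://github.com/apache/pulsar-helm-chart/releases/download/pulsar-2.7.2/pulsar-2.7.2.tgz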
d
This is the state of pods after I ran your script on my Mac:
Copy code
chylek@Daniel-iMac k3d % kubectl get pods -A
NAMESPACE      NAME                                                  READY   STATUS                  RESTARTS   AGE
kube-system    coredns-7448499f4d-k4fxk                              1/1     Running                 0          60m
kube-system    local-path-provisioner-5ff76fc89d-rj987               1/1     Running                 0          60m
kube-system    metrics-server-86cbb8457f-6q767                       1/1     Running                 0          60m
kube-system    helm-install-cert-manager-5s5rv                       0/1     Completed               0          60m
kube-system    helm-install-ingress-nginx-6txq9                      0/1     Completed               0          60m
cert-manager   cert-manager-cainjector-86bc6dc648-6pxfb              1/1     Running                 0          59m
kube-system    svclb-ingress-nginx-controller-49z5k                  2/2     Running                 0          59m
kube-system    svclb-ingress-nginx-controller-l2h7q                  2/2     Running                 0          59m
kube-system    svclb-ingress-nginx-controller-gld4j                  2/2     Running                 0          59m
cert-manager   cert-manager-webhook-78b6f5dfcc-jzj9g                 1/1     Running                 0          59m
cert-manager   cert-manager-bf6c77cbc-2zlq7                          1/1     Running                 0          59m
kube-system    ingress-nginx-controller-6d64b8fb47-mxh7n             1/1     Running                 0          59m
pulsar         pulsar-zookeeper-0                                    1/1     Running                 0          51m
pulsar         pulsar-zookeeper-1                                    1/1     Running                 0          49m
pulsar         pulsar-bookie-init-ckz4s                              0/1     Completed               0          51m
pulsar         pulsar-recovery-0                                     1/1     Running                 0          51m
pulsar         pulsar-pulsar-init-6m7r6                              0/1     Completed               0          51m
pulsar         pulsar-zookeeper-2                                    1/1     Running                 0          48m
pulsar         pulsar-bookie-0                                       1/1     Running                 0          51m
pulsar         pulsar-bookie-1                                       1/1     Running                 0          51m
pulsar         pulsar-bookie-2                                       1/1     Running                 0          51m
pulsar         pulsar-broker-0                                       1/1     Running                 0          51m
pulsar         pulsar-broker-1                                       1/1     Running                 0          51m
gooddata       gooddata-cn-measure-editor-595d576df4-qmgzv           1/1     Running                 0          47m
gooddata       gooddata-cn-aqe-84998b8596-f494v                      1/1     Running                 0          47m
gooddata       gooddata-cn-home-ui-68d6766f84-v8fxk                  1/1     Running                 0          47m
gooddata       gooddata-cn-analytical-designer-794497c4f-jlpvm       1/1     Running                 0          47m
gooddata       gooddata-cn-ldm-modeler-59ddbdc696-6xtzr              1/1     Running                 0          47m
gooddata       gooddata-cn-apidocs-594d6dbf44-6g6w9                  1/1     Running                 0          47m
gooddata       gooddata-cn-apidocs-594d6dbf44-4cvgm                  1/1     Running                 0          47m
gooddata       gooddata-cn-dashboards-84c85df6db-l9vp8               1/1     Running                 0          47m
gooddata       gooddata-cn-organization-controller-fcd74d885-jqqkm   1/1     Running                 0          47m
gooddata       gooddata-cn-tools-67fbf64b9f-jspmd                    1/1     Running                 0          47m
gooddata       gooddata-cn-dex-647658ccd-s2tk5                       0/1     Init:0/1                0          47m
gooddata       gooddata-cn-sql-executor-7d58bc465c-zxn66             0/1     Init:0/1                0          47m
gooddata       gooddata-cn-scan-model-58878777bd-rpfqv               1/1     Running                 0          47m
gooddata       gooddata-cn-auth-service-8654cbff5d-sbb4x             1/1     Running                 0          47m
gooddata       gooddata-cn-metadata-api-6f7fdf5557-qhltg             0/1     Init:0/1                0          47m
gooddata       gooddata-cn-dex-647658ccd-z29qx                       0/1     Init:0/1                0          47m
gooddata       gooddata-cn-afm-exec-api-7bf559b487-gbjpf             1/1     Running                 0          47m
gooddata       gooddata-cn-result-cache-fc48f7bf8-j8z2x              1/1     Running                 0          47m
gooddata       gooddata-cn-cache-gc-27265920-kvvzj                   0/1     Completed               0          18m
gooddata       gooddata-cn-db-postgresql-1                           1/2     CrashLoopBackOff        13         47m
gooddata       gooddata-cn-db-postgresql-0                           1/2     CrashLoopBackOff        13         47m
gooddata       gooddata-cn-db-pgpool-c8c4b9878-hvrj5                 0/1     CrashLoopBackOff        15         47m
gooddata       gooddata-cn-redis-ha-server-0                         0/3     Init:CrashLoopBackOff   14         47m
gooddata       gooddata-cn-db-pgpool-c8c4b9878-9v5sd                 0/1     Running                 16         47m
Just like in the VM, connecting to localhost shows me a 404.
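As a side note, a 404 from ingress-nginx typically just means no Ingress rule matched the requested host or path; the configured routes can be checked with generic kubectl commands:
Copy code
# Which hosts and paths are actually routed
kubectl get ingress -A

# Details of the gooddata ingresses, including expected hostnames
kubectl -n gooddata describe ingress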
r
Did you use the updated script? What version of the k3s image is reported when you run docker ps? It should be rancher/k3s:v1.21.4-k3s1. If it is older (like rancher/k3s:v1.21.3-k3s1), please update the script from the repo.
d
It is .3, I will update and try again.
r
I apologize for the complications - I run an older k3d 4.4.4 locally and it works like a charm. Recent updates in k3s and k3d introduced these errors that I was not aware of.
Before you start the updated script, please run k3d cluster delete default.
d
The updated script is missing the 'p' argument in getopts:
Copy code
- while getopts "cH:" o; do
+ while getopts "cH:p:" o; do
No worries about the complications, I appreciate your help
r
Ha, thanks! I always forget to update the getopts line when I add a new argument 🙂
FYI, I successfully installed gooddata.cn (with just a single timeout on the Pulsar helm install), created an org, and the UI seems to be working:
d
Mine has also finished installing successfully, so it looks like the main problem was the k3d version. Thanks for the help!
👏 1
Hi, we're having an issue trying to use the same installation process with your scripts on our testing server. For some reason the kernel keeps killing nginx processes, so the installation never finishes:
Copy code
Nov 10 14:29:45 testing kernel: [1549711.046901] Memory cgroup out of memory: Kill process 2301795 (nginx) score 1277 or sacrifice child
Nov 10 14:29:45 testing kernel: [1549711.046907] Killed process 2303262 (nginx) total-vm:10948kB, anon-rss:1300kB, file-rss:1988kB, shmem-rss:0kB
Nov 10 14:29:45 testing kernel: [1549711.055984] oom_reaper: reaped process 2303262 (nginx), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Nov 10 14:29:45 testing kernel: [1549711.198409] nginx invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=999
Nov 10 14:29:45 testing kernel: [1549711.198411] nginx cpuset=0648fca50b349420d4104fa001615ee60034c1d18518ebd907f94f7e4829c40b mems_allowed=0-1

(...)

Nov 10 14:29:45 testing kernel: [1549711.198484] Task in /docker/becbdc9b7521b71a28f4cc8ec213513ffad37421009ae25240a41c43f397c5e2/kubepods/burstable/pod078e9da9-37aa-4216-b9b0-20072792ac17/0648fca50b349420d4104fa001615ee60034c1d18518ebd907f94f7e4829c40b killed as a result of limit of /docker/becbdc9b7521b71a28f4cc8ec213513ffad37421009ae25240a41c43f397c5e2/kubepods/burstable/pod078e9da9-37aa-4216-b9b0-20072792ac17
Nov 10 14:29:45 testing kernel: [1549711.198492] memory: usage 20480kB, limit 20480kB, failcnt 30832
Nov 10 14:29:45 testing kernel: [1549711.198493] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Nov 10 14:29:45 testing kernel: [1549711.198494] kmem: usage 17820kB, limit 9007199254740988kB, failcnt 0
Nov 10 14:29:45 testing kernel: [1549711.198495] Memory cgroup stats for /docker/becbdc9b7521b71a28f4cc8ec213513ffad37421009ae25240a41c43f397c5e2/kubepods/burstable/pod078e9da9-37aa-4216-b9b0-20072792ac17: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 10 14:29:45 testing kernel: [1549711.198501] Memory cgroup stats for /docker/becbdc9b7521b71a28f4cc8ec213513ffad37421009ae25240a41c43f397c5e2/kubepods/burstable/pod078e9da9-37aa-4216-b9b0-20072792ac17/5f517b59efb059ee85a5946d70ba12167613275c425de78378dcfd4c7caeaf3b: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:44KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 10 14:29:45 testing kernel: [1549711.198507] Memory cgroup stats for /docker/becbdc9b7521b71a28f4cc8ec213513ffad37421009ae25240a41c43f397c5e2/kubepods/burstable/pod078e9da9-37aa-4216-b9b0-20072792ac17/0e0838ef9bef7ed29e5e59de946c2afa1354210f097f65defc5f36ba12359b4d: cache:0KB rss:124KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 10 14:29:45 testing kernel: [1549711.198513] Memory cgroup stats for /docker/becbdc9b7521b71a28f4cc8ec213513ffad37421009ae25240a41c43f397c5e2/kubepods/burstable/pod078e9da9-37aa-4216-b9b0-20072792ac17/77c0aa7d56011f59b60b50b5a663cc96b7bee931d1586ebc22408527df5f2d2f: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 10 14:29:45 testing kernel: [1549711.198519] Memory cgroup stats for /docker/becbdc9b7521b71a28f4cc8ec213513ffad37421009ae25240a41c43f397c5e2/kubepods/burstable/pod078e9da9-37aa-4216-b9b0-20072792ac17/0648fca50b349420d4104fa001615ee60034c1d18518ebd907f94f7e4829c40b: cache:0KB rss:2292KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:2504KB inactive_file:36KB active_file:0KB unevictable:0KB
When checking docker stats, all k3d containers report the memory limit as over 700 GB, so we're definitely not hitting that. Checking /sys/fs/cgroup/memory/memory.limit_in_bytes reports the 64-bit max value, so there's no limit set there either. Checking /sys/fs/cgroup/memory/docker/<hash>/kubepods/burstable/<hash> shows the 20480 kB limit that is being hit, but we don't know why that limit even exists on the testing server, or how to get rid of it. Any ideas?
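One generic way to check whether the 20480 kB limit comes from the pod specs themselves rather than from Docker is to print the configured memory limits (the namespace is taken from the listings above):
Copy code
# Memory limits configured on each gooddata pod's containers
kubectl -n gooddata get pods \
  -o custom-columns=NAME:.metadata.name,MEM_LIMIT:.spec.containers[*].resources.limits.memory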
r
Hi Dan, what Linux distro, kernel version, and Docker version are you using? There were some recent changes in cgroup v2 that could be incompatible with your setup.
OK, after more investigation, I have found that three of our services have resources.limits.memory set to 20Mi (dashboards, home-ui and measure-editor). It's really surprising they won't fit into 20MB on your machine. Usually they consume less than 5MB:
Copy code
kubectl top pod -n gooddata 
NAME                                                  CPU(cores)   MEMORY(bytes)   
gooddata-cn-dashboards-84c85df6db-w2c5q               1m           3Mi             
gooddata-cn-home-ui-68d6766f84-p2cc9                  1m           4Mi             
gooddata-cn-measure-editor-595d576df4-nsvc5           1m           3Mi
You may increase resource limits by adding the following yaml snippet to `values-gooddata-cn.yaml`:
Copy code
measureEditor:
  resources:
    limits:
      memory: 30Mi
homeUi:
  resources:
    limits:
      memory: 30Mi
dashboards:
  resources:
    limits:
      memory: 30Mi
(you may experiment with the limit size, 30Mi is just an example)
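Presumably the snippet is then applied with an ordinary helm upgrade of the existing release; roughly like this, with the release name, namespace, and chart version taken from earlier in the thread, and the values file path depending on where your copy lives:
Copy code
helm -n gooddata upgrade --install gooddata-cn \
  --wait --timeout 7m \
  --values values-gooddata-cn.yaml \
  --version 1.4.0 gooddata/gooddata-cn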
d
this is what we're seeing
Copy code
NAME                                          CPU(cores)   MEMORY(bytes)
gooddata-cn-dashboards-84c85df6db-96624       5314m        17Mi
gooddata-cn-measure-editor-595d576df4-4rd6h   0m           17Mi
gooddata-cn-home-ui-68d6766f84-8fsv2          1m           17Mi
which is strange considering it's a completely fresh installation
Copy code
Ubuntu 20.04.3 LTS (GNU/Linux 4.15.0-161-generic x86_64)
Docker version 20.10.7, build 20.10.7-0ubuntu5~20.04.1
it looked like it started with the higher limits, but now kubectl died with "context deadline exceeded" 😕
r
Dashboards consuming 5 CPUs is really suspicious, considering the fact that it's just an unprivileged nginx image serving a bunch of static files. Did all the remaining pods start, and are they healthy? There might be some pod restarts during the deployment phase (but the number of restarts should be less than 3 and must not grow). "context deadline exceeded" is Go's ugly name for a timeout. It seems the Kubernetes API is overloaded.
Does the situation shown above (the CPU-hogging dashboards pod) still persist?
d
It looks like GD has loaded, but it doesn't have the correct port; we set LBPORT and LBSSLPORT to numbers in the 10000 range, but when we access the site, it redirects to localhost/dex/auth with no port. EDIT: Realized there is a parameter for this; assuming that it's supposed to have the port, it'd be nice if it defaulted to
localhost:${LBSSLPORT}
unless there is some problem with that?
So, apparently gooddata-cn-dex does not support a custom port in authHost, and I cannot find any documentation for the format of the dex section in yaml
r
You're right, I didn't consider this option; I was always happy with the load balancer running on 443 😉 I haven't tested it yet, but this could fix it:
Copy code
--- a/k3d/k3d.sh
+++ b/k3d/k3d.sh
@@ -278,6 +278,7 @@ cookiePolicy: None
 dex:
   ingress:
     authHost: $authHost
+    lbPort: ${LBSSLPORT:+:$LBSSLPORT}
     annotations:
       cert-manager.io/issuer: ca-issuer
     tls:
I will check later
d
Unfortunately it's still redirecting to localhost with no port
r
ah, my bad, the value is invalid.
${LBSSLPORT:+$LBSSLPORT}
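(In case it helps, this is plain bash parameter expansion; a quick demo:)
Copy code
# ${VAR:+word} expands to "word" only when VAR is set and non-empty
LBSSLPORT=8443
echo "lbPort: ${LBSSLPORT:+:$LBSSLPORT}"   # lbPort: :8443  -- the leading colon made the value invalid
echo "lbPort: ${LBSSLPORT:+$LBSSLPORT}"    # lbPort: 8443
unset LBSSLPORT
echo "lbPort: ${LBSSLPORT:+$LBSSLPORT}"    # lbPort:        -- expands to nothing when unset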
d
I forgot to mention, I also tried just
${LBSSLPORT}
since I wasn't familiar with this particular bash syntax, but both
${LBSSLPORT}
and
${LBSSLPORT:+$LBSSLPORT}
are still redirecting to localhost with no port; also tried with browser cache disabled
r
Are you talking about a fresh deployment with the updated script, or did you use the script to update an existing deployment (i.e. without the
-c
parameter)?
If the organization already exists and you performed a helm upgrade with the given
lbPort
value, chances are the internal state of the organization is keeping the wrong values. You may try to delete the organization (kubectl delete org XYZ) and recreate it, e.g.:
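(The name and manifest file below are placeholders for your own organization:)
Copy code
kubectl delete org my-org
kubectl apply -f org-my-org.yaml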
But anyway, I finally have a few minutes to test the fixed script.
d
I didn't rerun with -c, I will try that. Thanks.
r
I performed some tests
It doesn't work, really
this is a limitation of k3d. The k3d loadbalancer (the docker container k3d-default-serverlb running nginx) has a mapped port, e.g.
0.0.0.0:8443->443/tcp
(assuming you have LBSSLPORT="8443"). But this nginx has no idea about the mapping itself, so it doesn't set a proper X-Forwarded-Port header,
therefore applications within the cluster don't know about the external port, because they rely on this header (and it contains 443, because k3d-default-serverlb actually listens on 443).
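(You can see the mapping on the docker side, e.g.:)
Copy code
# assuming the default k3d cluster name and LBSSLPORT=8443
docker port k3d-default-serverlb
# e.g. 443/tcp -> 0.0.0.0:8443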
maybe I can fix it.
d
meanwhile I can't tell if I misconfigured something or if docker decided it wasn't going to work today 😂 it's getting stuck on starting the first agent node
r
nah, it looks more like you accidentally pressed CTRL+\
d
it was stuck for minutes, I quit it manually
r
anyway, I managed to convince k3d and the ingress controller to work together with a non-default TLS port, but there are still some configuration issues.
d
ctrl+\ instead of ctrl+c was an accident but I figured the stack traces might be useful
r
Yeah, good old SIGQUIT, I think it works with java as well.
d
had the sysadmin restart docker, it's working again now
a bit worried what could've possibly happened, considering the previous weird behaviors
r
who knows 🤷 The great benefit of docker is that you can always throw everything away and start from scratch, without risking corruption of your main system.
d
yep, well this time it needed a restart of the whole docker service because I threw the containers away and they didn't recreate and start anymore 😂
hmm, would it be possible to skip the load balancer and directly expose a port to the backend? we ended up coming up with a possibly somewhat insane system which allows us to run GD CN fully locally - hidden behind our own proxy that authenticates with GD, so we get completely seamless integration of GD UI widgets - so all we would need is
localhost:port
where we can call GD APIs
r
The k3d LB is a plain L4 TCP loadbalancer, so there's currently no possibility to manage HTTP headers. It means that ingress-nginx doesn't receive information about the upstream port of the incoming request: [client] -> (:8443)[k3d TCP LB] -> (:8443)[Ingress svc] -> (:443)[Ingress pod] -> (:appport)[backend svc] The only option to resolve it would be to make the Ingress controller (both svc and container) listen on the same port as the k3d LB. It is possible to do - we need to generate ingress-nginx.yaml from some template and fill in the helm values according to the setup specified in k3d.sh:
Copy code
apiVersion: helm.cattle.io/v1
kind: HelmChart
...
  valuesContent: |-
    controller:
      containerPort:
        https: $LBSSLPORT
      service:
        ports:
          https: $LBSSLPORT
    ...
Currently these two values are set to the default 443, which works perfectly with the default k3d LB port 443.
d
I assume that the ingress-nginx.yaml in your scripts is not sufficient then; do you have a recommended template that should work?
r
Not yet, I need to rework my script so the ingress-nginx.yaml will be generated.
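(Just to illustrate one possible approach - keep the manifest as a template and render it before applying; the file names here are made up:)
Copy code
# ingress-nginx.yaml.tmpl contains the HelmChart manifest above, with $LBSSLPORT
# left as a placeholder in controller.containerPort.https and controller.service.ports.https
export LBSSLPORT=8443
envsubst '$LBSSLPORT' < ingress-nginx.yaml.tmpl > ingress-nginx.yaml
kubectl apply -f ingress-nginx.yaml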
a
Hi @Daniel Chýlek, could you please let us know what's driving your need to run the service on a port different from the default 443?
d
Hi, our goal is to have seamless integration of GD UI widgets in our existing website. The plan is for GD UI to communicate with a reverse proxy, which directly communicates with a GD CN instance on the backend. We find that a reverse proxy is the easiest way for us to integrate GD, since the reverse proxy can use our own customer database to choose the correct GD instance (internal port), workspace, and automatically authenticate the user through an API token without exposing any of it on the frontend. I know it's not in the spirit of kubernetes, but ultimately we want to run GD on the same server as our other services (at least until we start running into major performance issues), with a port that is not publicly exposed and does not conflict with our existing services; so any standard http/https ports are not available.
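(To illustrate, the proxy's upstream call would look roughly like this - the hostname, port, token and endpoint here are placeholders for our setup:)
Copy code
# the Host header selects the GD organization, the bearer token authenticates the call
curl -H "Host: gd.example.internal" \
     -H "Authorization: Bearer $GDCN_API_TOKEN" \
     "http://localhost:10443/api/entities/workspaces"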
r
OK, now I understand your use-case.
The purpose of the k3d setup is development or evaluation. Running such a complex system on a single docker host in production is strongly discouraged. As far as the custom port is concerned, I have found a bug in ingress-nginx that prevents the custom port setup from working. The problem is that they have the hard-coded value
443
in one Lua script and pass this value to upstream servers in the X-Forwarded-Port header, instead of the real port where the ingress-nginx controller operates.
d
I understand, but we have a small number of high-powered servers rather than a highly scalable environment, so we keep all services relevant to a customer (website, database, etc.) on the same machine. Our testing server, where we're trying to get GD running now, is a single machine, which is going to be a worst-case scenario, but we can use it to judge performance and decide whether we actually need to do something different to deploy GD to production.
r
Dan, I have finally updated my k3d deployment script in https://github.com/mouchar/gooddata-cn-tools There are several changes that you might be interested in: • support of k3d 5.x • ability to completely disable built-in TLS support (
-n
), making the cluster expose only the HTTP port (no cert-manager stuff, etc.) • It should work behind a reverse proxy on a non-default port, but the proxy MUST pass the
X-Forwarded-*
headers (a rough example of such a request is below).
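(Roughly, a request coming from the proxy should carry headers like these - the host, port and target URL are just placeholders:)
Copy code
# simulate the reverse proxy by sending the headers it is expected to pass
curl -H "Host: gd.example.internal" \
     -H "X-Forwarded-Proto: https" \
     -H "X-Forwarded-Host: gd.example.internal" \
     -H "X-Forwarded-Port: 10443" \
     "http://localhost:${LBPORT}/"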
d
Hi, the reverse proxy works 🎉 but it's having issues with the organization hostname. For testing, we set up the reverse proxy behind a subdomain that's only accessible locally. I created an organization that has the hostname equal to the subdomain and ran
kubectl apply -f org-quant-test.yml
, but GD cannot find it:
I realized I forgot to call the /organizations/ API endpoint. I assumed the error was talking about the kubernetes organization, but maybe that's not the case.
unfortunately the /api/entities/admin/organizations/quant-gd-test endpoint is returning a default Tomcat 404 error
r
The organizations endpoint really refers to the same Organization as the k8s custom resource. Please check whether the organization resource exists and what its status is (
kubectl describe org quant-gd-test
) The config looks valid. It's possible the record for the organization was not propagated to the database. Also check the logs of the organization-controller pod; a quick checklist is below.
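(The resource and deployment names below are guesses - adjust them to what kubectl get pods shows in your cluster:)
Copy code
# does the Organization resource exist, and what do its status and events say?
kubectl get organizations
kubectl describe org quant-gd-test

# logs of the controller that syncs Organization resources into the metadata database;
# the deployment name is a guess -- look up the real one with kubectl get pods first
kubectl -n gooddata-cn logs deploy/gooddata-cn-organization-controller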
d
sorry I haven't been active in the past few months, I've been really busy with things and we still don't have several dependencies (like postgres) running in production; I'll try to get the new docker setup working in a few weeks