How do you deploy GitLab Operator to an AWS EKS cluster?
Posted: 2021-12-14 02:25:06
Question:
My goal is to deploy a self-hosted GitLab instance on EKS. I have read the installation guide in the GitLab documentation and am trying the Operator installation method. I set up my cluster using eksctl v0.61.0 with three t4g.large instances. The cluster comes up and looks healthy.
kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/aws-node-9k7mg 1/1 Running 0 3m25s
kube-system pod/aws-node-hlkxr 1/1 Running 0 3m25s
kube-system pod/aws-node-rc5br 1/1 Running 0 3m24s
kube-system pod/coredns-5c778788f4-cw5gq 1/1 Running 0 15m
kube-system pod/coredns-5c778788f4-ff8mn 1/1 Running 0 15m
kube-system pod/kube-proxy-hrxtz 1/1 Running 0 3m25s
kube-system pod/kube-proxy-phw7p 1/1 Running 0 3m25s
kube-system pod/kube-proxy-rtlgj 1/1 Running 0 3m25s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 16m
kube-system service/kube-dns ClusterIP 10.100.0.10 <none> 53/UDP,53/TCP 16m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/aws-node 3 3 3 3 3 <none> 16m
kube-system daemonset.apps/kube-proxy 3 3 3 3 3 <none> 16m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 2/2 2 2 16m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-5c778788f4 2 2 2 15m
I first install cert-manager v1.6.0 with the default configuration.
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.0/cert-manager.yaml
kubectl get all -n cert-manager
NAME READY STATUS RESTARTS AGE
pod/cert-manager-77fd97f598-wxtj8 1/1 Running 0 18s
pod/cert-manager-cainjector-7974c84449-ghlfr 1/1 Running 0 18s
pod/cert-manager-webhook-5f4b965fbd-8kqv2 1/1 Running 0 17s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cert-manager ClusterIP 10.100.71.170 <none> 9402/TCP 18s
service/cert-manager-webhook ClusterIP 10.100.191.224 <none> 443/TCP 18s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cert-manager 1/1 1 1 19s
deployment.apps/cert-manager-cainjector 1/1 1 1 19s
deployment.apps/cert-manager-webhook 1/1 1 1 18s
NAME DESIRED CURRENT READY AGE
replicaset.apps/cert-manager-77fd97f598 1 1 1 19s
replicaset.apps/cert-manager-cainjector-7974c84449 1 1 1 19s
replicaset.apps/cert-manager-webhook-5f4b965fbd 1 1 1 18s
Next, I install the metrics server.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
This also comes up without any apparent problems.
Finally, I try to install the GitLab Operator.
GL_OPERATOR_VERSION=0.1.0
PLATFORM=kubernetes
kubectl create namespace gitlab-system
kubectl apply -f https://gitlab.com/api/v4/projects/18899486/packages/generic/gitlab-operator/$GL_OPERATOR_VERSION/gitlab-operator-$PLATFORM-$GL_OPERATOR_VERSION.yaml
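*Note: at the time of this post, the latest version of cert-manager is 1.6.0. As of that release, the API versions v1alpha2, v1alpha3, and v1beta1 are deprecated. When I first tried this install, it failed to create the Issuer and the Certificate. Updating the apiVersion of those resources to cert-manager.io/v1 fixed this issue.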
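One way to apply the apiVersion change from the note without hand-editing the manifest is to patch it in a pipeline. This is only a sketch: the function name upgrade_cert_manager_api is made up for illustration, and it assumes the cert-manager resources in the manifest need nothing beyond the version bump (which is what the note reports, but may not hold for every manifest).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any GitLab or cert-manager tooling).
# Reads a Kubernetes manifest on stdin and rewrites the cert-manager API
# versions v1alpha2, v1alpha3 and v1beta1 to cert-manager.io/v1.
# All other lines, including other apiVersion values, pass through unchanged.
upgrade_cert_manager_api() {
  sed -E 's#^([[:space:]]*apiVersion:[[:space:]]*)cert-manager\.io/v1(alpha2|alpha3|beta1)[[:space:]]*$#\1cert-manager.io/v1#'
}

# Example usage (needs curl, kubectl and cluster access, so it is left
# commented out here):
#   curl -fsSL "https://gitlab.com/api/v4/projects/18899486/packages/generic/gitlab-operator/$GL_OPERATOR_VERSION/gitlab-operator-$PLATFORM-$GL_OPERATOR_VERSION.yaml" \
#     | upgrade_cert_manager_api \
#     | kubectl apply -f -
```

Piping through a filter keeps the downloaded manifest otherwise untouched, so it is easy to diff the patched output against the original before applying it.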
Now it creates all of the resources.
kubectl get all -n gitlab-system
NAME READY STATUS RESTARTS AGE
pod/gitlab-controller-manager-ccd797cb6-9c428 0/2 CrashLoopBackOff 4 30s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/gitlab-controller-manager-metrics-service ClusterIP 10.100.252.76 <none> 8443/TCP 30s
service/gitlab-webhook-service ClusterIP 10.100.85.217 <none> 443/TCP 30s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/gitlab-controller-manager 0/1 1 0 30s
NAME DESIRED CURRENT READY AGE
replicaset.apps/gitlab-controller-manager-ccd797cb6 1 1 0 30s
As shown, pod/gitlab-controller-manager-ccd797cb6-9c428 is in the CrashLoopBackOff state. It keeps restarting indefinitely.
kubectl describe pod gitlab-controller-manager-ccd797cb6-9c428 -n gitlab-system
Name: gitlab-controller-manager-ccd797cb6-9c428
Namespace: gitlab-system
Priority: 0
Node: ip-192-168-78-2.us-east-2.compute.internal/192.168.78.2
Start Time: Thu, 28 Oct 2021 18:13:28 -0400
Labels: control-plane=controller-manager
pod-template-hash=ccd797cb6
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 192.168.95.73
IPs:
IP: 192.168.95.73
Controlled By: ReplicaSet/gitlab-controller-manager-ccd797cb6
Containers:
manager:
Container ID: docker://8576f635b72389a824284a1c342c390036af50bf85a60aa3299af17d77764971
Image: registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0
Image ID: docker-pullable://registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator@sha256:3d0ff0fc176511d67f3784060023157fbdaed8109539f3d340d68ac8f18d6425
Ports: 9443/TCP, 6060/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/manager
Args:
--metrics-addr=127.0.0.1:8080
--enable-leader-election
--zap-devel=true
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 28 Oct 2021 18:14:24 -0400
Finished: Thu, 28 Oct 2021 18:14:24 -0400
Ready: False
Restart Count: 3
Limits:
cpu: 200m
memory: 300Mi
Requests:
cpu: 200m
memory: 100Mi
Liveness: http-get http://:health-port/liveness delay=15s timeout=1s period=20s #success=1 #failure=3
Readiness: http-get http://:health-port/readiness delay=5s timeout=1s period=10s #success=1 #failure=3
Environment:
WATCH_NAMESPACE: gitlab-system (v1:metadata.namespace)
Mounts:
/tmp/k8s-webhook-server/serving-certs from cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from gitlab-manager-token-vjdfx (ro)
kube-rbac-proxy:
Container ID: docker://1db8028b18e0e7f255f1fdc1c0ab086d0cb01d17a10e3b0d17b9a8e6afda9175
Image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
Image ID: docker-pullable://gcr.io/kubebuilder/kube-rbac-proxy@sha256:e10d1d982dd653db74ca87a1d1ad017bc5ef1aeb651bdea089debf16485b080b
Port: 8443/TCP
Host Port: 0/TCP
Args:
--secure-listen-address=0.0.0.0:8443
--upstream=http://127.0.0.1:8080/
--logtostderr=true
--v=10
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 28 Oct 2021 18:14:24 -0400
Finished: Thu, 28 Oct 2021 18:14:24 -0400
Ready: False
Restart Count: 3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from gitlab-manager-token-vjdfx (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
cert:
Type: Secret (a volume populated by a Secret)
SecretName: webhook-server-cert
Optional: false
gitlab-manager-token-vjdfx:
Type: Secret (a volume populated by a Secret)
SecretName: gitlab-manager-token-vjdfx
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 61s default-scheduler Successfully assigned gitlab-system/gitlab-controller-manager-ccd797cb6-9c428 to ip-192-168-78-2.us-east-2.compute.internal
Warning FailedMount 60s (x2 over 61s) kubelet MountVolume.SetUp failed for volume "cert" : secret "webhook-server-cert" not found
Normal Pulling 55s kubelet Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0"
Normal Pulled 55s kubelet Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 3.560963186s
Normal Pulled 53s kubelet Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0" in 1.650875485s
Normal Pulled 52s kubelet Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0" already present on machine
Normal Created 52s (x2 over 53s) kubelet Created container kube-rbac-proxy
Normal Started 52s (x2 over 53s) kubelet Started container kube-rbac-proxy
Normal Pulled 52s kubelet Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 490.074654ms
Warning BackOff 50s (x2 over 51s) kubelet Back-off restarting failed container
Warning BackOff 50s (x2 over 51s) kubelet Back-off restarting failed container
Normal Pulling 39s (x3 over 59s) kubelet Pulling image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0"
Normal Started 38s (x3 over 55s) kubelet Started container manager
Normal Created 38s (x3 over 55s) kubelet Created container manager
Normal Pulled 38s kubelet Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 512.734325ms
The only problem I can identify is the missing "webhook-server-cert" secret.
kubectl get secrets -n gitlab-system
NAME TYPE DATA AGE
default-token-tzxs2 kubernetes.io/service-account-token 3 86s
gitlab-app-token-7btgp kubernetes.io/service-account-token 3 83s
gitlab-manager-token-vjdfx kubernetes.io/service-account-token 3 83s
gitlab-nginx-ingress-token-v5jdh kubernetes.io/service-account-token 3 82s
webhook-server-cert kubernetes.io/tls 3 80s
The secret is there, and when I run get on it, I can see the certificate and key.
Here is the result of running kubectl get events -n gitlab-system:
LAST SEEN TYPE REASON OBJECT MESSAGE
100s Normal Scheduled pod/gitlab-controller-manager-ccd797cb6-9c428 Successfully assigned gitlab-system/gitlab-controller-manager-ccd797cb6-9c428 to ip-192-168-78-2.us-east-2.compute.internal
99s Warning FailedMount pod/gitlab-controller-manager-ccd797cb6-9c428 MountVolume.SetUp failed for volume "cert" : secret "webhook-server-cert" not found
78s Normal Pulling pod/gitlab-controller-manager-ccd797cb6-9c428 Pulling image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0"
94s Normal Pulled pod/gitlab-controller-manager-ccd797cb6-9c428 Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 3.560963186s
77s Normal Created pod/gitlab-controller-manager-ccd797cb6-9c428 Created container manager
77s Normal Started pod/gitlab-controller-manager-ccd797cb6-9c428 Started container manager
94s Normal Pulling pod/gitlab-controller-manager-ccd797cb6-9c428 Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0"
92s Normal Pulled pod/gitlab-controller-manager-ccd797cb6-9c428 Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0" in 1.650875485s
91s Normal Created pod/gitlab-controller-manager-ccd797cb6-9c428 Created container kube-rbac-proxy
91s Normal Started pod/gitlab-controller-manager-ccd797cb6-9c428 Started container kube-rbac-proxy
91s Normal Pulled pod/gitlab-controller-manager-ccd797cb6-9c428 Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 490.074654ms
91s Normal Pulled pod/gitlab-controller-manager-ccd797cb6-9c428 Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0" already present on machine
89s Warning BackOff pod/gitlab-controller-manager-ccd797cb6-9c428 Back-off restarting failed container
89s Warning BackOff pod/gitlab-controller-manager-ccd797cb6-9c428 Back-off restarting failed container
77s Normal Pulled pod/gitlab-controller-manager-ccd797cb6-9c428 Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 512.734325ms
100s Normal SuccessfulCreate replicaset/gitlab-controller-manager-ccd797cb6 Created pod: gitlab-controller-manager-ccd797cb6-9c428
100s Normal ScalingReplicaSet deployment/gitlab-controller-manager Scaled up replica set gitlab-controller-manager-ccd797cb6 to 1
99s Normal cert-manager.io certificaterequest/gitlab-serving-cert-ghlz8 Certificate request has been approved by cert-manager.io
99s Warning BadConfig certificaterequest/gitlab-serving-cert-ghlz8 Certificate will be issued with an empty Issuer DN, which contravenes RFC 5280 and could break some strict clients
99s Normal CertificateIssued certificaterequest/gitlab-serving-cert-ghlz8 Certificate fetched from issuer successfully
99s Normal Issuing certificate/gitlab-serving-cert Issuing certificate as Secret does not exist
99s Normal Generated certificate/gitlab-serving-cert Stored new private key in temporary Secret resource "gitlab-serving-cert-k5djd"
99s Normal Requested certificate/gitlab-serving-cert Created new CertificateRequest resource "gitlab-serving-cert-ghlz8"
99s Normal Issuing certificate/gitlab-serving-cert The certificate has been successfully issued
I'm not sure how to troubleshoot this. Any insights?
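Answer 1:
After investigating some more, I found that running logs on the container produced standard_init_linux.go:228: exec user process caused: exec format error
I opened an issue with the GitLab Operator project, and they advised that the GitLab Operator must run on the x86_64 architecture. The T4g family is AArch64/arm64. I switched to t2.xlarge and was able to bring up the operator.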
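For anyone hitting the same symptom, the two checks in this answer (looking for the exec format error in the container logs, and confirming the node architecture) can be sketched as small shell filters. The function names below are hypothetical and only for illustration; the kubectl invocations in the comments assume the pod and namespace names from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of kubectl or GitLab tooling.

# Succeeds (exit status 0) if the log text on stdin contains the
# "exec format error" message, which typically means the image was built
# for a different CPU architecture than the node it runs on.
has_exec_format_error() {
  grep -q 'exec format error'
}

# Reads "<node-name> <architecture>" lines on stdin and prints the names of
# nodes whose architecture is not amd64 (Kubernetes' name for x86_64).
# Blank or malformed lines are ignored.
non_amd64_nodes() {
  awk 'NF >= 2 && $2 != "amd64" { print $1 }'
}

# Example usage (needs kubectl and cluster access, so it is left commented out):
#   kubectl logs gitlab-controller-manager-ccd797cb6-9c428 -n gitlab-system \
#       -c manager --previous | has_exec_format_error && echo "architecture mismatch"
#   kubectl get nodes \
#       -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.architecture}{"\n"}{end}' \
#     | non_amd64_nodes
```

On a t4g node group the second command would be expected to list every node, since those instances report arm64.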