How do you deploy GitLab Operator to an AWS EKS cluster?

Posted: 2021-12-14 02:25:06

Question:

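My goal is to deploy a self-hosted GitLab instance on EKS. I have read the guide in the GitLab documentation and am trying the Operator installation method. I set up my cluster using eksctl v0.61.0 with three t4g.large instances. The cluster came up and looks healthy.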

kubectl get all --all-namespaces
NAMESPACE     NAME                           READY   STATUS    RESTARTS   AGE
kube-system   pod/aws-node-9k7mg             1/1     Running   0          3m25s
kube-system   pod/aws-node-hlkxr             1/1     Running   0          3m25s
kube-system   pod/aws-node-rc5br             1/1     Running   0          3m24s
kube-system   pod/coredns-5c778788f4-cw5gq   1/1     Running   0          15m
kube-system   pod/coredns-5c778788f4-ff8mn   1/1     Running   0          15m
kube-system   pod/kube-proxy-hrxtz           1/1     Running   0          3m25s
kube-system   pod/kube-proxy-phw7p           1/1     Running   0          3m25s
kube-system   pod/kube-proxy-rtlgj           1/1     Running   0          3m25s

NAMESPACE     NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE
default       service/kubernetes   ClusterIP   10.100.0.1    <none>        443/TCP         16m
kube-system   service/kube-dns     ClusterIP   10.100.0.10   <none>        53/UDP,53/TCP   16m

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   daemonset.apps/aws-node     3         3         3       3            3           <none>          16m
kube-system   daemonset.apps/kube-proxy   3         3         3       3            3           <none>          16m

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   2/2     2            2           16m

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-5c778788f4   2         2         2       15m

I first installed cert-manager v1.6.0 with the default configuration.

kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.0/cert-manager.yaml
kubectl get all -n cert-manager
NAME                                           READY   STATUS    RESTARTS   AGE
pod/cert-manager-77fd97f598-wxtj8              1/1     Running   0          18s
pod/cert-manager-cainjector-7974c84449-ghlfr   1/1     Running   0          18s
pod/cert-manager-webhook-5f4b965fbd-8kqv2      1/1     Running   0          17s

NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/cert-manager           ClusterIP   10.100.71.170    <none>        9402/TCP   18s
service/cert-manager-webhook   ClusterIP   10.100.191.224   <none>        443/TCP    18s

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cert-manager              1/1     1            1           19s
deployment.apps/cert-manager-cainjector   1/1     1            1           19s
deployment.apps/cert-manager-webhook      1/1     1            1           18s

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-77fd97f598              1         1         1       19s
replicaset.apps/cert-manager-cainjector-7974c84449   1         1         1       19s
replicaset.apps/cert-manager-webhook-5f4b965fbd      1         1         1       18s

Next, I installed the metrics server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

This also came up without any apparent problems.

Finally, I tried to install the GitLab Operator:

GL_OPERATOR_VERSION=0.1.0
PLATFORM=kubernetes
kubectl create namespace gitlab-system
kubectl apply -f https://gitlab.com/api/v4/projects/18899486/packages/generic/gitlab-operator/$GL_OPERATOR_VERSION/gitlab-operator-$PLATFORM-$GL_OPERATOR_VERSION.yaml

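*Note: At the time of this post, the latest version of cert-manager was 1.6.0. As of this release, the API versions v1alpha2, v1alpha3, and v1beta1 are deprecated and no longer served. When I first tried this installation, it failed to create the Issuer and the Certificate. Updating their apiVersion to cert-manager.io/v1 fixed this.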
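The apiVersion change described in the note can be applied to the manifest on the fly instead of editing it by hand. A minimal sketch, assuming only the apiVersion lines need to change (the function name use_cert_manager_v1 is made up for illustration, and manifests that use fields renamed between API versions would need more than this):

```shell
# Rewrite legacy cert-manager apiVersion lines (v1alpha2, v1alpha3, v1beta1, ...)
# to cert-manager.io/v1. Reads a manifest on stdin and writes the result to stdout.
# Lines that are not cert-manager apiVersion lines pass through unchanged.
use_cert_manager_v1() {
  sed -E 's#^([[:space:]]*apiVersion:[[:space:]]*cert-manager\.io/)v1(alpha|beta)[0-9]+[[:space:]]*$#\1v1#'
}

# Usage sketch (URL taken from the install command above; not executed here):
# curl -fsSL "https://gitlab.com/api/v4/projects/18899486/packages/generic/gitlab-operator/$GL_OPERATOR_VERSION/gitlab-operator-$PLATFORM-$GL_OPERATOR_VERSION.yaml" \
#   | use_cert_manager_v1 \
#   | kubectl apply -f -
```

Piping through a filter keeps the downloaded manifest untouched, so the same command can be rerun once a release ships with the updated API versions.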

Now it creates all the resources.

kubectl get all -n gitlab-system
NAME                                            READY   STATUS             RESTARTS   AGE
pod/gitlab-controller-manager-ccd797cb6-9c428   0/2     CrashLoopBackOff   4          30s

NAME                                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/gitlab-controller-manager-metrics-service   ClusterIP   10.100.252.76   <none>        8443/TCP   30s
service/gitlab-webhook-service                      ClusterIP   10.100.85.217   <none>        443/TCP    30s

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/gitlab-controller-manager   0/1     1            0           30s

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/gitlab-controller-manager-ccd797cb6   1         1         0       30s

As shown, pod/gitlab-controller-manager-ccd797cb6-9c428 is in the CrashLoopBackOff state. It keeps restarting indefinitely.
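For a pod stuck in CrashLoopBackOff, the logs of the previous (crashed) container run are usually the first thing to look at. A small sketch that prints them for each container in the pod; the function name previous_logs and the KUBECTL override variable are assumptions for illustration, while `kubectl logs` with `-c` and `--previous` is standard:

```shell
KUBECTL="${KUBECTL:-kubectl}"   # override to point at a different kubectl binary

# Print the logs of the previous run of each named container in a pod.
# Arguments: POD NAMESPACE CONTAINER [CONTAINER...]
previous_logs() {
  pod="$1"
  namespace="$2"
  shift 2
  for container in "$@"; do
    printf '== %s ==\n' "$container"
    "$KUBECTL" logs "$pod" -n "$namespace" -c "$container" --previous
  done
}

# Usage sketch for the pod in this question (not executed here):
# previous_logs gitlab-controller-manager-ccd797cb6-9c428 gitlab-system manager kube-rbac-proxy
```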

kubectl describe pod gitlab-controller-manager-ccd797cb6-9c428 -n gitlab-system
Name:         gitlab-controller-manager-ccd797cb6-9c428
Namespace:    gitlab-system
Priority:     0
Node:         ip-192-168-78-2.us-east-2.compute.internal/192.168.78.2
Start Time:   Thu, 28 Oct 2021 18:13:28 -0400
Labels:       control-plane=controller-manager
              pod-template-hash=ccd797cb6
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Running
IP:           192.168.95.73
IPs:
  IP:           192.168.95.73
Controlled By:  ReplicaSet/gitlab-controller-manager-ccd797cb6
Containers:
  manager:
    Container ID:  docker://8576f635b72389a824284a1c342c390036af50bf85a60aa3299af17d77764971
    Image:         registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0
    Image ID:      docker-pullable://registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator@sha256:3d0ff0fc176511d67f3784060023157fbdaed8109539f3d340d68ac8f18d6425
    Ports:         9443/TCP, 6060/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /manager
    Args:
      --metrics-addr=127.0.0.1:8080
      --enable-leader-election
      --zap-devel=true
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 28 Oct 2021 18:14:24 -0400
      Finished:     Thu, 28 Oct 2021 18:14:24 -0400
    Ready:          False
    Restart Count:  3
    Limits:
      cpu:     200m
      memory:  300Mi
    Requests:
      cpu:      200m
      memory:   100Mi
    Liveness:   http-get http://:health-port/liveness delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:  http-get http://:health-port/readiness delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      WATCH_NAMESPACE:  gitlab-system (v1:metadata.namespace)
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from gitlab-manager-token-vjdfx (ro)
  kube-rbac-proxy:
    Container ID:  docker://1db8028b18e0e7f255f1fdc1c0ab086d0cb01d17a10e3b0d17b9a8e6afda9175
    Image:         gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
    Image ID:      docker-pullable://gcr.io/kubebuilder/kube-rbac-proxy@sha256:e10d1d982dd653db74ca87a1d1ad017bc5ef1aeb651bdea089debf16485b080b
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 28 Oct 2021 18:14:24 -0400
      Finished:     Thu, 28 Oct 2021 18:14:24 -0400
    Ready:          False
    Restart Count:  3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from gitlab-manager-token-vjdfx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webhook-server-cert
    Optional:    false
  gitlab-manager-token-vjdfx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  gitlab-manager-token-vjdfx
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    61s                default-scheduler  Successfully assigned gitlab-system/gitlab-controller-manager-ccd797cb6-9c428 to ip-192-168-78-2.us-east-2.compute.internal
  Warning  FailedMount  60s (x2 over 61s)  kubelet            MountVolume.SetUp failed for volume "cert" : secret "webhook-server-cert" not found
  Normal   Pulling      55s                kubelet            Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0"
  Normal   Pulled       55s                kubelet            Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 3.560963186s
  Normal   Pulled       53s                kubelet            Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0" in 1.650875485s
  Normal   Pulled       52s                kubelet            Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0" already present on machine
  Normal   Created      52s (x2 over 53s)  kubelet            Created container kube-rbac-proxy
  Normal   Started      52s (x2 over 53s)  kubelet            Started container kube-rbac-proxy
  Normal   Pulled       52s                kubelet            Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 490.074654ms
  Warning  BackOff      50s (x2 over 51s)  kubelet            Back-off restarting failed container
  Warning  BackOff      50s (x2 over 51s)  kubelet            Back-off restarting failed container
  Normal   Pulling      39s (x3 over 59s)  kubelet            Pulling image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0"
  Normal   Started      38s (x3 over 55s)  kubelet            Started container manager
  Normal   Created      38s (x3 over 55s)  kubelet            Created container manager
  Normal   Pulled       38s                kubelet            Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 512.734325ms

The only problem I can identify is the missing "webhook-server-cert" secret.

kubectl get secrets -n gitlab-system
NAME                               TYPE                                  DATA   AGE
default-token-tzxs2                kubernetes.io/service-account-token   3      86s
gitlab-app-token-7btgp             kubernetes.io/service-account-token   3      83s
gitlab-manager-token-vjdfx         kubernetes.io/service-account-token   3      83s
gitlab-nginx-ingress-token-v5jdh   kubernetes.io/service-account-token   3      82s
webhook-server-cert                kubernetes.io/tls                     3      80s

The secret is there, and when I run get on it I can see the certificate and the key.
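Worth noting: containers in a pod cannot start until the pod's volumes are mounted, and the events show both containers being created and started. That suggests the FailedMount warning (x2 over 61s) was a short race with cert-manager issuing the certificate rather than the cause of the crash loop. A minimal sketch for confirming that the certificate is ready; the Certificate name gitlab-serving-cert comes from the events in this question, while the function name wait_for_certificate and the KUBECTL override are made up for illustration:

```shell
KUBECTL="${KUBECTL:-kubectl}"   # override to point at a different kubectl binary

# Block until the named cert-manager Certificate reports the Ready condition,
# or until the timeout expires.
# Arguments: NAME [NAMESPACE] [TIMEOUT]
wait_for_certificate() {
  name="$1"
  namespace="${2:-gitlab-system}"
  timeout="${3:-120s}"
  "$KUBECTL" wait --for=condition=Ready "certificate/$name" -n "$namespace" --timeout="$timeout"
}

# Usage sketch (not executed here):
# wait_for_certificate gitlab-serving-cert gitlab-system 60s
```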

Here is the output of kubectl get events -n gitlab-system:

LAST SEEN   TYPE      REASON              OBJECT                                           MESSAGE
100s        Normal    Scheduled           pod/gitlab-controller-manager-ccd797cb6-9c428    Successfully assigned gitlab-system/gitlab-controller-manager-ccd797cb6-9c428 to ip-192-168-78-2.us-east-2.compute.internal
99s         Warning   FailedMount         pod/gitlab-controller-manager-ccd797cb6-9c428    MountVolume.SetUp failed for volume "cert" : secret "webhook-server-cert" not found
78s         Normal    Pulling             pod/gitlab-controller-manager-ccd797cb6-9c428    Pulling image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0"
94s         Normal    Pulled              pod/gitlab-controller-manager-ccd797cb6-9c428    Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 3.560963186s
77s         Normal    Created             pod/gitlab-controller-manager-ccd797cb6-9c428    Created container manager
77s         Normal    Started             pod/gitlab-controller-manager-ccd797cb6-9c428    Started container manager
94s         Normal    Pulling             pod/gitlab-controller-manager-ccd797cb6-9c428    Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0"
92s         Normal    Pulled              pod/gitlab-controller-manager-ccd797cb6-9c428    Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0" in 1.650875485s
91s         Normal    Created             pod/gitlab-controller-manager-ccd797cb6-9c428    Created container kube-rbac-proxy
91s         Normal    Started             pod/gitlab-controller-manager-ccd797cb6-9c428    Started container kube-rbac-proxy
91s         Normal    Pulled              pod/gitlab-controller-manager-ccd797cb6-9c428    Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 490.074654ms
91s         Normal    Pulled              pod/gitlab-controller-manager-ccd797cb6-9c428    Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0" already present on machine
89s         Warning   BackOff             pod/gitlab-controller-manager-ccd797cb6-9c428    Back-off restarting failed container
89s         Warning   BackOff             pod/gitlab-controller-manager-ccd797cb6-9c428    Back-off restarting failed container
77s         Normal    Pulled              pod/gitlab-controller-manager-ccd797cb6-9c428    Successfully pulled image "registry.gitlab.com/gitlab-org/cloud-native/gitlab-operator:0.1.0" in 512.734325ms
100s        Normal    SuccessfulCreate    replicaset/gitlab-controller-manager-ccd797cb6   Created pod: gitlab-controller-manager-ccd797cb6-9c428
100s        Normal    ScalingReplicaSet   deployment/gitlab-controller-manager             Scaled up replica set gitlab-controller-manager-ccd797cb6 to 1
99s         Normal    cert-manager.io     certificaterequest/gitlab-serving-cert-ghlz8     Certificate request has been approved by cert-manager.io
99s         Warning   BadConfig           certificaterequest/gitlab-serving-cert-ghlz8     Certificate will be issued with an empty Issuer DN, which contravenes RFC 5280 and could break some strict clients
99s         Normal    CertificateIssued   certificaterequest/gitlab-serving-cert-ghlz8     Certificate fetched from issuer successfully
99s         Normal    Issuing             certificate/gitlab-serving-cert                  Issuing certificate as Secret does not exist
99s         Normal    Generated           certificate/gitlab-serving-cert                  Stored new private key in temporary Secret resource "gitlab-serving-cert-k5djd"
99s         Normal    Requested           certificate/gitlab-serving-cert                  Created new CertificateRequest resource "gitlab-serving-cert-ghlz8"
99s         Normal    Issuing             certificate/gitlab-serving-cert                  The certificate has been successfully issued
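In a listing this long the warnings are easy to miss. A small sketch that filters saved `kubectl get events` output down to the header and the Warning rows; the function name warnings_only is made up for illustration, and it assumes the default column layout shown above, where TYPE is the second whitespace-separated field of every data row:

```shell
# Keep the header line and every row whose TYPE column is "Warning".
# Reads `kubectl get events` table output on stdin.
warnings_only() {
  awk 'NR == 1 || $2 == "Warning"'
}

# Usage sketch (not executed here):
# kubectl get events -n gitlab-system | warnings_only
```

Applied to the listing above, this leaves the FailedMount row, the two BackOff rows, and the BadConfig row. When querying the cluster directly, `kubectl get events --field-selector type=Warning` gives a similar result without post-processing.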

I'm not sure how to troubleshoot this. Any insights?

Answer 1:

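After investigating further, I found that running logs on the containers produced standard_init_linux.go:228: exec user process caused: exec format error. I opened an issue on the GitLab Operator project, and they advised that the GitLab Operator must run on the x86_64 architecture. The T4g family is AArch64/arm64. I switched to t2.xlarge and was able to bring up the operator.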
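An exec format error is what the kernel reports when it is asked to run a binary built for a different CPU architecture, so comparing each node's architecture with the one the image targets is a quick check. A small sketch, assuming the image is built for amd64 only (Kubernetes reports x86_64 nodes as amd64 and AArch64 nodes as arm64); the function name nodes_not_matching is made up for illustration:

```shell
# Print every node whose architecture differs from the wanted one.
# Reads "NAME ARCH" pairs on stdin, one node per line.
# Argument: the architecture the image was built for (for example amd64).
nodes_not_matching() {
  awk -v want="$1" '$2 != want { print $1 " (" $2 ")" }'
}

# Usage sketch (not executed here): list nodes that cannot run an amd64-only image.
# kubectl get nodes \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.architecture}{"\n"}{end}' \
#   | nodes_not_matching amd64
```

On the original t4g.large cluster this should list all three nodes as arm64, and print nothing after the switch to t2.xlarge. `kubectl get nodes -L kubernetes.io/arch` shows the same information as an extra column.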
