Stuck in a partial Helm release when applying Terraform to Kubernetes

Posted: 2022-01-13 18:42:39

Question:

I'm trying to apply a Terraform resource (helm_release) to k8s, but the apply command failed partway through.

I looked into the pod issue, and now I need to update some values in the local chart.

Now I'm stuck in a dilemma: I can't apply the helm_release because the name is already in use, and I can't destroy the helm_release because it was never fully created.

Is my only option to manually delete the k8s resources that the helm_release chart created?

Here is the Terraform for the helm_release:

cat nginx-arm64.tf 

resource "helm_release" "nginx-ingress" 
  name  = "nginx-ingress"
  chart = "/data/terraform/k8s/nginx-ingress-controller-arm64.tgz"

BTW: I need to use a local chart because the official chart doesn't support the ARM64 architecture. Thanks.
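Side note on the values: rather than editing the local chart itself, the overrides could also live in the helm_release resource. A minimal sketch, assuming a hypothetical override file and set key (neither is taken from the real chart):

resource "helm_release" "nginx-ingress" {
  name  = "nginx-ingress"
  chart = "/data/terraform/k8s/nginx-ingress-controller-arm64.tgz"

  # hypothetical values file next to the .tf files; any key from the chart's values.yaml can be overridden here
  values = [file("${path.module}/nginx-ingress-values.yaml")]

  # hypothetical single-key override
  set {
    name  = "image.tag"
    value = "nginx-1.1.0-rancher1"
  }
}

That would keep the chart archive untouched, so only the Terraform side changes between runs.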

Edit #1:

Here is the list of Helm releases -> there is no nginx-ingress release in it:

/data/terraform/k8s$ helm list -A
NAME            NAMESPACE   REVISION    UPDATED                                 STATUS      CHART               APP VERSION
cert-manager    default     1           2021-12-08 20:57:38.979176622 +0000 UTC deployed    cert-manager-v1.5.0 v1.5.0     
/data/terraform/k8s$ 

Here is the output of describing the pod:

$ k describe pod/nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr
Name:         nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr
Namespace:    default
Priority:     0
Node:         ocifreevmalways/10.0.0.189
Start Time:   Wed, 08 Dec 2021 11:11:59 +0000
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=nginx-ingress
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=nginx-ingress-controller
              helm.sh/chart=nginx-ingress-controller-9.0.9
              pod-template-hash=99cddc76b
Annotations:  <none>
Status:       Running
IP:           10.244.0.22
IPs:
  IP:           10.244.0.22
Controlled By:  ReplicaSet/nginx-ingress-nginx-ingress-controller-99cddc76b
Containers:
  controller:
    Container ID:  docker://0b75f5f68ef35dfb7dc5b90f9d1c249fad692855159f4e969324fc4e2ee61654
    Image:         docker.io/rancher/nginx-ingress-controller:nginx-1.1.0-rancher1
    Image ID:      docker-pullable://rancher/nginx-ingress-controller@sha256:177fb5dc79adcd16cb6c15d6c42cef31988b116cb148845893b6b954d7d593bc
    Ports:         80/TCP, 443/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --default-backend-service=default/nginx-ingress-nginx-ingress-controller-default-backend
      --election-id=ingress-controller-leader
      --controller-class=k8s.io/ingress-nginx
      --configmap=default/nginx-ingress-nginx-ingress-controller
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 08 Dec 2021 22:02:15 +0000
      Finished:     Wed, 08 Dec 2021 22:02:15 +0000
    Ready:          False
    Restart Count:  132
    Liveness:       http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr (v1:metadata.name)
      POD_NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wzqqn (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-wzqqn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Normal   Pulled   8m38s (x132 over 10h)   kubelet  Container image "docker.io/rancher/nginx-ingress-controller:nginx-1.1.0-rancher1" already present on machine
  Warning  BackOff  3m39s (x3201 over 10h)  kubelet  Back-off restarting failed container
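
The describe output only shows the back-off itself; the actual crash reason should be in the logs of the previous container attempt, e.g. (pod name copied from the output above):

kubectl logs pod/nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr -n default --previous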

terraform state list shows nothing:

/data/terraform/k8s$ t state list
/data/terraform/k8s$ 

However, terraform.tfstate.backup does show the nginx-ingress release (I guess I did run a destroy command in between?):

/data/terraform/k8s$ cat terraform.tfstate.backup

  "version": 4,
  "terraform_version": "1.0.11",
  "serial": 28,
  "lineage": "30e74aa5-9631-f82f-61a2-7bdbd97c2276",
  "outputs": ,
  "resources": [
    
      "mode": "managed",
      "type": "helm_release",
      "name": "nginx-ingress",
      "provider": "provider[\"registry.terraform.io/hashicorp/helm\"]",
      "instances": [
        
          "status": "tainted",
          "schema_version": 0,
          "attributes": 
            "atomic": false,
            "chart": "/data/terraform/k8s/nginx-ingress-controller-arm64.tgz",
            "cleanup_on_fail": false,
            "create_namespace": false,
            "dependency_update": false,
            "description": null,
            "devel": null,
            "disable_crd_hooks": false,
            "disable_openapi_validation": false,
            "disable_webhooks": false,
            "force_update": false,
            "id": "nginx-ingress",
            "keyring": null,
            "lint": false,
            "manifest": null,
            "max_history": 0,
            "metadata": [
              
                "app_version": "1.1.0",
                "chart": "nginx-ingress-controller",
                "name": "nginx-ingress",
                "namespace": "default",
                "revision": 1,
                "values": "",
                "version": "9.0.9"
              
            ],
            "name": "nginx-ingress",
            "namespace": "default",
            "postrender": [],
            "recreate_pods": false,
            "render_subchart_notes": true,
            "replace": false,
            "repository": null,
            "repository_ca_file": null,
            "repository_cert_file": null,
            "repository_key_file": null,
            "repository_password": null,
            "repository_username": null,
            "reset_values": false,
            "reuse_values": false,
            "set": [],
            "set_sensitive": [],
            "skip_crds": false,
            "status": "failed",
            "timeout": 300,
            "values": null,
            "verify": false,
            "version": "9.0.9",
            "wait": true,
            "wait_for_jobs": false
          ,
          "sensitive_attributes": [],
          "private": "bnVsbA=="
        
      ]
    
  ]
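
To confirm what the backup state records without reading the whole file, a one-liner like this should work (assuming jq is installed):

jq '.resources[] | select(.type == "helm_release") | .instances[].attributes.status' terraform.tfstate.backup

which prints "failed" for the nginx-ingress instance above.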

When I try to apply again from the same directory, it errors out again:

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

helm_release.nginx-ingress: Creating...
╷
│ Error: cannot re-use a name that is still in use
│ 
│   with helm_release.nginx-ingress,
│   on nginx-arm64.tf line 1, in resource "helm_release" "nginx-ingress":
│    1: resource "helm_release" "nginx-ingress" {

Please share your thoughts. Thanks.

Edit #2:

The DEBUG log shows some more clues:

2021-12-09T04:30:14.118Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceDiff: nginx-ingress] Release validated: timestamp=2021-12-09T04:30:14.118Z
2021-12-09T04:30:14.118Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceDiff: nginx-ingress] Done: timestamp=2021-12-09T04:30:14.118Z
2021-12-09T04:30:14.119Z [WARN]  Provider "registry.terraform.io/hashicorp/helm" produced an invalid plan for helm_release.nginx-ingress, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .cleanup_on_fail: planned value cty.False for a non-computed attribute
      - .create_namespace: planned value cty.False for a non-computed attribute
      - .verify: planned value cty.False for a non-computed attribute
      - .recreate_pods: planned value cty.False for a non-computed attribute
      - .render_subchart_notes: planned value cty.True for a non-computed attribute
      - .replace: planned value cty.False for a non-computed attribute
      - .reset_values: planned value cty.False for a non-computed attribute
      - .disable_crd_hooks: planned value cty.False for a non-computed attribute
      - .lint: planned value cty.False for a non-computed attribute
      - .namespace: planned value cty.StringVal("default") for a non-computed attribute
      - .skip_crds: planned value cty.False for a non-computed attribute
      - .disable_webhooks: planned value cty.False for a non-computed attribute
      - .force_update: planned value cty.False for a non-computed attribute
      - .timeout: planned value cty.NumberIntVal(300) for a non-computed attribute
      - .reuse_values: planned value cty.False for a non-computed attribute
      - .dependency_update: planned value cty.False for a non-computed attribute
      - .disable_openapi_validation: planned value cty.False for a non-computed attribute
      - .atomic: planned value cty.False for a non-computed attribute
      - .wait: planned value cty.True for a non-computed attribute
      - .max_history: planned value cty.NumberIntVal(0) for a non-computed attribute
      - .wait_for_jobs: planned value cty.False for a non-computed attribute
helm_release.nginx-ingress: Creating...
2021-12-09T04:30:14.119Z [INFO]  Starting apply for helm_release.nginx-ingress
2021-12-09T04:30:14.119Z [INFO]  Starting apply for helm_release.nginx-ingress
2021-12-09T04:30:14.119Z [DEBUG] helm_release.nginx-ingress: applying the planned Create change
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] setting computed for "metadata" from ComputedKeys: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Started: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Getting helm configuration: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [INFO] GetHelmConfiguration start: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] Using kubeconfig: /home/ubuntu/.kube/config: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [INFO] Successfully initialized kubernetes config: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.121Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [INFO] GetHelmConfiguration success: timestamp=2021-12-09T04:30:14.121Z
2021-12-09T04:30:14.121Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Getting chart: timestamp=2021-12-09T04:30:14.121Z
2021-12-09T04:30:14.125Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Preparing for installation: timestamp=2021-12-09T04:30:14.125Z
2021-12-09T04:30:14.125Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 ---[ values.yaml ]-----------------------------------
: timestamp=2021-12-09T04:30:14.125Z
2021-12-09T04:30:14.125Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Installing chart: timestamp=2021-12-09T04:30:14.125Z
╷
│ Error: cannot re-use a name that is still in use
│ 
│   with helm_release.nginx-ingress,
│   on nginx-arm64.tf line 1, in resource "helm_release" "nginx-ingress":
│    1: resource "helm_release" "nginx-ingress" {
│ 
╵
2021-12-09T04:30:14.158Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2021-12-09T04:30:14.160Z [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/helm/2.4.1/linux_arm64/terraform-provider-helm_v2.4.1_x5 pid=558800
2021-12-09T04:30:14.160Z [DEBUG] provider: plugin exited


Answer #1:

You don't have to delete all the resources manually with kubectl. Under the hood, the Terraform Helm provider still uses Helm. So if you run helm list -A, you will see all Helm releases on the cluster, including the nginx-ingress release. Then remove the release via helm uninstall nginx-ingress -n REPLACE_WITH_YOUR_NAMESPACE.
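
For example, something along these lines (the default namespace and the owner=helm label are Helm 3 defaults; adjust if the release lives elsewhere):

# show releases in every state, not only the deployed ones
helm list -A --all
# Helm 3 keeps each release record as a Secret labelled owner=helm
kubectl get secrets -A -l owner=helm
# remove the stuck release record
helm uninstall nginx-ingress -n default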

Before re-running terraform apply, check via terraform state list whether the Helm release is still in the Terraform state (run it from the same directory you run terraform apply from). If you don't see helm_release.nginx-ingress in that list, it is not in your Terraform state and you can re-run your terraform apply. Otherwise you have to remove it via terraform state rm helm_release.nginx-ingress, after which you can run terraform apply again.
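
Put together, the check-then-clean sequence could look like this (run from the same directory as terraform apply):

terraform state list
# only needed if helm_release.nginx-ingress shows up in the list above
terraform state rm helm_release.nginx-ingress
terraform apply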

Comments:

Thanks for the suggestion; I've edited the post with more information based on it.

I think I solved the issue. Here's what I did: 1) renamed terraform.tfstate to another name, 2) renamed terraform.tfstate.backup to terraform.tfstate, 3) ran "terraform refresh" to confirm the state was in sync, and 4) ran "terraform apply" to delete/create the resources. I'll mark your reply as the answer since it gave me the clue to solve the problem. Thanks!

I guess the "lost" state was caused by me cancelling the apply command.
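
For anyone hitting the same situation, a sketch of the recovery steps described in the comment above (the .old file name is just an example; terraform refresh still works on 1.0.x, although terraform plan -refresh-only achieves the same):

# 1) move the current, empty state out of the way
mv terraform.tfstate terraform.tfstate.old
# 2) promote the backup that still records the tainted release
mv terraform.tfstate.backup terraform.tfstate
# 3) sync the state with the real cluster
terraform refresh
# 4) Terraform now plans to replace the tainted release
terraform apply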
