卡在 Terraform 到 Kubernetes 的部分 helm 版本中
Posted
技术标签:
【中文标题】卡在 Terraform 到 Kubernetes 的部分 helm 版本中【英文标题】:Stuck in the partial helm release on Terraform to Kubernetes 【发布时间】:2022-01-13 18:42:39 【问题描述】:我正在尝试将 terraform 资源 (helm_release) 应用到 k8s,但应用命令中途失败。
我检查了 pod 问题,现在我需要更新本地图表中的一些值。
现在我处于两难境地,我无法应用 helm_release,因为名称正在使用中,并且我无法销毁 helm_release,因为它没有被创建。
在我看来唯一的选择是手动删除由 helm_release 图表创建的 k8s 资源?
这是 helm_release 的 terraform:
cat nginx-arm64.tf
resource "helm_release" "nginx-ingress"
name = "nginx-ingress"
chart = "/data/terraform/k8s/nginx-ingress-controller-arm64.tgz"
BTW:我需要使用本地图表,因为官方图表不支持 ARM64 架构。 谢谢,
编辑#1:
Here is the list of helm release -> 没有 gninx 入口
/data/terraform/k8s$ helm list -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
cert-manager default 1 2021-12-08 20:57:38.979176622 +0000 UTC deployed cert-manager-v1.5.0 v1.5.0
/data/terraform/k8s$
这是描述 pod 的输出:
$ k describe pod/nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr
Name: nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr
Namespace: default
Priority: 0
Node: ocifreevmalways/10.0.0.189
Start Time: Wed, 08 Dec 2021 11:11:59 +0000
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=nginx-ingress-controller
helm.sh/chart=nginx-ingress-controller-9.0.9
pod-template-hash=99cddc76b
Annotations: <none>
Status: Running
IP: 10.244.0.22
IPs:
IP: 10.244.0.22
Controlled By: ReplicaSet/nginx-ingress-nginx-ingress-controller-99cddc76b
Containers:
controller:
Container ID: docker://0b75f5f68ef35dfb7dc5b90f9d1c249fad692855159f4e969324fc4e2ee61654
Image: docker.io/rancher/nginx-ingress-controller:nginx-1.1.0-rancher1
Image ID: docker-pullable://rancher/nginx-ingress-controller@sha256:177fb5dc79adcd16cb6c15d6c42cef31988b116cb148845893b6b954d7d593bc
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--default-backend-service=default/nginx-ingress-nginx-ingress-controller-default-backend
--election-id=ingress-controller-leader
--controller-class=k8s.io/ingress-nginx
--configmap=default/nginx-ingress-nginx-ingress-controller
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 08 Dec 2021 22:02:15 +0000
Finished: Wed, 08 Dec 2021 22:02:15 +0000
Ready: False
Restart Count: 132
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wzqqn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-wzqqn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 8m38s (x132 over 10h) kubelet Container image "docker.io/rancher/nginx-ingress-controller:nginx-1.1.0-rancher1" already present on machine
Warning BackOff 3m39s (x3201 over 10h) kubelet Back-off restarting failed container
terraform 状态列表不显示任何内容:
/data/terraform/k8s$ t state list
/data/terraform/k8s$
虽然 terraform.tfstate.backup 显示了 nginx 入口(我猜我确实在两者之间运行了 destroy 命令?):
/data/terraform/k8s$ cat terraform.tfstate.backup
"version": 4,
"terraform_version": "1.0.11",
"serial": 28,
"lineage": "30e74aa5-9631-f82f-61a2-7bdbd97c2276",
"outputs": ,
"resources": [
"mode": "managed",
"type": "helm_release",
"name": "nginx-ingress",
"provider": "provider[\"registry.terraform.io/hashicorp/helm\"]",
"instances": [
"status": "tainted",
"schema_version": 0,
"attributes":
"atomic": false,
"chart": "/data/terraform/k8s/nginx-ingress-controller-arm64.tgz",
"cleanup_on_fail": false,
"create_namespace": false,
"dependency_update": false,
"description": null,
"devel": null,
"disable_crd_hooks": false,
"disable_openapi_validation": false,
"disable_webhooks": false,
"force_update": false,
"id": "nginx-ingress",
"keyring": null,
"lint": false,
"manifest": null,
"max_history": 0,
"metadata": [
"app_version": "1.1.0",
"chart": "nginx-ingress-controller",
"name": "nginx-ingress",
"namespace": "default",
"revision": 1,
"values": "",
"version": "9.0.9"
],
"name": "nginx-ingress",
"namespace": "default",
"postrender": [],
"recreate_pods": false,
"render_subchart_notes": true,
"replace": false,
"repository": null,
"repository_ca_file": null,
"repository_cert_file": null,
"repository_key_file": null,
"repository_password": null,
"repository_username": null,
"reset_values": false,
"reuse_values": false,
"set": [],
"set_sensitive": [],
"skip_crds": false,
"status": "failed",
"timeout": 300,
"values": null,
"verify": false,
"version": "9.0.9",
"wait": true,
"wait_for_jobs": false
,
"sensitive_attributes": [],
"private": "bnVsbA=="
]
]
当我尝试在同一目录下申请时,再次提示错误:
Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
helm_release.nginx-ingress: Creating...
╷
│ Error: cannot re-use a name that is still in use
│
│ with helm_release.nginx-ingress,
│ on nginx-arm64.tf line 1, in resource "helm_release" "nginx-ingress":
│ 1: resource "helm_release" "nginx-ingress"
请分享您的想法。谢谢。
编辑2:
DEBUG 日志显示了更多线索:
2021-12-09T04:30:14.118Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceDiff: nginx-ingress] Release validated: timestamp=2021-12-09T04:30:14.118Z
2021-12-09T04:30:14.118Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceDiff: nginx-ingress] Done: timestamp=2021-12-09T04:30:14.118Z
2021-12-09T04:30:14.119Z [WARN] Provider "registry.terraform.io/hashicorp/helm" produced an invalid plan for helm_release.nginx-ingress, but we are tolerating it because it is using the legacy plugin SDK.
The following problems may be the cause of any confusing errors from downstream operations:
- .cleanup_on_fail: planned value cty.False for a non-computed attribute
- .create_namespace: planned value cty.False for a non-computed attribute
- .verify: planned value cty.False for a non-computed attribute
- .recreate_pods: planned value cty.False for a non-computed attribute
- .render_subchart_notes: planned value cty.True for a non-computed attribute
- .replace: planned value cty.False for a non-computed attribute
- .reset_values: planned value cty.False for a non-computed attribute
- .disable_crd_hooks: planned value cty.False for a non-computed attribute
- .lint: planned value cty.False for a non-computed attribute
- .namespace: planned value cty.StringVal("default") for a non-computed attribute
- .skip_crds: planned value cty.False for a non-computed attribute
- .disable_webhooks: planned value cty.False for a non-computed attribute
- .force_update: planned value cty.False for a non-computed attribute
- .timeout: planned value cty.NumberIntVal(300) for a non-computed attribute
- .reuse_values: planned value cty.False for a non-computed attribute
- .dependency_update: planned value cty.False for a non-computed attribute
- .disable_openapi_validation: planned value cty.False for a non-computed attribute
- .atomic: planned value cty.False for a non-computed attribute
- .wait: planned value cty.True for a non-computed attribute
- .max_history: planned value cty.NumberIntVal(0) for a non-computed attribute
- .wait_for_jobs: planned value cty.False for a non-computed attribute
helm_release.nginx-ingress: Creating...
2021-12-09T04:30:14.119Z [INFO] Starting apply for helm_release.nginx-ingress
2021-12-09T04:30:14.119Z [INFO] Starting apply for helm_release.nginx-ingress
2021-12-09T04:30:14.119Z [DEBUG] helm_release.nginx-ingress: applying the planned Create change
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] setting computed for "metadata" from ComputedKeys: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Started: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Getting helm configuration: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [INFO] GetHelmConfiguration start: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] Using kubeconfig: /home/ubuntu/.kube/config: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [INFO] Successfully initialized kubernetes config: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.121Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [INFO] GetHelmConfiguration success: timestamp=2021-12-09T04:30:14.121Z
2021-12-09T04:30:14.121Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Getting chart: timestamp=2021-12-09T04:30:14.121Z
2021-12-09T04:30:14.125Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Preparing for installation: timestamp=2021-12-09T04:30:14.125Z
2021-12-09T04:30:14.125Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 ---[ values.yaml ]-----------------------------------
: timestamp=2021-12-09T04:30:14.125Z
2021-12-09T04:30:14.125Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Installing chart: timestamp=2021-12-09T04:30:14.125Z
╷
│ Error: cannot re-use a name that is still in use
│
│ with helm_release.nginx-ingress,
│ on nginx-arm64.tf line 1, in resource "helm_release" "nginx-ingress":
│ 1: resource "helm_release" "nginx-ingress"
│
╵
2021-12-09T04:30:14.158Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2021-12-09T04:30:14.160Z [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/helm/2.4.1/linux_arm64/terraform-provider-helm_v2.4.1_x5 pid=558800
2021-12-09T04:30:14.160Z [DEBUG] provider: plugin exited
【问题讨论】:
【参考方案1】:您不必使用kubectl
手动删除所有资源。在幕后,Terraform Helm 提供者仍然使用 Helm。因此,如果您运行 helm list -A
,您将看到集群上的所有 Helm 版本,包括 nginx-ingress
版本。然后通过helm uninstall nginx-ingress -n REPLACE_WITH_YOUR_NAMESPACE
删除版本。
在重新运行 terraform apply
之前,请通过 terraform state list
检查 Helm 版本是否仍处于 Terraform 状态(从与运行 terraform apply
的目录相同的目录运行它)。如果您在该列表中没有看到 helm_release.nginx-ingress
,则它不在您的 Terraform 状态,您可以重新运行您的 terraform apply
。否则你必须通过terraform state rm helm_release.nginx-ingress
删除它,然后你可以再次运行terraform apply
。
【讨论】:
感谢您的建议,我根据建议编辑帖子并提供更多信息。 我想我解决了这个问题。这就是我所做的:1)将 terraform.tfstate 转换为另一个名称,2)将 terraform.tfstate.backup 转换为 terraform.tfstate,以及 3)运行“terraform refresh”命令以确认状态已同步,以及 4)运行“terraform apply”以删除/创建资源。我会将您的回复标记为答案,因为它为我提供了解决问题的线索。谢谢! 我猜“丢失”状态是由于我取消了应用命令。以上是关于卡在 Terraform 到 Kubernetes 的部分 helm 版本中的主要内容,如果未能解决你的问题,请参考以下文章
将 Terraform 生态粘合到 Kubernetes 世界
通过 Terraform 设置用于自动扩展 kubernetes 集群的启动磁盘大小
在同一个 TF 脚本中使用多个 Terraform 提供程序(GCP 和 Kubernetes)创建资源
如何使用 kubernetes_ingress terraform 资源创建 AWS ALB?