Deploying Helm workloads with Terraform on GKE cluster
【Posted】: 2019-09-04 16:44:00
【Question】:
I am trying to deploy workloads to a GKE cluster using the Terraform Helm provider (https://www.terraform.io/docs/providers/helm/index.html).
I am more or less following Google's example at https://github.com/GoogleCloudPlatform/terraform-google-examples/blob/master/example-gke-k8s-helm/helm.tf, but I do want to use RBAC by creating the service account manually.
My helm.tf looks like this:
variable "helm_version" {
  default = "v2.13.1"
}

data "google_client_config" "current" {}

provider "helm" {
  tiller_image   = "gcr.io/kubernetes-helm/tiller:${var.helm_version}"
  install_tiller = false # Temporary

  kubernetes {
    host                   = "${google_container_cluster.data-dome-cluster.endpoint}"
    token                  = "${data.google_client_config.current.access_token}"
    client_certificate     = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.client_certificate)}"
    client_key             = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.client_key)}"
    cluster_ca_certificate = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.cluster_ca_certificate)}"
  }
}

resource "helm_release" "nginx-ingress" {
  name  = "ingress"
  chart = "stable/nginx-ingress"

  values = [<<EOF
rbac:
  create: false
controller:
  stats:
    enabled: true
  metrics:
    enabled: true
  service:
    annotations:
      cloud.google.com/load-balancer-type: "Internal"
    externalTrafficPolicy: "Local"
EOF
  ]

  depends_on = [
    "google_container_cluster.data-dome-cluster",
  ]
}
I get the following error:
Error: Error applying plan:
1 error(s) occurred:
* module.data-dome-cluster.helm_release.nginx-ingress: 1 error(s) occurred:
* helm_release.nginx-ingress: error creating tunnel: "pods is forbidden: User \"client\" cannot list pods in the namespace \"kube-system\""
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
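This happens after I manually created the Helm RBAC resources and installed Tiller.
I also tried setting install_tiller = true earlier, and got exactly the same error while Tiller was being installed.
kubectl get pods works without any problems.
What is this user "client", and why is it forbidden from accessing the cluster?
Thanks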
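【Comments】:
When you installed Helm (Tiller) on the cluster, did you specify the --service-account flag when running helm init? If you want to install Tiller through Terraform, you also need to add the service_account attribute.
I did specify --service-account.
Could you describe the service account, i.e. kubectl describe clusterrole <tiller-service-account>, and add the output to your post?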
【Answer 1】:
Explicitly creating resources for the service account and the cluster role binding worked for me:
resource "kubernetes_service_account" "helm_account" {
  depends_on = [
    "google_container_cluster.data-dome-cluster",
  ]

  metadata {
    name      = "${var.helm_account_name}"
    namespace = "kube-system"
  }
}

resource "kubernetes_cluster_role_binding" "helm_role_binding" {
  metadata {
    name = "${kubernetes_service_account.helm_account.metadata.0.name}"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    api_group = ""
    kind      = "ServiceAccount"
    name      = "${kubernetes_service_account.helm_account.metadata.0.name}"
    namespace = "kube-system"
  }

  provisioner "local-exec" {
    command = "sleep 15"
  }
}

provider "helm" {
  service_account = "${kubernetes_service_account.helm_account.metadata.0.name}"
  tiller_image    = "gcr.io/kubernetes-helm/tiller:${var.helm_version}"
  #install_tiller = false # Temporary

  kubernetes {
    host                   = "${google_container_cluster.data-dome-cluster.endpoint}"
    token                  = "${data.google_client_config.current.access_token}"
    client_certificate     = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.client_certificate)}"
    client_key             = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.client_key)}"
    cluster_ca_certificate = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.cluster_ca_certificate)}"
  }
}
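
The configuration above references var.helm_account_name and the kubernetes_* resources without showing how they are set up, so here is a minimal sketch of the supporting pieces it seems to rely on. Everything in it is an assumption rather than part of my original setup as posted: the default value "tiller" is a hypothetical name, the provider "kubernetes" block simply reuses the cluster endpoint, access token and CA certificate from the question (token-only authentication is my choice for the sketch), and the extra depends_on entry is one way to make the release wait until the role binding exists.

```hcl
# Assumption: this variable is used above but never declared there;
# "tiller" is a hypothetical default, pick any service account name.
variable "helm_account_name" {
  default = "tiller"
}

# Assumption: the kubernetes_* resources need a configured kubernetes provider.
# This sketch authenticates with the Google access token and the cluster CA only.
provider "kubernetes" {
  host                   = "${google_container_cluster.data-dome-cluster.endpoint}"
  token                  = "${data.google_client_config.current.access_token}"
  cluster_ca_certificate = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.cluster_ca_certificate)}"
}

# The release from the question, with an added dependency on the role binding
# so that Tiller's service account has its permissions before Helm runs.
resource "helm_release" "nginx-ingress" {
  name  = "ingress"
  chart = "stable/nginx-ingress"

  depends_on = [
    "google_container_cluster.data-dome-cluster",
    "kubernetes_cluster_role_binding.helm_role_binding",
  ]
}
```

The values block from the question is omitted here for brevity and would go back into the helm_release unchanged.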