Deploying Helm workloads with Terraform on a GKE cluster

Posted: 2019-09-04 16:44:00

Question:

I'm trying to deploy workloads to a GKE cluster using the Terraform Helm provider (https://www.terraform.io/docs/providers/helm/index.html).

I'm more or less following Google's example - https://github.com/GoogleCloudPlatform/terraform-google-examples/blob/master/example-gke-k8s-helm/helm.tf - but I do want to use RBAC by creating the service account manually.

My helm.tf looks like this:

variable "helm_version" {
  default = "v2.13.1"
}

data "google_client_config" "current" {}

provider "helm" {
  tiller_image   = "gcr.io/kubernetes-helm/tiller:${var.helm_version}"
  install_tiller = false # Temporary

  kubernetes {
    host  = "${google_container_cluster.data-dome-cluster.endpoint}"
    token = "${data.google_client_config.current.access_token}"

    client_certificate     = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.client_certificate)}"
    client_key             = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.client_key)}"
    cluster_ca_certificate = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.cluster_ca_certificate)}"
  }
}


resource "helm_release" "nginx-ingress" {
  name  = "ingress"
  chart = "stable/nginx-ingress"

  values = [<<EOF
rbac:
  create: false
controller:
  stats:
    enabled: true
  metrics:
    enabled: true
  service:
    annotations:
      cloud.google.com/load-balancer-type: "Internal"
    externalTrafficPolicy: "Local"
EOF
  ]

  depends_on = [
    "google_container_cluster.data-dome-cluster",
  ]
}

I get the following error:

Error: Error applying plan:

1 error(s) occurred:

* module.data-dome-cluster.helm_release.nginx-ingress: 1 error(s) occurred:

* helm_release.nginx-ingress: error creating tunnel: "pods is forbidden: User \"client\" cannot list pods in the namespace \"kube-system\""

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

This happens after I manually created the Helm RBAC resources and installed Tiller.

I also tried setting "install_tiller = true" earlier, but got exactly the same error while Tiller was being installed.

"kubectl get pods" works without any problem.

Who is this user "client", and why is it forbidden from accessing the cluster?

Thanks

Comments:

- When you installed Helm (Tiller) on the cluster, did you specify the --service-account flag when running helm init? If you want to install Tiller via Terraform, you also need to add the service_account attribute.
- I did specify --service-account
- Could you describe the service account, i.e. kubectl describe clusterrole <tiller-service-account>, and add it to your post?

Answer 1:

Explicitly creating resources for the service account and cluster role binding worked for me:

resource "kubernetes_service_account" "helm_account" {
  depends_on = [
    "google_container_cluster.data-dome-cluster",
  ]
  metadata {
    name      = "${var.helm_account_name}"
    namespace = "kube-system"
  }
}

resource "kubernetes_cluster_role_binding" "helm_role_binding" {
  metadata {
    name = "${kubernetes_service_account.helm_account.metadata.0.name}"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }
  subject {
    api_group = ""
    kind      = "ServiceAccount"
    name      = "${kubernetes_service_account.helm_account.metadata.0.name}"
    namespace = "kube-system"
  }

  # Give the API server a moment to propagate the binding before Tiller starts
  provisioner "local-exec" {
    command = "sleep 15"
  }
}

provider "helm" {
  service_account = "${kubernetes_service_account.helm_account.metadata.0.name}"
  tiller_image    = "gcr.io/kubernetes-helm/tiller:${var.helm_version}"
  #install_tiller = false # Temporary

  kubernetes {
    host  = "${google_container_cluster.data-dome-cluster.endpoint}"
    token = "${data.google_client_config.current.access_token}"

    client_certificate     = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.client_certificate)}"
    client_key             = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.client_key)}"
    cluster_ca_certificate = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.cluster_ca_certificate)}"
  }
}

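Note that the kubernetes_service_account and kubernetes_cluster_role_binding resources above also require the Kubernetes provider to be configured; the answer does not show that block. A minimal sketch, assuming it authenticates against the same cluster the same way as the helm provider's kubernetes block (attribute values here mirror that block and are assumptions about your setup):

```hcl
# Hypothetical kubernetes provider config for the RBAC resources above;
# mirrors the helm provider's kubernetes block - adjust to your cluster.
provider "kubernetes" {
  host  = "${google_container_cluster.data-dome-cluster.endpoint}"
  token = "${data.google_client_config.current.access_token}"

  client_certificate     = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.client_certificate)}"
  client_key             = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.client_key)}"
  cluster_ca_certificate = "${base64decode(google_container_cluster.data-dome-cluster.master_auth.0.cluster_ca_certificate)}"
}
```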