How to give permissions to AKS to access ACR via terraform?

Posted: 2020-05-15 14:21:54

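Question and details

How can I allow a Kubernetes cluster in Azure to talk to an Azure Container Registry via terraform?

I want to load custom images from my Azure Container Registry. Unfortunately, I get a permission error at the point where Kubernetes should download the image from the ACR.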

What I have tried so far

My experiments without terraform (az cli)

Everything works fine after I attach the ACR to the AKS cluster via az cli:

az aks update -n myAKSCluster -g myResourceGroup --attach-acr <acrName>

My experiments with terraform

This is my terraform configuration; I have stripped out some other stuff. It works on its own.

terraform {
  backend "azurerm" {
    resource_group_name  = "tf-state"
    storage_account_name = "devopstfstate"
    container_name       = "tfstatetest"
    key                  = "prod.terraform.tfstatetest"
  }
}

provider "azurerm" {
}

provider "azuread" {
}

provider "random" {
}

# define the password
resource "random_string" "password" {
  length  = 32
  special = true
}

# define the resource group
resource "azurerm_resource_group" "rg" {
  name     = "myrg"
  location = "eastus2"
}

# define the app
resource "azuread_application" "tfapp" {
  name = "mytfapp"
}

# define the service principal
resource "azuread_service_principal" "tfapp" {
  application_id = azuread_application.tfapp.application_id
}

# define the service principal password
resource "azuread_service_principal_password" "tfapp" {
  service_principal_id = azuread_service_principal.tfapp.id
  end_date             = "2020-12-31T09:00:00Z"
  value                = random_string.password.result
}

# define the container registry
resource "azurerm_container_registry" "acr" {
  name                = "mycontainerregistry2387987222"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "Basic"
  admin_enabled       = false
}

# define the kubernetes cluster
resource "azurerm_kubernetes_cluster" "mycluster" {
  name                = "myaks"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "mycluster"

  network_profile {
    network_plugin = "azure"
  }

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_B2s"
  }

  # Use the service principal created above
  service_principal {
    client_id     = azuread_service_principal.tfapp.application_id
    client_secret = azuread_service_principal_password.tfapp.value
  }

  tags = {
    Environment = "demo"
  }

  windows_profile {
    admin_username = "dingding"
    admin_password = random_string.password.result
  }
}

# define the windows node pool for kubernetes
resource "azurerm_kubernetes_cluster_node_pool" "winpool" {
  name                  = "winp"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.mycluster.id
  vm_size               = "Standard_B2s"
  node_count            = 1
  os_type               = "Windows"
}

# define the kubernetes namespace
resource "kubernetes_namespace" "namesp" {
  metadata {
    name = "namesp"
  }
}

# Try to give permissions, to let the AKS access the ACR
resource "azurerm_role_assignment" "acrpull_role" {
  scope                            = azurerm_container_registry.acr.id
  role_definition_name             = "AcrPull"
  principal_id                     = azuread_service_principal.tfapp.object_id
  skip_service_principal_aad_check = true
}

This code is adapted from https://github.com/terraform-providers/terraform-provider-azuread/issues/104.

Unfortunately, when I launch a container inside the kubernetes cluster, I receive an error message:

Failed to pull image "mycontainerregistry.azurecr.io/myunittests": [rpc error: code = Unknown desc = Error response from daemon: manifest for mycontainerregistry.azurecr.io/myunittests:latest not found: manifest unknown: manifest unknown, rpc error: code = Unknown desc = Error response from daemon: Get https://mycontainerregistry.azurecr.io/v2/myunittests/manifests/latest: unauthorized: authentication required]

Update / note:

When I run terraform apply with the above code, the creation of resources is interrupted:

azurerm_container_registry.acr: Creation complete after 18s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222]
azurerm_role_assignment.acrpull_role: Creating...
azuread_service_principal_password.tfapp: Still creating... [10s elapsed]
azuread_service_principal_password.tfapp: Creation complete after 12s [id=000/000]
azurerm_kubernetes_cluster.mycluster: Creating...
azurerm_role_assignment.acrpull_role: Creation complete after 8s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222/providers/Microsoft.Authorization/roleAssignments/000]
azurerm_kubernetes_cluster.mycluster: Still creating... [10s elapsed]

Error: Error creating Managed Kubernetes Cluster "myaks" (Resource Group "myrg"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="ServicePrincipalNotFound" Message="Service principal clientID: 000 not found in Active Directory tenant 000, Please see https://aka.ms/aks-sp-help for more details."

  on test.tf line 56, in resource "azurerm_kubernetes_cluster" "mycluster":
  56: resource "azurerm_kubernetes_cluster" "mycluster" {

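I think, however, that this is just because it takes a few minutes for the service principal to be created. When I run terraform apply again a few minutes later, it gets past that point without issues.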
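The ServicePrincipalNotFound error is consistent with a propagation delay in Azure AD. One possible workaround, sketched here as an assumption and not taken from the original post, is an explicit wait between the service principal password and the cluster, using the time_sleep resource from the hashicorp/time provider. The resource name wait_for_sp and the 120s duration are hypothetical; the post does not say how long the delay really needs to be.

```hcl
# Hypothetical workaround: pause after the service principal credentials are
# created so that Azure AD has time to propagate them.
resource "time_sleep" "wait_for_sp" {
  depends_on      = [azuread_service_principal_password.tfapp]
  create_duration = "120s" # assumed value, tune as needed
}

# Then, inside the existing azurerm_kubernetes_cluster.mycluster resource, add:
#   depends_on = [time_sleep.wait_for_sp]
```

With this in place, a single terraform apply would wait before creating the cluster instead of needing a second run a few minutes later.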

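Comments:

This looks fine. Do you by any chance use a pull secret?

Just to clarify, does this run without any errors? You might want to change the scope to azurerm_container_registry.acr.id, but it should be fine either way, tbh.

I had to modify it a bit to make it run standalone; code updated. I have also added a note about the interruption that occurs during the terraform apply run after the service principal is created. I have modified the scope as you suggested, but the image is still not pulled. :(

Yes, it actually does work with the modification. I had to terraform destroy the resources completely and recreate them; at that point everything was fine (the same thing did not work before applying the changes). Thanks!

It may be that the object_id was missing.

Answer 1: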

This code worked for me.


resource "azuread_application" "aks_sp" {
  name = "sp-aks-${local.cluster_name}"
}

resource "azuread_service_principal" "aks_sp" {
  application_id               = azuread_application.aks_sp.application_id
  app_role_assignment_required = false
}

resource "azuread_service_principal_password" "aks_sp" {
  service_principal_id = azuread_service_principal.aks_sp.id
  value                = random_string.aks_sp_password.result
  end_date_relative    = "8760h" # 1 year

  lifecycle {
    ignore_changes = [
      value,
      end_date_relative
    ]
  }
}

resource "azuread_application_password" "aks_sp" {
  application_object_id = azuread_application.aks_sp.id
  value                 = random_string.aks_sp_secret.result
  end_date_relative     = "8760h" # 1 year

  lifecycle {
    ignore_changes = [
      value,
      end_date_relative
    ]
  }
}

data "azurerm_container_registry" "pyp" {
  name                = var.container_registry_name
  resource_group_name = var.container_registry_resource_group_name
}

resource "azurerm_role_assignment" "aks_sp_container_registry" {
  scope                = data.azurerm_container_registry.pyp.id
  role_definition_name = "AcrPull"
  principal_id         = azuread_service_principal.aks_sp.object_id
}

# requires Azure Provider 1.37+
resource "azurerm_kubernetes_cluster" "pyp" {
  name                = local.cluster_name
  location            = azurerm_resource_group.pyp.location
  resource_group_name = azurerm_resource_group.pyp.name
  dns_prefix          = local.env_name_nosymbols
  kubernetes_version  = local.kubernetes_version

  default_node_pool {
    name            = "default"
    node_count      = 1
    vm_size         = "Standard_D2s_v3"
    os_disk_size_gb = 80
  }

  windows_profile {
    admin_username = "winadm"
    admin_password = random_string.windows_profile_password.result
  }

  network_profile {
    network_plugin     = "azure"
    dns_service_ip     = cidrhost(local.service_cidr, 10)
    docker_bridge_cidr = "172.17.0.1/16"
    service_cidr       = local.service_cidr
    load_balancer_sku  = "standard"
  }

  service_principal {
    client_id     = azuread_service_principal.aks_sp.application_id
    client_secret = random_string.aks_sp_password.result
  }

  addon_profile {
    oms_agent {
      enabled                    = true
      log_analytics_workspace_id = azurerm_log_analytics_workspace.pyp.id
    }
  }

  tags = local.tags
}

Source: https://github.com/giuliov/pipeline-your-pipelines/tree/master/src/kubernetes/terraform
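The excerpt above references three random_string resources that are not shown. A minimal sketch of what they might look like follows; the arguments mirror the password resource from the question and are an assumption, not a copy of the linked repository.

```hcl
# Assumed definitions for the random_string resources referenced above.
# length and special are borrowed from the question's password resource.
resource "random_string" "aks_sp_password" {
  length  = 32
  special = true
}

resource "random_string" "aks_sp_secret" {
  length  = 32
  special = true
}

resource "random_string" "windows_profile_password" {
  length  = 32
  special = true
}
```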

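Comments:

Yes; works! I had to destroy the whole set of resources and re-apply for it to take effect. That takes a while, and I needed to find a good time window, hence the late answer. Thanks!

Answer 2: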

(I did upvote the answer above)

Just adding a simpler way, where you don't need to create a service principal, for anyone else who might need it.

resource "azurerm_kubernetes_cluster" "kubweb" {
  name                = local.cluster_web
  location            = local.rgloc
  resource_group_name = local.rgname
  dns_prefix          = "${local.cluster_web}-dns"
  kubernetes_version  = local.kubversion

  # used to group all the internal objects of this cluster
  node_resource_group = "${local.cluster_web}-rg-node"

  # azure will assign the id automatically
  identity {
    type = "SystemAssigned"
  }

  default_node_pool {
    name                 = "nodepool1"
    node_count           = 4
    vm_size              = local.vm_size
    orchestrator_version = local.kubversion
  }

  role_based_access_control {
    enabled = true
  }

  addon_profile {
    kube_dashboard {
      enabled = true
    }
  }

  tags = {
    environment = local.env
  }
}

resource "azurerm_container_registry" "acr" {
  name                = "acr1"
  resource_group_name = local.rgname
  location            = local.rgloc
  sku                 = "Standard"
  admin_enabled       = true

  tags = {
    environment = local.env
  }
}

# add the role to the identity the kubernetes cluster was assigned
resource "azurerm_role_assignment" "kubweb_to_acr" {
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_kubernetes_cluster.kubweb.kubelet_identity[0].object_id
}

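Comments:

It's so funky that the cluster data resource has 3-4 principal IDs. This seems to be the right one, though. At least it matches the output of az aks show --resource-group groupName --name aksName --query identityProfile.kubeletidentity.objectId, which others say is the correct one.

This should be the correct answer.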
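To see which identity is which, the two IDs can be exposed as Terraform outputs and compared with the az aks show query from the comment. This is only a sketch; the output names are hypothetical, and it assumes the kubweb cluster from the answer with a system-assigned identity.

```hcl
# Identity used by the kubelet on the nodes to pull images; this is the one
# the AcrPull role assignment in the answer targets.
output "kubelet_identity_object_id" {
  value = azurerm_kubernetes_cluster.kubweb.kubelet_identity[0].object_id
}

# System-assigned identity of the cluster itself, shown for comparison only.
output "cluster_identity_principal_id" {
  value = azurerm_kubernetes_cluster.kubweb.identity[0].principal_id
}
```

After terraform apply, the first output should be the value that the identityProfile.kubeletidentity.objectId query returns.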
