How to give permissions to AKS to access ACR via terraform?
【Posted】: 2020-05-15 14:21:54
【Question】:
Question and details
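How can I allow a Kubernetes cluster in Azure to talk to an Azure Container Registry via terraform?
I want to load custom images from my Azure Container Registry. Unfortunately, I get a permission error at the point where Kubernetes should download the image from the ACR.
What I have tried so far
My experiments without terraform (az cli)
Everything works fine after I attach the ACR to the AKS cluster via az cli: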
az aks update -n myAKSCluster -g myResourceGroup --attach-acr <acrName>
My experiments with terraform
This is my terraform configuration; I have stripped out some other things. It works on its own.
terraform {
  backend "azurerm" {
    resource_group_name  = "tf-state"
    storage_account_name = "devopstfstate"
    container_name       = "tfstatetest"
    key                  = "prod.terraform.tfstatetest"
  }
}

provider "azurerm" {}
provider "azuread" {}
provider "random" {}

# define the password
resource "random_string" "password" {
  length  = 32
  special = true
}

# define the resource group
resource "azurerm_resource_group" "rg" {
  name     = "myrg"
  location = "eastus2"
}

# define the app
resource "azuread_application" "tfapp" {
  name = "mytfapp"
}

# define the service principal
resource "azuread_service_principal" "tfapp" {
  application_id = azuread_application.tfapp.application_id
}

# define the service principal password
resource "azuread_service_principal_password" "tfapp" {
  service_principal_id = azuread_service_principal.tfapp.id
  end_date             = "2020-12-31T09:00:00Z"
  value                = random_string.password.result
}

# define the container registry
resource "azurerm_container_registry" "acr" {
  name                = "mycontainerregistry2387987222"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "Basic"
  admin_enabled       = false
}

# define the kubernetes cluster
resource "azurerm_kubernetes_cluster" "mycluster" {
  name                = "myaks"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "mycluster"

  network_profile {
    network_plugin = "azure"
  }

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_B2s"
  }

  # Use the service principal created above
  service_principal {
    client_id     = azuread_service_principal.tfapp.application_id
    client_secret = azuread_service_principal_password.tfapp.value
  }

  tags = {
    Environment = "demo"
  }

  windows_profile {
    admin_username = "dingding"
    admin_password = random_string.password.result
  }
}

# define the windows node pool for kubernetes
resource "azurerm_kubernetes_cluster_node_pool" "winpool" {
  name                  = "winp"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.mycluster.id
  vm_size               = "Standard_B2s"
  node_count            = 1
  os_type               = "Windows"
}

# define the kubernetes name space
resource "kubernetes_namespace" "namesp" {
  metadata {
    name = "namesp"
  }
}

# Try to give permissions, to let the AKS access the ACR
resource "azurerm_role_assignment" "acrpull_role" {
  scope                            = azurerm_container_registry.acr.id
  role_definition_name             = "AcrPull"
  principal_id                     = azuread_service_principal.tfapp.object_id
  skip_service_principal_aad_check = true
}
This code is adapted from https://github.com/terraform-providers/terraform-provider-azuread/issues/104.
Unfortunately, when I start a container in the kubernetes cluster, I get an error message:
Failed to pull image "mycontainerregistry.azurecr.io/myunittests": [rpc error: code = Unknown desc = Error response from daemon: manifest for mycontainerregistry.azurecr.io/myunittests:latest not found: manifest unknown: manifest unknown, rpc error: code = Unknown desc = Error response from daemon: Get https://mycontainerregistry.azurecr.io/v2/myunittests/manifests/latest: unauthorized: authentication required]
Update / note:
When I run terraform apply with the code above, the creation of the resources is interrupted:
azurerm_container_registry.acr: Creation complete after 18s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222]
azurerm_role_assignment.acrpull_role: Creating...
azuread_service_principal_password.tfapp: Still creating... [10s elapsed]
azuread_service_principal_password.tfapp: Creation complete after 12s [id=000/000]
azurerm_kubernetes_cluster.mycluster: Creating...
azurerm_role_assignment.acrpull_role: Creation complete after 8s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222/providers/Microsoft.Authorization/roleAssignments/000]
azurerm_kubernetes_cluster.mycluster: Still creating... [10s elapsed]
Error: Error creating Managed Kubernetes Cluster "myaks" (Resource Group "myrg"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="ServicePrincipalNotFound" Message="Service principal clientID: 000 not found in Active Directory tenant 000, Please see https://aka.ms/aks-sp-help for more details."
  on test.tf line 56, in resource "azurerm_kubernetes_cluster" "mycluster":
  56: resource "azurerm_kubernetes_cluster" "mycluster" {
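However, I think this is only because it takes a few minutes for the service principal to be created. When I run terraform apply again a few minutes later, it gets past this point without problems.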
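The interruption described above (the cluster being created before the new service principal is visible in Azure AD) is sometimes worked around with an explicit delay instead of a second terraform apply. The following is only a sketch and is not part of the original configuration: it assumes the hashicorp/time provider is available, and the resource name wait_for_sp and the 60s duration are arbitrary placeholders.

```hcl
# Hypothetical addition: pause after the service principal password exists,
# giving Azure AD time to propagate the new principal.
resource "time_sleep" "wait_for_sp" {
  depends_on      = [azuread_service_principal_password.tfapp]
  create_duration = "60s" # assumed value; propagation time varies
}

# Then, inside the existing azurerm_kubernetes_cluster.mycluster block, add:
#   depends_on = [time_sleep.wait_for_sp]
```

The delay only affects the first creation; a fixed wait is a heuristic, so running terraform apply a second time may still be needed if propagation takes longer.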
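【Comments】:
This looks fine. Are you using a pull secret by any chance? Just to clarify, does this run without any errors? You might want to change the scope to azurerm_container_registry.acr.id, but both ways should be fine, tbh.
I had to modify it a bit to run it standalone; code updated. I also added a note about the interruption that happens during the terraform apply run after the service principal is created. I have changed the scope as you suggested, but the image is still not pulled. :(
Yes - it actually does work with the modification. I had to completely terraform destroy the resources and recreate them - then everything was fine (before that, simply applying the change did not work). Thanks!
Maybe the object_id is missing.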
【Answer 1】:
This code works for me.
resource "azuread_application" "aks_sp" {
  name = "sp-aks-${local.cluster_name}"
}

resource "azuread_service_principal" "aks_sp" {
  application_id               = azuread_application.aks_sp.application_id
  app_role_assignment_required = false
}

resource "azuread_service_principal_password" "aks_sp" {
  service_principal_id = azuread_service_principal.aks_sp.id
  value                = random_string.aks_sp_password.result
  end_date_relative    = "8760h" # 1 year

  lifecycle {
    ignore_changes = [
      value,
      end_date_relative
    ]
  }
}

resource "azuread_application_password" "aks_sp" {
  application_object_id = azuread_application.aks_sp.id
  value                 = random_string.aks_sp_secret.result
  end_date_relative     = "8760h" # 1 year

  lifecycle {
    ignore_changes = [
      value,
      end_date_relative
    ]
  }
}

data "azurerm_container_registry" "pyp" {
  name                = var.container_registry_name
  resource_group_name = var.container_registry_resource_group_name
}

resource "azurerm_role_assignment" "aks_sp_container_registry" {
  scope                = data.azurerm_container_registry.pyp.id
  role_definition_name = "AcrPull"
  principal_id         = azuread_service_principal.aks_sp.object_id
}

# requires Azure Provider 1.37+
resource "azurerm_kubernetes_cluster" "pyp" {
  name                = local.cluster_name
  location            = azurerm_resource_group.pyp.location
  resource_group_name = azurerm_resource_group.pyp.name
  dns_prefix          = local.env_name_nosymbols
  kubernetes_version  = local.kubernetes_version

  default_node_pool {
    name            = "default"
    node_count      = 1
    vm_size         = "Standard_D2s_v3"
    os_disk_size_gb = 80
  }

  windows_profile {
    admin_username = "winadm"
    admin_password = random_string.windows_profile_password.result
  }

  network_profile {
    network_plugin     = "azure"
    dns_service_ip     = cidrhost(local.service_cidr, 10)
    docker_bridge_cidr = "172.17.0.1/16"
    service_cidr       = local.service_cidr
    load_balancer_sku  = "standard"
  }

  service_principal {
    client_id     = azuread_service_principal.aks_sp.application_id
    client_secret = random_string.aks_sp_password.result
  }

  addon_profile {
    oms_agent {
      enabled                    = true
      log_analytics_workspace_id = azurerm_log_analytics_workspace.pyp.id
    }
  }

  tags = local.tags
}
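Source: https://github.com/giuliov/pipeline-your-pipelines/tree/master/src/kubernetes/terraform
【Comments】:
Yes; it works! I had to destroy the whole set of resources and re-apply for it to take effect. That takes a while; I needed to find a good time window, hence the late answer. Thanks!
【Answer 2】:
(I did upvote the answer above.)
Just adding a simpler way, where you don't need to create a service principal, for anyone else who might need it.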
resource "azurerm_kubernetes_cluster" "kubweb" {
  name                = local.cluster_web
  location            = local.rgloc
  resource_group_name = local.rgname
  dns_prefix          = "${local.cluster_web}-dns"
  kubernetes_version  = local.kubversion

  # used to group all the internal objects of this cluster
  node_resource_group = "${local.cluster_web}-rg-node"

  # azure will assign the id automatically
  identity {
    type = "SystemAssigned"
  }

  default_node_pool {
    name                 = "nodepool1"
    node_count           = 4
    vm_size              = local.vm_size
    orchestrator_version = local.kubversion
  }

  role_based_access_control {
    enabled = true
  }

  addon_profile {
    kube_dashboard {
      enabled = true
    }
  }

  tags = {
    environment = local.env
  }
}

resource "azurerm_container_registry" "acr" {
  name                = "acr1"
  resource_group_name = local.rgname
  location            = local.rgloc
  sku                 = "Standard"
  admin_enabled       = true

  tags = {
    environment = local.env
  }
}

# add the role to the identity the kubernetes cluster was assigned
resource "azurerm_role_assignment" "kubweb_to_acr" {
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_kubernetes_cluster.kubweb.kubelet_identity[0].object_id
}
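【Comments】:
It is pretty funky that the cluster data resource has 3-4 principal IDs. This seems to be the right one, though. At least it matches az aks show --resource-group groupName --name aksName --query identityProfile.kubeletidentity.objectId here, which others say is the correct one.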
This should be the correct answer.
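To check which identity actually received the role, the object ID used in the Answer 2 role assignment can be exposed as a Terraform output and compared with the value returned by the az aks show query quoted in the comment above. This is a minimal sketch: the output name kubelet_object_id is a hypothetical choice, and it assumes the kubweb cluster resource from Answer 2.

```hcl
# Hypothetical output: exposes the object ID of the kubelet identity
# that the AcrPull role assignment in Answer 2 targets.
output "kubelet_object_id" {
  value = azurerm_kubernetes_cluster.kubweb.kubelet_identity[0].object_id
}
```

After terraform apply, running terraform output kubelet_object_id should print the same ID as the az aks show query if the role was assigned to the kubelet identity.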