KEDA 缩放器无法在 AKS 上使用 pod 身份触发身份验证

Posted

技术标签:

【中文标题】KEDA 缩放器无法在 AKS 上使用 pod 身份触发身份验证【英文标题】:KEDA scaler not working on AKS with trigger authentication using pod identity 【发布时间】:2021-11-29 16:55:12 【问题描述】:

KEDA 缩放器无法与使用 pod 身份对服务总线队列进行身份验证的触发器定义的缩放对象进行缩放。 我正在关注thisKEDA 服务总线触发扩展项目。 缩放与连接字符串一起工作正常,但是当我尝试使用 KEDA 缩放器的 pod 身份进行缩放时,keda 操作员无法使用以下 keda 操作员错误消息日志获取绑定到它的 azure 身份:

github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).isScaledObjectActive
        /workspace/pkg/scaling/scale_handler.go:228
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
        /workspace/pkg/scaling/scale_handler.go:211
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
        /workspace/pkg/scaling/scale_handler.go:145
2021-10-10T17:35:53.916Z        ERROR   azure_servicebus_scaler error   "error": "failed to refresh token, error: adal: Refresh request failed. Status Code = '400'. Response body: \"error\":\"invalid_request\",\"error_description\":\"Identity not found\"\n"

于 2021 年 9 月 11 日编辑 我在keda打开了一个github issue,我们做了一些故障排除。但正如@Tom 所建议的那样,这似乎是 AAD Pod Identity 的一个问题。 AD Pod Identity MIC pod 提供如下日志:

E1109 03:15:34.391759       1 mic.go:1111] failed to update user-assigned identities on node aks-agentpool-14229154-vmss (add [2], del [0], update[0]), error: failed to update identities for aks-agentpool-14229154-vmss in MC_Arun_democluster_westeurope, error: compute.VirtualMachineScaleSetsClient#Update: Failure sending request: StatusCode=0 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client 'fe0d7679-8477-48e3-ae7d-43e2a6fdb957' with object id 'fe0d7679-8477-48e3-ae7d-43e2a6fdb957' has permission to perform action 'Microsoft.Compute/virtualMachineScaleSets/write' on scope '/subscriptions/f3786c6b-8dca-417d-af3f-23929e8b4129/resourceGroups/MC_Arun_democluster_westeurope/providers/Microsoft.Compute/virtualMachineScaleSets/aks-agentpool-14229154-vmss'; however, it does not have permission to perform action 'Microsoft.ManagedIdentity/userAssignedIdentities/assign/action' on the linked scope(s) '/subscriptions/f3786c6b-8dca-417d-af3f-23929e8b4129/resourcegroups/arun/providers/microsoft.managedidentity/userassignedidentities/autoscaler-id' or the linked scope(s) are invalid."

有什么解决办法吗?

我的缩放器对象的定义如下:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-service-bus-orders
spec:
  podIdentity:
    provider: azure
---
apiVersion: keda.sh/v1alpha1 
kind: ScaledObject
metadata:
  name: order-scaler
spec:
  scaleTargetRef:
    name: order-processor
  # minReplicaCount: 0 Change to define how many minimum replicas you want
  maxReplicaCount: 10
  triggers:
  - type: azure-servicebus
    metadata:
      namespace: demodemobus
      queueName: orders
      messageCount: '5'
    authenticationRef:
      name: trigger-auth-service-bus-orders

我正在将 azure 身份部署到我的 keda 部署所在的 namespace keda。 并使用以下命令安装 KEDA 以使用 helm 设置 pod identity binding

helm install keda kedacore/keda --set podIdentity.activeDirectory.identity=app-autoscaler --namespace keda

预期行为 KEDA 缩放器应该可以很好地使用分配的 pod 身份和访问令牌来执行缩放

实际行为 KEDA 算子找不到分配的天蓝色标识,缩放失败

使用的定标器 Azure 服务总线

重现问题的步骤

    为 KEDA 创建 azure 标识和绑定 使用 aadpodidentitybinding 安装 KEDA 使用 KEDA pod 身份创建缩放对象并触发身份验证 缩放器无法进行身份验证和缩放

【问题讨论】:

【参考方案1】:

不幸的是,这看起来像是身份本身和 AD Pod 身份的问题,它们可能有点不稳定(根据我的经验)

【讨论】:

【参考方案2】:

首先,我将 AKS 与 kubenet 插件一起使用。

默认情况下 '从版本 v1.7 开始,在使用 Kubenet 的集群上默认禁用 AAD Pod 身份。'

这是因为 Kubenet 容易受到 ARP 欺骗。 请阅读here。

即使这样,您也可以有一个解决方法来启用 Kubenet 驱动的 AKS 中的 KEDA 缩放。(该脚本也适用于其他 CNI,除了您不需要使用 aad-pod-identity 组件 nmi daemonset 定义 yaml 编辑任何内容,如果它与您的集群插件一起运行良好。)。

下面我将为此添加一个 e2e 脚本。 请访问github issue 以访问所有讨论。

# Define aks name and resource group
$aksResourceGroup = "K8sScalingDemo"
$aksName = "K8sScalingDemo"

# Create resource group
az group create -n $aksResourceGroup -l centralindia

# Create the aks cluster with default kubenet plugin
az aks create -n $aksName -g $aksResourceGroup

# Resourcegroup where the aks resources will be deployed
$resourceGroup = "$(az aks show -g $aksResourceGroup -n $aksName --query nodeResourceGroup -otsv)"

# Set the kubectl context to the newly created aks cluster
az aks get-credentials -n $aksName -g $aksResourceGroup

# Install AAD Pod Identity into the aad-pod-identity namespace using helm
kubectl create namespace aad-pod-identity
helm repo add aad-pod-identity https://raw.githubusercontent.com/Azure/aad-pod-identity/master/charts
helm install aad-pod-identity aad-pod-identity/aad-pod-identity --namespace aad-pod-identity

# Check the status of installation 
kubectl --namespace=aad-pod-identity get pods -l "app.kubernetes.io/component=mic"
kubectl --namespace=aad-pod-identity get pods -l "app.kubernetes.io/component=nmi"

# the nmi components will Crashloop, ignore them for now. We will make them right later

# Get Resourcegroup Id of our $ResourceGroup
$resourceGroup_ResourceId = az group show --name $resourceGroup --query id -otsv

# Get the aks cluster kubeletidentity client id
$aad_pod_identity_clientid = az aks show -g $aksResourceGroup -n $aksName --query identityProfile.kubeletidentity.clientId -otsv

# Assign required roles for cluster over the resourcegroup
az role assignment create --role "Managed Identity Operator" --assignee $aad_pod_identity_clientid  --scope $resourceGroup_ResourceId
az role assignment create --role "Virtual Machine Contributor" --assignee $aad_pod_identity_clientid  --scope $resourceGroup_ResourceId

# Create autoscaler azure identity and get client id and resource id of the autoscaler identity
$autoScaleridentityName = "autoscaler-aad-identity"
az identity create --name $autoScaleridentityName  --resource-group $resourceGroup
$autoscaler_aad_identity_clientId = az identity show --name $autoScaleridentityName  --resource-group $resourceGroup --query clientId -otsv
$autoscaler_aad_identity_resourceId = az identity show --name $autoScaleridentityName  --resource-group $resourceGroup --query id -otsv

# Create the app azure identity and get client id and resource id of the app identity
$appIdentityName = "app-aad-identity"
az identity create --name app-aad-identity --resource-group $resourceGroup
$app_aad_identity_clientId = az identity show --name $appIdentityName --resource-group $resourceGroup --query clientId -otsv
$app_aad_identity_resourceId = az identity show --name $appIdentityName --resource-group $resourceGroup --query id -otsv

# Create service bus and queue
$servicebus = 'svcbusdemo'
az servicebus namespace create --name $servicebus --resource-group $resourceGroup --sku basic
$servicebus_namespace_resourceId = az servicebus namespace show --name $servicebus --resource-group $resourceGroup --query id -otsv

az servicebus queue create --namespace-name $servicebus --name orders --resource-group $resourceGroup
$servicebus_queue_resourceId = az servicebus queue show --namespace-name $servicebus --name orders --resource-group $resourceGroup --query id -otsv

# Assign Service Bus Data Receiver role to the app identity created
az role assignment create --role 'Azure Service Bus Data Receiver' --assignee $app_aad_identity_clientId  --scope $servicebus_queue_resourceId

# Create a namespace for order app deployment
kubectl create namespace keda-dotnet-sample

# Create a yaml deployment configuration variable
$app_with_identity_yaml= @"
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: $appIdentityName
  annotations:
    aadpodidentity.k8s.io/Behavior: namespaced
spec:
  type: 0 # 0 means User-assigned MSI
  resourceID: $app_aad_identity_resourceId
  clientID: $app_aad_identity_clientId
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: $appIdentityName-binding
spec:
  azureIdentity: $appIdentityName
  selector: order-processor
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor
  labels:
    app: order-processor
spec:
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
        aadpodidbinding: order-processor
    spec:
      containers:
      - name: order-processor
        image: ghcr.io/kedacore/sample-dotnet-worker-servicebus-queue:latest
        env:
        - name: KEDA_SERVICEBUS_AUTH_MODE
          value: ManagedIdentity
        - name: KEDA_SERVICEBUS_HOST_NAME
          value: $servicebus.servicebus.windows.net
        - name: KEDA_SERVICEBUS_QUEUE_NAME
          value: orders
        - name: KEDA_SERVICEBUS_IDENTITY_USERASSIGNEDID
          value: $app_aad_identity_clientId
"@

# Create the app deployment with identity bindings using kubectl apply
$app_with_identity_yaml | kubectl apply --namespace keda-dotnet-sample -f -

# Now the order processor app works with the pod identity and 
# processes the queues 
# You can refer the [project ](https://github.com/kedacore/sample-dotnet-worker-servicebus-queue/blob/main/pod-identity.md) for that.

# Now start installation of KEDA in namespace keda-system

kubectl create namespace keda-system

# Create a pod identity and binding for autoscaler azure identity
$autoscaler_yaml =@"
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: $autoScaleridentityName
spec:
  type: 0 # 0 means User-assigned MSI
  resourceID: $autoscaler_aad_identity_resourceId
  clientID: $autoscaler_aad_identity_clientId
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: $autoScaleridentityName-binding
spec:
  azureIdentity: $autoScaleridentityName
  selector: $autoScaleridentityName
"@
$autoscaler_yaml | kubectl apply --namespace keda-system -f -

# Install KEDA using helm
helm install keda kedacore/keda --set podIdentity.activeDirectory.identity=autoscaler-aad-identity --namespace keda-system

# Assign Service Bus Data Owner role to keda autoscaler identity
az role assignment create --role 'Azure Service Bus Data Owner' --assignee $autoscaler_aad_identity_clientId --scope $servicebus_namespace_resourceId

# Apply scaled object definition and trigger authentication provider as `azure`
$aap_autoscaling_yaml = @"
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-service-bus-orders
spec:
  podIdentity:
    provider: azure
---
apiVersion: keda.sh/v1alpha1 
kind: ScaledObject
metadata:
  name: order-scaler
spec:
  scaleTargetRef:
    name: order-processor
  # minReplicaCount: 0 Change to define how many minimum replicas you want
  maxReplicaCount: 10
  triggers:
  - type: azure-servicebus
    metadata:
      namespace: $servicebus
      queueName: orders
      messageCount: '5'
    authenticationRef:
      name: trigger-auth-service-bus-orders
"@

$aap_autoscaling_yaml | kubectl apply --namespace keda-dotnet-sample -f -

# Now the Keda is getting 401 unauthorized error as the AAD Pod Identity comnponent `nmi` is not runnig on the system
# To fix it edit the daemonset for `nmi` component
# add the container arg `--allow-network-plugin-kubenet=true` by editing the `daemonset.apps/aad-pod-identity-nmi`
kubectl edit daemonset.apps/aad-pod-identity-nmi -n aad-pod-identity

# the containe arg section should look like this after editing:
    spec:
      containers:
      - args:
        - --node=$(NODE_NAME)
        - --http-probe-port=8085
        - --enableScaleFeatures=true
        - --metadata-header-required=true
        - --operation-mode=standard
        - --kubelet-config=/etc/default/kubelet
        - --allow-network-plugin-kubenet=true
        env:

# Now the KEDA is authenticated by aad-pod-identity metadata endpoint and the orderapp should scale up 
# with the queue counts
# If the order app still falls back to errors please delete and redeploy it.
# And that's it you just scaled your app up using KEDA on Kubenet AKS cluster.
注意:在 Kubenet 驱动的 AKS 上运行 AAD 身份之前,请阅读 this instruction。

【讨论】:

以上是关于KEDA 缩放器无法在 AKS 上使用 pod 身份触发身份验证的主要内容,如果未能解决你的问题,请参考以下文章

Keda AzureMonitor 触发器没有给出 activeDirectoryClientId 给定错误

在 C# 应用程序中获取集群 azure kubernetes 服务 (AKS) 运行状况/可用性

AKS初体验:登录到node节点的几种方式

使用托管标识进行 AKS 文件共享持久挂载 - 密钥轮换后出现问题

将 Azure 磁盘附加到 AKS pod 时出现权限错误

如何处理错误“dial tcp 10.240.0.4:10250: i/o timeout”以查看 AKS 中的 pod 日志?