AWS Batch CloudFormation - "CannotPullContainerError"

Posted: 2021-04-16 16:49:51

Question:

I have a CloudFormation template for an AWS Batch POC with 6 resources:

3 AWS::IAM::Role

1 AWS::Batch::ComputeEnvironment

1 AWS::Batch::JobQueue

1 AWS::Batch::JobDefinition

Every AWS::IAM::Role has the policy "arn:aws:iam::aws:policy/AdministratorAccess" attached (to rule out permission problems).

The roles are used as follows:

1 in the AWS::Batch::ComputeEnvironment

2 in the AWS::Batch::JobDefinition

But even with the "arn:aws:iam::aws:policy/AdministratorAccess" policy, I get "CannotPullContainerError: Error response from daemon: Get https://********.dkr.ecr.eu-west-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" when I run the job.

Disclaimer: everything is FARGATE (compute environment and jobs), not EC2.

        AWSTemplateFormatVersion: '2010-09-09'
        Description: Creates a POC AWS Batch environment.
        Parameters:
          Environment:
            Type: String
            Description: 'Environment Name'
            Default: TEST
          Subnets:
            Type: List<AWS::EC2::Subnet::Id>
            Description: 'List of Subnets to boot into'
          ImageName:
            Type: String
            Description: 'Name and tag of Process Container Image'
            Default: 'upload:6.0.0'

        Resources:
          BatchServiceRole:
            Type: 'AWS::IAM::Role'
            Properties:
              RoleName: !Join ['', ['Demo', BatchServiceRole]]
              AssumeRolePolicyDocument:
                Version: 2012-10-17
                Statement:
                  - Effect: 'Allow'
                    Principal:
                      Service: 'batch.amazonaws.com'
                    Action: 'sts:AssumeRole'
              ManagedPolicyArns:
                - 'arn:aws:iam::aws:policy/AdministratorAccess'
          BatchContainerRole:
            Type: 'AWS::IAM::Role'
            Properties:
              RoleName: !Join ['', ['Demo', BatchContainerRole]]
              AssumeRolePolicyDocument:
                Version: 2012-10-17
                Statement:
                  - 
                    Effect: 'Allow'
                    Principal:
                      Service:
                        - 'ecs-tasks.amazonaws.com'
                    Action: 
                      - 'sts:AssumeRole'
              ManagedPolicyArns:
                - 'arn:aws:iam::aws:policy/AdministratorAccess'
          BatchJobRole:
            Type: 'AWS::IAM::Role'
            Properties:
              RoleName: !Join ['', ['Demo', BatchJobRole]]
              AssumeRolePolicyDocument:
                Version: 2012-10-17
                Statement:
                  - Effect: 'Allow'
                    Principal:
                      Service: 'ecs-tasks.amazonaws.com'
                    Action: 'sts:AssumeRole'
              ManagedPolicyArns:
                - 'arn:aws:iam::aws:policy/AdministratorAccess'
          BatchCompute:
            Type: "AWS::Batch::ComputeEnvironment"
            Properties:
              ComputeEnvironmentName: DemoContentInput
              ComputeResources: 
                MaxvCpus: 256 
                SecurityGroupIds:
                  - sg-0b33333333333333
                Subnets: !Ref Subnets
                Type: FARGATE
              ServiceRole: !Ref BatchServiceRole
              State: ENABLED
              Type: Managed
          Queue:
            Type: "AWS::Batch::JobQueue"
            DependsOn: BatchCompute
            Properties:
              ComputeEnvironmentOrder: 
                - ComputeEnvironment: DemoContentInput 
                  Order: 1
              Priority: 1
              State: "ENABLED"
              JobQueueName: DemoContentInput
          ContentInputJob:
            Type: "AWS::Batch::JobDefinition"
            Properties:
              Type: Container
              ContainerProperties: 
                Command: 
                  - -v
                  - process
                  - new-file
                  - -o
                  - s3://contents/content_id/content_id.mp4
                Environment:
                  - Name: SECRETS
                    Value: !Join [ ':', [ 'resolve:secretsmanager:common.secrets:SecretString:aws_access_key_id', 'resolve:secretsmanager:common.secrets:SecretString:aws_secret_access_key' ] ] 
                  - Name: APPLICATION 
                    Value: upload
                  - Name: API_KEY 
                    Value: 'resolve:secretsmanager:common.secrets:SecretString:fluzo.api_key'
                  - Name: CLIENT
                    Value: upload-container
                  - Name: ENVIRONMENT
                    Value: !Ref Environment
                  - Name: SETTINGS
                    Value: !Join [ ':', [ 'resolve:secretsmanager:common.secrets:SecretString:aws_access_key_id', 'resolve:secretsmanager:common.secrets:SecretString:aws_secret_access_key', 'upload-container' ] ] 
                ExecutionRoleArn: 'arn:aws:iam::**********:role/DemoBatchJobRole'
                Image: !Join ['', [!Ref 'AWS::AccountId','.dkr.ecr.', !Ref 'AWS::Region', '.amazonaws.com/', !Ref ImageName ] ] 
                JobRoleArn: !Ref BatchContainerRole
                ResourceRequirements:
                  - Type: VCPU
                    Value: 1
                  - Type: MEMORY
                    Value: 2048
              JobDefinitionName: DemoContentInput
              PlatformCapabilities:
                - FARGATE
              RetryStrategy: 
                Attempts: 1
              Timeout: 
                AttemptDurationSeconds: 600 

In AWS::Batch::JobDefinition, under ContainerProperties, I hard-coded the ARN for ExecutionRoleArn, because writing !Ref BatchJobRole gives me an error. But that problem is not what I am asking about.
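A possible reason for that side error, offered as an assumption since the error message itself is not shown: Ref on an AWS::IAM::Role returns the role name, whereas ExecutionRoleArn expects an ARN. A minimal sketch of the two role properties using Fn::GetAtt instead of a hard-coded ARN, with the resource names taken from the template above:

        ContainerProperties:
          # Fn::GetAtt with the Arn attribute returns the full role ARN;
          # !Ref on an IAM role would return only the role name.
          ExecutionRoleArn: !GetAtt BatchJobRole.Arn
          JobRoleArn: !GetAtt BatchContainerRole.Arn

This would also make CloudFormation create the roles before the job definition, since the reference introduces an implicit dependency.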

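The question is how to avoid "CannotPullContainerError: Error response from daemon: Get https://********.dkr.ecr.eu-west-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" when I run the job.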

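Comments:

I think the connection timeout points to a network problem. Have you checked the routes, NAT gateway, and security groups?

Are the subnets passed in via !Ref Subnets public or private? How is your VPC configured?

Answer 1: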

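It sounds like you cannot reach the Internet from within your subnets.

Make sure that:

An Internet gateway is attached to your VPC (create one if there is none, even if you only use a NAT gateway for egress).

The route table associated with your subnets has a default route (0.0.0.0/0) to the Internet gateway, or to a NAT gateway with an attached Elastic IP.

The attached security group has rules that allow outbound Internet traffic (0.0.0.0/0) on the required ports and protocols (e.g. 80/HTTP, 443/HTTPS).

The network access control list (network ACL) associated with the subnets has rules that allow outbound and inbound traffic to and from the Internet.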
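The checklist above could be expressed in CloudFormation roughly as follows. This is only a sketch under assumptions: the VpcId and SubnetId parameters and all resource names are hypothetical, the VPC is assumed to have no Internet gateway attached yet, and the subnet is assumed to have no explicit route table association of its own. Network ACLs are left out because the default ACL already allows all traffic.

        Parameters:
          VpcId:
            Type: AWS::EC2::VPC::Id
          SubnetId:
            Type: AWS::EC2::Subnet::Id
        Resources:
          InternetGateway:
            Type: AWS::EC2::InternetGateway
          GatewayAttachment:
            Type: AWS::EC2::VPCGatewayAttachment
            Properties:
              VpcId: !Ref VpcId
              InternetGatewayId: !Ref InternetGateway
          PublicRouteTable:
            Type: AWS::EC2::RouteTable
            Properties:
              VpcId: !Ref VpcId
          # Default route to the Internet gateway; it can only be created
          # once the gateway is attached to the VPC.
          DefaultRoute:
            Type: AWS::EC2::Route
            DependsOn: GatewayAttachment
            Properties:
              RouteTableId: !Ref PublicRouteTable
              DestinationCidrBlock: 0.0.0.0/0
              GatewayId: !Ref InternetGateway
          SubnetRouteTableAssociation:
            Type: AWS::EC2::SubnetRouteTableAssociation
            Properties:
              SubnetId: !Ref SubnetId
              RouteTableId: !Ref PublicRouteTable
          # Outbound HTTPS is what the image pull from ECR uses.
          BatchSecurityGroup:
            Type: AWS::EC2::SecurityGroup
            Properties:
              GroupDescription: Outbound HTTPS for container image pulls
              VpcId: !Ref VpcId
              SecurityGroupEgress:
                - IpProtocol: tcp
                  FromPort: 443
                  ToPort: 443
                  CidrIp: 0.0.0.0/0

For private subnets, the default route would point at a NAT gateway (NatGatewayId) instead of GatewayId.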

References:

https://aws.amazon.com/premiumsupport/knowledge-center/ec2-connect-internet-gateway/
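Because everything in the question runs on Fargate, one more setting may be worth checking. A Fargate task in a public subnet needs a public IP address to reach ECR through the Internet gateway, and the job definition in the question does not request one. A minimal sketch, assuming the subnets are public (with private subnets the NAT gateway route applies instead and this setting is not needed):

        ContentInputJob:
          Type: "AWS::Batch::JobDefinition"
          Properties:
            Type: Container
            PlatformCapabilities:
              - FARGATE
            ContainerProperties:
              # ... existing container properties unchanged ...
              NetworkConfiguration:
                # Give the Fargate task a public IP so that it can
                # pull the image over the Internet gateway.
                AssignPublicIp: ENABLED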
