ValidationError when creating a SageMaker Model

Posted: 2018-12-05 06:42:39

Question:

I am new to AWS and am trying to build a model (from the web console) by following their demo. However, when I try to create the model, I get the following error.

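Could not access model data at https://s3.console.aws.amazon.com/s3/buckets/bucket_name/models/model_name-v0.1.hdf5. Please ensure that the role "arn:aws:iam::id:role/service-role/AmazonSageMaker-ExecutionRole-xxx" exists and that its trust relationship policy allows the action "sts:AssumeRole" for the service principal "sagemaker.amazonaws.com". Also ensure that the role has "s3:GetObject" permissions and that the object is located in eu-west-1.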

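Two details in the error above can be checked independently of IAM: the quoted location is an S3 console page URL, and the file ends in .hdf5, whereas SageMaker expects model data to be a single gzip-compressed tar archive (.tar.gz). The following is only a sketch of such a pre-check; the helper name check_model_data_url is hypothetical and not part of any AWS SDK.

```python
from urllib.parse import urlparse


def check_model_data_url(url):
    """Return a list of likely problems with a model data location (hypothetical helper)."""
    problems = []
    parsed = urlparse(url)
    # A console URL points at a web page, not at the object itself.
    if "console.aws.amazon.com" in parsed.netloc:
        problems.append("this is an S3 console page URL, not an object location")
    # SageMaker expects model artifacts packed as one .tar.gz archive.
    if not parsed.path.endswith(".tar.gz"):
        problems.append("model data should be a .tar.gz archive")
    return problems


for problem in check_model_data_url(
    "https://s3.console.aws.amazon.com/s3/buckets/bucket_name/models/model_name-v0.1.hdf5"
):
    print(problem)
```

A location such as s3://bucket_name/models/model_name-v0.1.tar.gz passes both checks in this sketch.
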
I checked the IAM role: it has the AmazonSageMakerFullAccess and AmazonS3FullAccess policies attached. A trust relationship is also specified for the role (shown below).


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

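A trust relationship like the one above can be sanity-checked offline. This is a minimal sketch, assuming the policy is available as a JSON string; allows_assume_role is a hypothetical helper, and it ignores Deny statements and conditions that real IAM evaluation would consider.

```python
import json

TRUST_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "sagemaker.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
"""


def as_list(value):
    """IAM allows a single string where a list is expected; normalize to a list."""
    return value if isinstance(value, list) else [value]


def allows_assume_role(policy, service):
    """Return True if some Allow statement lets `service` call sts:AssumeRole."""
    for statement in as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        services = as_list(principal.get("Service", [])) if isinstance(principal, dict) else []
        actions = as_list(statement.get("Action", []))
        if service in services and "sts:AssumeRole" in actions:
            return True
    return False


trust_ok = allows_assume_role(json.loads(TRUST_POLICY), "sagemaker.amazonaws.com")
print(trust_ok)  # True
```
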
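I specified the ECR and S3 paths correctly, but I can't figure out what is going on. Can anyone help me solve this?

Sorry that I can't provide more information up front; I will add anything else that is needed.

Update:

Here are the IAM policies.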

AmazonS3FullAccess


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}

AmazonSageMaker-ExecutionPolicy-xxx


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<bucket_name>"
            ]
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<bucket_name>/*"
            ]
        }
    ]
}

AmazonSageMakerFullAccess


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability",
                "cloudwatch:PutMetricData",
                "cloudwatch:PutMetricAlarm",
                "cloudwatch:DescribeAlarms",
                "cloudwatch:DeleteAlarms",
                "ec2:CreateNetworkInterface",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeVpcs",
                "ec2:DescribeDhcpOptions",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "application-autoscaling:DeleteScalingPolicy",
                "application-autoscaling:DeleteScheduledAction",
                "application-autoscaling:DeregisterScalableTarget",
                "application-autoscaling:DescribeScalableTargets",
                "application-autoscaling:DescribeScalingActivities",
                "application-autoscaling:DescribeScalingPolicies",
                "application-autoscaling:DescribeScheduledActions",
                "application-autoscaling:PutScalingPolicy",
                "application-autoscaling:PutScheduledAction",
                "application-autoscaling:RegisterScalableTarget",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:DescribeLogStreams",
                "logs:GetLogEvents",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::*SageMaker*",
                "arn:aws:s3:::*Sagemaker*",
                "arn:aws:s3:::*sagemaker*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket",
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:ListAllMyBuckets"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "*",
            "Condition": {
                "StringEqualsIgnoreCase": {
                    "s3:ExistingObjectTag/SageMaker": "true"
                }
            }
        },
        {
            "Action": "iam:CreateServiceLinkedRole",
            "Effect": "Allow",
            "Resource": "arn:aws:iam::*:role/aws-service-role/sagemaker.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint",
            "Condition": {
                "StringLike": {
                    "iam:AWSServiceName": "sagemaker.application-autoscaling.amazonaws.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "sagemaker.amazonaws.com"
                }
            }
        }
    ]
}

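The object-level S3 statement in AmazonSageMakerFullAccess above only matches ARNs that contain SageMaker, Sagemaker, or sagemaker, so for any other bucket the access has to come from another attached policy. The sketch below illustrates that matching with Python's fnmatchcase as an approximation of IAM wildcards; covered_by_full_access is a hypothetical helper, and real IAM evaluation also takes Deny statements, conditions, and bucket policies into account.

```python
from fnmatch import fnmatchcase

# Resource patterns copied from the S3 object statement of AmazonSageMakerFullAccess.
FULL_ACCESS_S3_PATTERNS = [
    "arn:aws:s3:::*SageMaker*",
    "arn:aws:s3:::*Sagemaker*",
    "arn:aws:s3:::*sagemaker*",
]


def covered_by_full_access(bucket, key):
    """Return True if the object ARN matches one of the patterns (case-sensitive)."""
    arn = "arn:aws:s3:::%s/%s" % (bucket, key)
    return any(fnmatchcase(arn, pattern) for pattern in FULL_ACCESS_S3_PATTERNS)


print(covered_by_full_access("bucket_name", "models/model_name-v0.1.hdf5"))
print(covered_by_full_access("my-sagemaker-bucket", "models/model.tar.gz"))
```

The first call prints False and the second prints True; the bucket name my-sagemaker-bucket is made up for the illustration.
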
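Comments:

Can you add the policy documents of the IAM role you mentioned?

@Asdfg: Updated the post.

It was a silly mistake. The model file name and the tar.gz file name should be the same, and my code should read from the /opt/ml/model folder (I had it as models :( ).

By the way, if your bucket has "sagemaker" in its name, you can skip the S3 permissions.

Answer 1: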

I think the SageMaker execution policy is missing bucket-level permissions. Try adding "arn:aws:s3:::<bucket_name>" to AmazonSageMaker-ExecutionPolicy-xxx:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<bucket_name>"
            ]
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<bucket_name>",
                "arn:aws:s3:::<bucket_name>/*"
            ]
        }
    ]
}

I ran the demo with the SageMaker execution policy shown below, and it works. It is a very permissive policy; once it works, you can restrict it to your bucket name.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::*"
            ]
        }
    ]
}

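Once the permissive policy is confirmed to work, narrowing it to one bucket could look like the sketch below. It assumes the same split used in the question's execution policy (s3:ListBucket on the bucket ARN, object actions on the objects under it); build_bucket_policy is a hypothetical helper, not an AWS API.

```python
import json


def build_bucket_policy(bucket_name):
    """Build a policy document scoped to a single bucket (illustrative helper)."""
    bucket_arn = "arn:aws:s3:::" + bucket_name
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions target the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [bucket_arn + "/*"],
            },
        ],
    }


print(json.dumps(build_bucket_policy("bucket_name"), indent=4))
```

The printed JSON can be pasted into the IAM policy editor after replacing bucket_name with the real bucket.
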
Comments:

Thanks, this helped too.
