ValidationError when creating a SageMaker Model
【Posted】: 2018-12-05 06:42:39
【Question】:
I am new to AWS and am trying to build a model (from the web console) by following their demo. However, when I try to create the model, I get the following error.
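Could not access model data at https://s3.console.aws.amazon.com/s3/buckets/bucket_name/models/model_name-v0.1.hdf5. Please ensure that the role "arn:aws:iam::id:role/service-role/AmazonSageMaker-ExecutionRole-xxx" exists and that its trust relationship policy allows the action "sts:AssumeRole" for the service principal "sagemaker.amazonaws.com". Also ensure that the role has "s3:GetObject" permissions and that the object is located in eu-west-1.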
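Editor's note: the path in the error ends in `.hdf5`, while the SageMaker CreateModel API documents the model data location as a single gzip-compressed tar archive (`.tar.gz`). The sketch below only inspects a location string; it does not contact AWS. The helper name `describe_model_data_url` and the `s3://` example URI are assumptions made for illustration, not something taken from the post.

```python
from urllib.parse import urlparse


def describe_model_data_url(url):
    """Report basic facts about a model data location string (hypothetical helper)."""
    parsed = urlparse(url)
    is_s3_uri = parsed.scheme == "s3"
    return {
        "is_s3_uri": is_s3_uri,
        # For s3://bucket/key the bucket is the network location part.
        "bucket": parsed.netloc if is_s3_uri else None,
        "key": parsed.path.lstrip("/") if is_s3_uri else None,
        # Model artifacts are expected as a gzip-compressed tar archive.
        "is_tar_gz": parsed.path.endswith(".tar.gz"),
    }


# The location quoted in the error message, and an assumed corrected form.
from_error = describe_model_data_url(
    "https://s3.console.aws.amazon.com/s3/buckets/bucket_name/models/model_name-v0.1.hdf5"
)
assumed_fix = describe_model_data_url("s3://bucket_name/models/model_name-v0.1.tar.gz")
```

With these inputs, `from_error["is_tar_gz"]` is false and `assumed_fix["is_tar_gz"]` is true, which may be worth ruling out before looking at IAM permissions.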
I checked the IAM role: it has the AmazonSageMakerFullAccess and AmazonS3FullAccess policies attached. A trust relationship is also specified for the role (shown below).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
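Editor's note: the trust relationship above can be checked mechanically against what the error message asks for. The following is a minimal sketch that works on the policy as a plain Python dict; `allows_sagemaker_assume_role` is a hypothetical helper, and it ignores `Condition` blocks and `Deny` statements, which real IAM evaluation would take into account.

```python
def _as_list(value):
    """IAM allows a single string where a list is expected; normalise to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def allows_sagemaker_assume_role(trust_policy):
    """Return True if some Allow statement lets sagemaker.amazonaws.com call sts:AssumeRole."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        actions = _as_list(statement.get("Action"))
        if "sagemaker.amazonaws.com" in services and "sts:AssumeRole" in actions:
            return True
    return False


# The trust relationship from the question.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
trust_ok = allows_sagemaker_assume_role(trust_policy)
```

For the policy in the question this check passes, which suggests the trust relationship is not the cause of the error.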
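I specified the ECR and S3 paths correctly, but I cannot figure out what is going wrong. Can someone help me solve this?
Sorry that I cannot provide more details up front; I will add any other information if needed.
Update:
Here are the IAM policies.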
AmazonS3FullAccess
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "*"
    }
  ]
}
AmazonSageMaker-ExecutionPolicy-xxx
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket_name>"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ]
}
AmazonSageMakerFullAccess
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability",
        "cloudwatch:PutMetricData",
        "cloudwatch:PutMetricAlarm",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:DeleteAlarms",
        "ec2:CreateNetworkInterface",
        "ec2:CreateNetworkInterfacePermission",
        "ec2:DeleteNetworkInterface",
        "ec2:DeleteNetworkInterfacePermission",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeVpcs",
        "ec2:DescribeDhcpOptions",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "application-autoscaling:DeleteScalingPolicy",
        "application-autoscaling:DeleteScheduledAction",
        "application-autoscaling:DeregisterScalableTarget",
        "application-autoscaling:DescribeScalableTargets",
        "application-autoscaling:DescribeScalingActivities",
        "application-autoscaling:DescribeScalingPolicies",
        "application-autoscaling:DescribeScheduledActions",
        "application-autoscaling:PutScalingPolicy",
        "application-autoscaling:PutScheduledAction",
        "application-autoscaling:RegisterScalableTarget",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::*SageMaker*",
        "arn:aws:s3:::*Sagemaker*",
        "arn:aws:s3:::*sagemaker*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "*",
      "Condition": {
        "StringEqualsIgnoreCase": {
          "s3:ExistingObjectTag/SageMaker": "true"
        }
      }
    },
    {
      "Action": "iam:CreateServiceLinkedRole",
      "Effect": "Allow",
      "Resource": "arn:aws:iam::*:role/aws-service-role/sagemaker.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint",
      "Condition": {
        "StringLike": {
          "iam:AWSServiceName": "sagemaker.application-autoscaling.amazonaws.com"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "sagemaker.amazonaws.com"
        }
      }
    }
  ]
}
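Editor's note: in AmazonSageMakerFullAccess as quoted above, the object-level S3 actions are limited to ARNs that contain "SageMaker", "Sagemaker", or "sagemaker". The sketch below approximates that wildcard match with Python's `fnmatch.fnmatchcase`; this is an assumption for illustration only, since IAM's own matcher is not `fnmatch` (for example, `fnmatch` gives `[` a special meaning). The helper name and the example ARNs are hypothetical.

```python
from fnmatch import fnmatchcase

# Resource patterns from the object-level statement of AmazonSageMakerFullAccess.
SAGEMAKER_OBJECT_PATTERNS = [
    "arn:aws:s3:::*SageMaker*",
    "arn:aws:s3:::*Sagemaker*",
    "arn:aws:s3:::*sagemaker*",
]


def covered_by_managed_policy(object_arn):
    """Approximate check: does any pattern match this object ARN (case-sensitive)?"""
    return any(fnmatchcase(object_arn, pattern) for pattern in SAGEMAKER_OBJECT_PATTERNS)


# A bucket without "sagemaker" in its name is not covered by this statement ...
plain_bucket = covered_by_managed_policy("arn:aws:s3:::bucket_name/models/model.tar.gz")
# ... while a bucket that contains it is.
named_bucket = covered_by_managed_policy("arn:aws:s3:::my-sagemaker-bucket/models/model.tar.gz")
```

Because the pattern applies to the whole ARN, an object key containing "sagemaker" would match as well, not only a bucket name.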
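【Comments】:
Can you add the policy documents of the IAM role you mentioned?
@Asdfg: I have updated the post.
It was a silly mistake. The model file name and the tar.gz file name should be the same. My code should read from the /opt/ml/model folder (I had it as model :( ).
By the way, if the bucket name contains "sagemaker", you can skip the S3 permissions.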
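Editor's note: the fix described in the comments comes down to uploading the model as a `.tar.gz` archive, which SageMaker unpacks under `/opt/ml/model` inside the inference container. The sketch below packages a single model file with the standard `tarfile` module. The function name `package_model` and the file names are assumptions; the archive name is left to the caller, since the comment does not spell out exactly how the two names should correspond.

```python
import tarfile
from pathlib import Path


def package_model(model_file, archive_path):
    """Write model_file into a gzip-compressed tar archive at archive_path."""
    model_file = Path(model_file)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops the local directories, so the file sits at the archive
        # root and would appear directly under /opt/ml/model after extraction.
        tar.add(model_file, arcname=model_file.name)
    return str(archive_path)
```

A call such as `package_model("model_name-v0.1.hdf5", "model_name-v0.1.tar.gz")` produces the archive to upload to S3; the serving code then loads the file from `/opt/ml/model/` rather than from a relative `model` directory.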
【Answer 1】:
I think the SageMaker execution policy is missing bucket-level permissions. Try adding "arn:aws:s3:::<bucket_name>" to AmazonSageMaker-ExecutionPolicy-xxx:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket_name>"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket_name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ]
}
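I ran the demo with the SageMaker execution policy shown below, and it works. This is a very permissive policy; once it works for you, you can narrow it down to your own bucket name.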
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ]
    }
  ]
}
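Editor's note: narrowing the permissive policy to a single bucket, as the answer suggests, could look roughly like the sketch below. It builds the policy document as a Python dict and serialises it with `json`; nothing is sent to AWS. The function name `bucket_scoped_policy` and the bucket name `my-model-bucket` are placeholders. `s3:ListBucket` is granted on the bucket ARN and the object actions on `<bucket>/*`, mirroring the layout of the asker's own execution policy.

```python
import json


def bucket_scoped_policy(bucket_name):
    """Build an S3 policy document limited to one bucket (hypothetical helper)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: applies to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions: apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


policy_json = json.dumps(bucket_scoped_policy("my-model-bucket"), indent=2)
```

The resulting `policy_json` string can be pasted into the IAM console's JSON policy editor in place of the wildcard version.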
【Comments】:
Thanks, this helped as well.