Amazon SageMaker ModelError when serving a model


Posted: 2021-03-26 23:50:09

Question:

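I uploaded a transformers RoBERTa model to an S3 bucket. I am now trying to run inference against it using PyTorch with the SageMaker Python SDK. I specified the model location s3://snet101/sent.tar.gz, which is a compressed archive of the model (pytorch_model.bin) and all of its dependencies. Here is the code: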

from sagemaker.pytorch import PyTorchModel
from sagemaker.utils import name_from_base

# model_artifact, role, and SentimentAnalysis are defined elsewhere (not shown in the question).
model = PyTorchModel(model_data=model_artifact,
                     name=name_from_base('roberta-model'),
                     role=role,
                     entry_point='torchserve-predictor2.py',
                     source_dir='source_dir',
                     framework_version='1.4.0',
                     py_version='py3',
                     predictor_cls=SentimentAnalysis)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
test_data = {"text": "How many cows are in the farm ?"}
prediction = predictor.predict(test_data)

I get the following error when calling the predict method on the predictor object:

ModelError                                Traceback (most recent call last)
<ipython-input-6-bc621eb2e056> in <module>
----> 1 prediction = predictor.predict(test_data)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant)
    123 
    124         request_args = self._create_request_args(data, initial_args, target_model, target_variant)
--> 125         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    126         return self._handle_response(response)
    127 

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    355                     "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358 
    359         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    674             error_code = parsed_response.get("Error", {}).get("Code")
    675             error_class = self.exceptions.from_code(error_code)
--> 676             raise error_class(parsed_response, operation_name)
    677         else:
    678             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/roberta-model-2020-12-16-09-42-37-479 in account 165258297056 for more information.

I checked the server logs and found this error:

java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n: Can't load config for '/.sagemaker/mms/models/model'. Make sure that:
'/.sagemaker/mms/models/model' is a correct model identifier listed on 'https://huggingface.co/models'
or '/.sagemaker/mms/models/model' is the correct path to a directory containing a config.json file

How can I fix this?


Answer 1:

I have the same problem. It looks like the endpoint tries to load the pretrained model from the path '/.sagemaker/mms/models/model' and fails.

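Either this path is wrong, or the endpoint cannot access the S3 bucket, so the model never gets stored at that path. The log message also points to a third possibility: the directory exists but has no config.json file in it.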
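One way to narrow this down is to check what the archive actually contains before deploying. The snippet below is only a sketch, not a confirmed fix: it assumes the model was saved with save_pretrained(), so the loader expects config.json and pytorch_model.bin at the top level of the archive. The function name and the REQUIRED_FILES list are my own, not part of SageMaker or transformers.

```python
import tarfile

# Assumption: files a save_pretrained() export should have at the archive root.
# Adjust this list if your model was packaged differently.
REQUIRED_FILES = ("config.json", "pytorch_model.bin")


def missing_from_archive_root(archive_path, required=REQUIRED_FILES):
    """Return the required files that are NOT at the top level of the tar.gz."""
    top_level_names = set()
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            name = member.name
            # Treat "./config.json" the same as "config.json".
            if name.startswith("./"):
                name = name[2:]
            top_level_names.add(name)
    # Anything nested in a subfolder (e.g. "sent/config.json") will not match.
    return [name for name in required if name not in top_level_names]
```

After downloading the archive locally, missing_from_archive_root("sent.tar.gz") returns an empty list when both files sit at the root. If it returns config.json, the files are probably nested inside a folder in the archive, and repacking it from inside that folder may be enough.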

Comments:

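This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. To get notified when this question gets new answers, you can follow this question. Once you have enough reputation, you can also add a bounty to draw more attention to this question. - From Review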
