Amazon SageMaker ModelError when serving model
[Posted]: 2021-03-26 23:50:09 [Question]: I uploaded a transformers RoBERTa model to an S3 bucket. I am now trying to run inference on the model with PyTorch using the SageMaker Python SDK. I specified the model artifact s3://snet101/sent.tar.gz, which is a compressed archive of the model (pytorch_model.bin) and all of its dependencies. Here is the code:
model = PyTorchModel(model_data=model_artifact,
                     name=name_from_base('roberta-model'),
                     role=role,
                     entry_point='torchserve-predictor2.py',
                     source_dir='source_dir',
                     framework_version='1.4.0',
                     py_version='py3',
                     predictor_cls=SentimentAnalysis)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
test_data = {"text": "How many cows are in the farm ?"}
prediction = predictor.predict(test_data)
The predict call on the predictor object raises the following error:
ModelError Traceback (most recent call last)
<ipython-input-6-bc621eb2e056> in <module>
----> 1 prediction = predictor.predict(test_data)
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant)
123
124 request_args = self._create_request_args(data, initial_args, target_model, target_variant)
--> 125 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
126 return self._handle_response(response)
127
~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
355 "%s() only accepts keyword arguments." % py_operation_name)
356 # The "self" in this scope is referring to the BaseClient.
--> 357 return self._make_api_call(operation_name, kwargs)
358
359 _api_call.__name__ = str(py_operation_name)
~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
674 error_code = parsed_response.get("Error", {}).get("Code")
675 error_class = self.exceptions.from_code(error_code)
--> 676 raise error_class(parsed_response, operation_name)
677 else:
678 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/roberta-model-2020-12-16-09-42-37-479 in account 165258297056 for more information.
I checked the server logs and found this error:
java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n: Can't load config for '/.sagemaker/mms/models/model'. Make sure that:
'/.sagemaker/mms/models/model' is a correct model identifier listed on 'https://huggingface.co/models'
or '/.sagemaker/mms/models/model' is the correct path to a directory containing a config.json file
How can I fix this?
[Answer 1]: I had the same problem. It appears the endpoint is trying to load a pretrained model from the path '/.sagemaker/mms/models/model' and failing.
Either this path is incorrect, or the S3 bucket cannot be accessed, so the model could not be stored at the given path.
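The log suggests the entry-point script is handing the local path '/.sagemaker/mms/models/model' to transformers as if it were a Hugging Face model id, and no config.json is found there. A common fix is to pack config.json (and tokenizer files) at the root of sent.tar.gz and load everything from the `model_dir` that SageMaker passes to `model_fn`. Below is a minimal sketch of such an entry-point script; the function layout follows the SageMaker PyTorch inference convention (`model_fn`/`input_fn`/`predict_fn`/`output_fn`), but the label logic and file names are illustrative assumptions, not the asker's actual code:

```python
# Sketch of an entry-point script (e.g. torchserve-predictor2.py).
# Assumes sent.tar.gz extracts config.json / pytorch_model.bin / tokenizer
# files at its top level, so model_dir points at a valid model directory.
import json


def model_fn(model_dir):
    # Load from the directory SageMaker extracted the archive into,
    # never from a hard-coded Hugging Face model identifier.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    return {"tokenizer": tokenizer, "model": model}


def input_fn(request_body, content_type="application/json"):
    # Expects a JSON payload like {"text": "..."}.
    return json.loads(request_body)["text"]


def predict_fn(text, model_artifacts):
    tokenizer = model_artifacts["tokenizer"]
    model = model_artifacts["model"]
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    # Return the index of the highest-scoring class.
    return outputs.logits.argmax(dim=-1).item()


def output_fn(prediction, accept="application/json"):
    return json.dumps({"label": prediction})
```

If the archive currently nests the files in a subdirectory, re-packing it so config.json sits at the top level of the tarball lets `from_pretrained(model_dir)` resolve it.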
[Comments]:
This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. To get notified when this question gets a new answer, you can follow this question. Once you have enough reputation, you can also add a bounty to draw more attention to this question. - From Review