AWS EC2 causing issue with Streamlit ML App
Asked: 2021-05-25 12:21:52
Question:
This is strange, because the problem does not occur on my local machine, where the app runs fine.
However, when I run the app on an AWS EC2 instance, it gives me an error about the matplotlib import. Right below the matplotlib import I have matplotlib.use('TkAgg'). With the code like this, the Streamlit app gives me this error (only on the EC2 instance):
ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running
Traceback:
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/streamlit/script_runner.py", line 332, in _run_script
exec(code, module.__dict__)
File "/home/ubuntu/extremely_unnecessary/app.py", line 16, in <module>
matplotlib.use('TkAgg')
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/matplotlib/__init__.py", line 1171, in use
plt.switch_backend(name)
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/matplotlib/pyplot.py", line 287, in switch_backend
newbackend, required_framework, current_framework))
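After doing some research, I tried changing the offending line to matplotlib.use('agg'). With that change the app runs, but only one of the models works when selected; none of the others do.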
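The backend error above comes from asking for a GUI backend on a host with no display. As a minimal sketch (not the asker's code), the choice could be made at startup instead of being hard-coded. The helper names `pick_matplotlib_backend` and `configure_matplotlib` are hypothetical, and treating a Linux host with neither `DISPLAY` nor `WAYLAND_DISPLAY` set as headless is an assumption of this example.

```python
import os
import sys


def pick_matplotlib_backend(environ=None, platform=None):
    """Return "Agg" for a display-less Linux host, otherwise None.

    None means: leave matplotlib's own default backend untouched.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    # Assumption: no X11 or Wayland display variable means a headless server.
    has_display = bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    if platform.startswith("linux") and not has_display:
        return "Agg"
    return None


def configure_matplotlib():
    """Apply the chosen backend; call this before importing pyplot."""
    import matplotlib  # imported lazily so the helper above has no dependencies

    backend = pick_matplotlib_backend()
    if backend is not None:
        matplotlib.use(backend)
    return matplotlib.get_backend()
```

Calling `configure_matplotlib()` at the top of the app, before `import matplotlib.pyplot`, would select the non-interactive Agg backend on a typical EC2 server and leave a desktop machine on its default backend.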
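The app is hosted here: http://54.193.229.139:8501/ The way it works is that you upload an image and then pick a pretrained model from a dropdown menu to apply "style transfer" to the uploaded image.
For some strange reason, the 12th model in the list (chicken-strawberries-market-069_10000.pth) works, but none of the other models do. Again, this only happens on the EC2 instance: even with matplotlib.use('agg'), all of the models work when I run the Streamlit app locally.
I also tried a few other variants, including matplotlib.use('GTK3Agg') and matplotlib.use('WebAgg'), which give me various other error messages.
Does anyone know how to fix this so that I can get all of the models working on the EC2 instance?
Edit: I have started getting a new error message, and I am working on changing the code. I was using CUDA through my GPU, and apparently I have to make some CPU-related changes so that it can run on the Ubuntu server. Not sure why the chicken-strawberries model works...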
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Traceback:
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/streamlit/script_runner.py", line 332, in _run_script
exec(code, module.__dict__)
File "/home/ubuntu/extremely_unnecessary/app.py", line 91, in <module>
main()
File "/home/ubuntu/extremely_unnecessary/app.py", line 54, in main
transformer.load_state_dict(torch.load(checkpoint))
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 595, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 774, in _legacy_load
result = unpickler.load()
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 730, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 175, in default_restore_location
result = fn(storage, location)
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 151, in _cuda_deserialize
device = validate_cuda_device(location)
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 135, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
App code:
import matplotlib.pyplot as plt
from PIL import Image
from torchvision.utils import save_image
import tqdm
import streamlit as st
from models import TransformerNet
from utils import *
import torch
import numpy as np
from torch.autograd import Variable
import argparse
import tkinter as tk
import os
import cv2
import matplotlib
matplotlib.use('agg')


def main():
    uploaded_file = st.file_uploader(
        "Choose an image", type=['jpg', 'png', 'webm', 'mp4', 'gif', 'jpeg'])
    if uploaded_file is not None:
        st.image(uploaded_file, width=200)
    folder = os.path.abspath(os.getcwd())
    folder = folder + '/models'
    fnames = []
    for basename in os.listdir(folder):
        print(basename)
        fname = os.path.join(folder, basename)
        if fname.endswith('.pth'):
            fnames.append(fname)
    checkpoint = st.selectbox('Select a pretrained model', fnames)
    os.makedirs("images/outputs", exist_ok=True)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # device = torch.device("cpu")
    transform = style_transform()
    # Define model and load model checkpoint
    transformer = TransformerNet().to(device)
    transformer.load_state_dict(torch.load(checkpoint))
    transformer.eval()
    # Prepare input
    image_tensor = Variable(transform(Image.open(
        uploaded_file).convert('RGB'))).to(device)
    image_tensor = image_tensor.unsqueeze(0)
    # Stylize image
    with torch.no_grad():
        stylized_image = denormalize(transformer(image_tensor)).cpu()
    fn = str(np.random.randint(0, 100)) + 'image.jpg'
    save_image(stylized_image, f"images/outputs/stylized-{fn}")
    st.image(f"images/outputs/stylized-{fn}")


if __name__ == "__main__":
    main()
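Comments:
Try building a container and running it locally. That will show you what you might be missing in terms of dependencies and versions.
Build a container, as in a Docker container? I'm pretty new to Docker. Either way, I have started getting a new error message that I think explains it; I will add it to the original question.
Answer 1:
It turns out all I needed to do was apply the line from the error message. On line 53, I just needed to change this: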
transformer.load_state_dict(torch.load(checkpoint))
to this:
transformer.load_state_dict(torch.load(
    checkpoint, map_location=torch.device('cpu')))
And it works!
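The same idea can be written so that it works on both a GPU workstation and a CPU-only server, by mapping the checkpoint onto whichever device is available instead of hard-coding the CPU. This is a minimal sketch rather than the asker's code: the helper name `load_state_for_available_device` is hypothetical, and a small `nn.Linear` stands in for `TransformerNet`. One plausible reason a single model loaded without the change is that its checkpoint was saved from CPU tensors, though the source does not confirm this.

```python
import os
import tempfile

import torch
from torch import nn


def load_state_for_available_device(model, checkpoint_path):
    """Load a state dict onto CUDA when present, otherwise onto the CPU."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # map_location remaps storages that were saved on a CUDA device.
    state_dict = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(state_dict)
    return model.to(device).eval(), device


# Demo with a stand-in model: save a checkpoint, then restore it.
source = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pth")
    torch.save(source.state_dict(), path)
    restored, device = load_state_for_available_device(nn.Linear(4, 2), path)
```

In the app, this would replace the `TransformerNet().to(device)` and `load_state_dict` lines, with `checkpoint` passed as the path.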