AWS EC2 causing issue with Streamlit ML App

Posted: 2021-05-25 12:21:52

Question:

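This is strange, because the problem does not occur on my local machine, where my app runs fine.

However, when I run the app on an AWS EC2 instance, it gives me an error related to the matplotlib import. Below the matplotlib import I have matplotlib.use('TkAgg'). With the code like this, the Streamlit app gives me the following error (only on the EC2 instance):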

ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running

Traceback:
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/streamlit/script_runner.py", line 332, in _run_script
    exec(code, module.__dict__)
File "/home/ubuntu/extremely_unnecessary/app.py", line 16, in <module>
    matplotlib.use('TkAgg')
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/matplotlib/__init__.py", line 1171, in use
    plt.switch_backend(name)
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/matplotlib/pyplot.py", line 287, in switch_backend
    newbackend, required_framework, current_framework))

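After doing some research, I tried changing the offending line to matplotlib.use('agg'). With that change the app starts up fine, but only one of the models works when selected; none of the others do.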
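For context, here is a minimal sketch of picking a backend that also works on a server without a display. It assumes a Linux host where an unset DISPLAY (and WAYLAND_DISPLAY) variable means no GUI session exists; `choose_backend` is a hypothetical helper for illustration, not part of matplotlib.

```python
import os


def choose_backend(env=None):
    """Return 'TkAgg' when a display seems available, else non-interactive 'Agg'.

    Hypothetical helper. Assumption: on a Linux host, DISPLAY or
    WAYLAND_DISPLAY is only set when a GUI session exists.
    """
    env = os.environ if env is None else env
    has_display = bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    return "TkAgg" if has_display else "Agg"


# On a headless EC2 instance neither variable is normally set.
backend_on_server = choose_backend({})
backend_on_desktop = choose_backend({"DISPLAY": ":0"})
```

The result could then be passed to `matplotlib.use(choose_backend())` before any figure is created. For a Streamlit app that only renders images in the browser, the non-interactive 'Agg' backend is usually sufficient on every machine.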

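The app is hosted here: http://54.193.229.139:8501/ You upload an image, then pick a pretrained model from the dropdown menu to apply "style transfer" to the uploaded image.

For some strange reason, the 12th model in the list (chicken-strawberries-market-069_10000.pth) works, but none of the other models do. Again, this only happens on the EC2 instance: all models work when I run the Streamlit app locally, even with matplotlib.use('agg').

I also tried a few other variants, including matplotlib.use('GTK3Agg') and matplotlib.use('WebAgg'), which gave me various other error messages.

Does anyone know how to fix this so that I can get all the models working on the EC2 instance?

Edit: I have started getting a new error message, which I am trying to resolve by changing the code. I was using CUDA through my GPU, and apparently I have to make some CPU-bound changes so that the app can run on the Ubuntu server. Not sure why the chicken-strawberries model works, though...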

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Traceback:
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/streamlit/script_runner.py", line 332, in _run_script
    exec(code, module.__dict__)
File "/home/ubuntu/extremely_unnecessary/app.py", line 91, in <module>
    main()
File "/home/ubuntu/extremely_unnecessary/app.py", line 54, in main
    transformer.load_state_dict(torch.load(checkpoint))
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 774, in _legacy_load
    result = unpickler.load()
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 730, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
File "/home/ubuntu/anaconda3/envs/streamlit/lib/python3.6/site-packages/torch/serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '

App code:

import matplotlib.pyplot as plt
from PIL import Image
from torchvision.utils import save_image
import tqdm
import streamlit as st
from models import TransformerNet
from utils import *
import torch
import numpy as np
from torch.autograd import Variable
import argparse
import tkinter as tk
import os
import cv2
import matplotlib
matplotlib.use('agg')


def main():

    uploaded_file = st.file_uploader(
        "Choose an image", type=['jpg', 'png', 'webm', 'mp4', 'gif', 'jpeg'])
    if uploaded_file is not None:
        st.image(uploaded_file, width=200)

    folder = os.path.abspath(os.getcwd())
    folder = folder + '/models'

    fnames = []

    for basename in os.listdir(folder):
        print(basename)
        fname = os.path.join(folder, basename)

        if fname.endswith('.pth'):
            fnames.append(fname)

    checkpoint = st.selectbox('Select a pretrained model', fnames)

    os.makedirs("images/outputs", exist_ok=True)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # device = torch.device("cpu")
    transform = style_transform()

    # Define model and load model checkpoint
    transformer = TransformerNet().to(device)
    transformer.load_state_dict(torch.load(checkpoint))
    transformer.eval()

    # Prepare input
    image_tensor = Variable(transform(Image.open(
        uploaded_file).convert('RGB'))).to(device)
    image_tensor = image_tensor.unsqueeze(0)

    # Stylize image
    with torch.no_grad():
        stylized_image = denormalize(transformer(image_tensor)).cpu()

    fn = str(np.random.randint(0, 100)) + 'image.jpg'
    save_image(stylized_image, f"images/outputs/stylized-{fn}")

    st.image(f"images/outputs/stylized-{fn}")


if __name__ == "__main__":
    main()
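A side note on the model dropdown: os.listdir returns entries in arbitrary order, so "the 12th model" may differ between the local machine and the EC2 instance. The sketch below sorts the checkpoint names and maps them to full paths. `list_checkpoints` and the demo file names are hypothetical and not part of the original app.

```python
import os
import tempfile


def list_checkpoints(folder):
    """Return {file name: full path} for the .pth files in folder, sorted by name."""
    names = sorted(n for n in os.listdir(folder) if n.endswith(".pth"))
    return {name: os.path.join(folder, name) for name in names}


# Demo with throwaway files instead of real model checkpoints.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("b.pth", "a.pth", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    demo = list_checkpoints(demo_dir)
```

In the app, the keys could be passed to st.selectbox and the selected key looked up in the dictionary to get the checkpoint path, which also keeps full filesystem paths out of the dropdown.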


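Comments:

Try building a container and running it locally. That will show you what you may be missing in terms of dependencies and versions.

Build a container, as in a Docker container? I am pretty new to Docker. Anyway, I started getting a new error message that I think explains it; I will add it to the original question.

Answer 1: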

It turns out all I needed to do was implement the line from the error message. On line 53, I just needed to change this:

transformer.load_state_dict(torch.load(checkpoint))

to this:

transformer.load_state_dict(torch.load(
    checkpoint, map_location=torch.device('cpu')))

And it works!
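A slightly more general sketch of the same fix, assuming PyTorch is installed: passing the selected device as map_location lets the same code run on a GPU workstation and on a CPU-only EC2 instance. `load_checkpoint` and the small nn.Linear model are illustrative stand-ins, not the app's TransformerNet.

```python
import io

import torch
from torch import nn


def load_checkpoint(model, source, device):
    """Load a state dict from a path or file object and move the model to device."""
    # map_location remaps storages that were saved on CUDA to the target device.
    state_dict = torch.load(source, map_location=device)
    model.load_state_dict(state_dict)
    return model.to(device).eval()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Round trip through an in-memory buffer instead of a real .pth file.
buffer = io.BytesIO()
torch.save(nn.Linear(4, 2).state_dict(), buffer)
buffer.seek(0)

model = load_checkpoint(nn.Linear(4, 2), buffer, device)
```

With this pattern, the `device` already computed in main() could be reused for loading, so no CPU-specific branch is needed.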
