RuntimeError "storage has wrong size:" when using torch.load
Posted: 2021-07-14 09:31:22

Problem description: I get "RuntimeError: storage has wrong size" when calling torch.load("pthfilename"). My model was trained on multiple GPUs, and I saved it with the following code:
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
device = torch.device(arg.local_rank)
net = Net().to(device)
net = torch.nn.parallel.DistributedDataParallel(net, device_ids=[arg.local_rank])
torch.save(net.state_dict(), "0.pth")
The error is:
Traceback (most recent call last):
File "/root/PycharmProjects/test.py", line 8, in <module>
model_dict = torch.load("0.pth")
File "torch/serialization.py", line 529, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "torch/serialization.py", line 709, in _legacy_load
deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected -4916312287391674656 got 24
Comments:
Can you help me?

Answer 1: If you train your model with a multi-process setup (for example, DistributedDataParallel), every process executes the save call, so several processes can write the same checkpoint file at the same time and corrupt it. Guard the save with the local_rank so that only one process (typically local_rank 0) writes the file.
Refer to this link, this, and this; I hope this solution helps you.
def save_checkpoint(epoch, model, best_top5, optimizer,
                    is_best=False,
                    filename='checkpoint.pth.tar'):
    state = {
        'epoch': epoch + 1,
        'state_dict': model.state_dict(),
        'best_top5': best_top5,
        'optimizer': optimizer.state_dict(),
    }
    torch.save(state, filename)

# Only the process with local_rank 0 writes checkpoint files.
if args.local_rank == 0:
    if is_best:
        save_checkpoint(epoch, model, best_top5, optimizer, is_best=True,
                        filename='model_best.pth.tar')
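A minimal sketch of how the same rank-0 guard could be applied to the save code from the question (Net, arg.local_rank, and the distributed setup are taken from the question and assumed to be defined there; this is not the original poster's full script):

import os
import torch

# Same setup as in the question; the launcher is assumed to have
# initialized torch.distributed and set arg.local_rank for each process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
device = torch.device(arg.local_rank)
net = Net().to(device)
net = torch.nn.parallel.DistributedDataParallel(net, device_ids=[arg.local_rank])

# ... training ...

if arg.local_rank == 0:  # only one process writes 0.pth,
    torch.save(net.state_dict(), "0.pth")  # so concurrent writes cannot corrupt it

Note that a file that was already corrupted by concurrent writes generally cannot be repaired; after adding the guard, 0.pth has to be saved again from a fresh run.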