How can I solve this PyTorch two devices error

Posted: 2022-01-05 13:32:52

Question:

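I'm running into a problem with PyTorch: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)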

    model = nn.Sequential(
        nn.Linear(622, 512),
        nn.ReLU(),
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.Linear(256, 5),
    ).to(device)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    train_loader = Data.DataLoader(
        dataset=train_dataset,
        batch_size=32,
        shuffle=True,
        num_workers=0,
    )

    test_loader = Data.DataLoader(
        dataset=test_dataset,
        batch_size=100,
        shuffle=True,
        num_workers=0,
    )

    best_acc = 0
    best_model = model.cpu().state_dict().copy()
    # train_acc = 0
    # test_acc = 0
    for epoch in range(20):
        for step, (batch_x, batch_y) in enumerate(train_loader):
            batch_x = batch_x.to(device)
            batch_y = batch_y.to(device)
            print(batch_x)
            print(batch_x.device, 0)
            out = model(batch_x.to(device)).cuda()
            print(out.device, 1)
            loss = loss_fn(out, batch_y.long())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_acc = np.mean((torch.argmax(out, 1) == batch_y).cpu().numpy())

            with torch.no_grad():
                for batch_x, batch_y in test_loader:
                    batch_x = batch_x.to(device)
                    batch_y = batch_y.to(device)
                    print(batch_x.device, 2)
                    out = model(batch_x)
                    print(batch_x.device, 3)
                    test_acc = np.mean((torch.argmax(out, 1) == batch_y).cpu().numpy())
            if test_acc > best_acc:
                best_acc = test_acc
                best_model = model.cpu().state_dict().copy()

Can anyone help explain this? I've been working on it all day...

Comments:

The original code was 'out = model(batch_x)', which triggered this error, so I changed it to 'out = model(batch_x.to(device)).cuda()', but I still get the same error.

Answer 1:

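Note that .to() behaves differently when applied to an nn.Module and to a torch.Tensor: for a torch.Tensor, .to(device) creates a copy of the tensor on the device and leaves the original where it was, whereas for an nn.Module, .to(device) operates in place. The same holds for .cpu() and .cuda().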
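
The difference can be seen in a small sketch. It is not taken from the question: it uses a dtype conversion as a stand-in for a device move so that it runs on a machine without a GPU, and the names t, t64, m and returned are made up for illustration.

```python
import torch
from torch import nn

# Tensor: .to() returns a new tensor and leaves the original untouched.
t = torch.zeros(2, 3)            # float32 by default
t64 = t.to(torch.float64)
print(t.dtype, t64.dtype)

# Module: .to() converts the parameters in place and returns the same object.
m = nn.Linear(3, 1)
returned = m.to(torch.float64)
print(returned is m)
print(m.weight.dtype)

# .cpu() on a module also returns the module itself, so calling
# model.cpu().state_dict() leaves `model` on the CPU afterwards.
print(m.cpu() is m)
```

With a tensor you have to keep the return value; with a module the call alone changes where the parameters live.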

In your code, you move the model to the CPU:

best_model = model.cpu().state_dict().copy()

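Because that call is in place, the model stays on the CPU afterwards while your batches are on cuda:0, which is what the error reports. Make sure to move the model back to device after moving it to the CPU.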
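
A minimal sketch of two ways to do this, assuming the same layer sizes as in the question. The helper snapshot_on_cpu is hypothetical (not part of PyTorch), and the random batch only stands in for the real data loader.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(
    nn.Linear(622, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 5),
).to(device)


def snapshot_on_cpu(module):
    # Hypothetical helper: copy every state_dict entry to the CPU.
    # The module itself is not moved, and clone() keeps the snapshot
    # from sharing storage with the live parameters.
    return {k: v.detach().cpu().clone() for k, v in module.state_dict().items()}


# Option 1 (what the answer describes): move to the CPU, then move back.
best_model = model.cpu().state_dict().copy()
model.to(device)

# Option 2 (an alternative): never move the model at all.
best_model = snapshot_on_cpu(model)

# The model is still on `device`, so a batch on `device` runs without error.
batch_x = torch.randn(4, 622, device=device)
out = model(batch_x)
print(out.device, next(model.parameters()).device)
```

Option 2 avoids the round trip between devices. It also sidesteps a second pitfall: state_dict().copy() is a shallow copy, so as long as the model is not moved, the saved tensors can keep following the parameters as training continues.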
