Get the total amount of available GPU memory using PyTorch

Posted 2023-03-12

[Title]: Get total amount of free GPU memory and available using pytorch
[Posted]: 2020-02-01 14:06:48
[Question]:

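I'm running experiments on Google Colab's free GPU and want to know how much GPU memory is available. torch.cuda.memory_allocated() returns the GPU memory currently occupied, but how can we determine the total available memory using PyTorch?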

[Comments]:

[Answer 1]:

PyTorch can report the total, reserved, and allocated memory:

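import torch

t = torch.cuda.get_device_properties(0).total_memory  # total device memory, in bytes
r = torch.cuda.memory_reserved(0)   # bytes reserved by PyTorch's caching allocator
a = torch.cuda.memory_allocated(0)  # bytes currently occupied by tensors
f = r - a  # free inside reserved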
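
The value f above only covers memory that PyTorch has already reserved. As a minimal sketch (the helper names summarize_memory and query_device are hypothetical, not part of PyTorch), the same three counters can be combined to also show how much of the device this process has not yet reserved. Note that the "unreserved" figure is only an upper bound: other processes on the same GPU may be using part of it.

```python
def summarize_memory(total, reserved, allocated):
    """Derive two kinds of 'free' memory from PyTorch's byte counters."""
    return {
        # Reusable by PyTorch without asking CUDA for more memory.
        "free_in_reserved": reserved - allocated,
        # Not yet claimed by this process's caching allocator.
        "unreserved": total - reserved,
    }


def query_device(index=0):
    """Return the summary for a real device, or None without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    total = torch.cuda.get_device_properties(index).total_memory
    reserved = torch.cuda.memory_reserved(index)
    allocated = torch.cuda.memory_allocated(index)
    return summarize_memory(total, reserved, allocated)
```

Calling query_device(0) on a machine with a CUDA device returns both numbers in bytes; on a machine without one it returns None rather than raising.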

NVIDIA's Python bindings can give you the information for the whole GPU (0 here means the first GPU device):

from pynvml import *

nvmlInit()
h = nvmlDeviceGetHandleByIndex(0)
info = nvmlDeviceGetMemoryInfo(h)  # all values are in bytes
print(f'total    : {info.total}')
print(f'free     : {info.free}')
print(f'used     : {info.used}')

(Install with: pip install pynvml)

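You can check nvidia-smi for memory information. You can also use nvtop, but at the time of writing this tool has to be installed from source. Another tool for checking memory is gpustat (pip3 install gpustat).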
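
If you would rather read the nvidia-smi numbers from a script than from the terminal, one option is to call it with its CSV query flags and parse the result. This is a rough sketch: the function names parse_memory_csv and query_nvidia_smi are made up for illustration, and it assumes every field comes back as a plain integer in MiB (a field reported as unavailable would raise a ValueError here).

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'total, used, free' rows (in MiB) into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        total, used, free = (int(field) for field in line.split(","))
        rows.append({"total": total, "used": used, "free": free})
    return rows


def query_nvidia_smi():
    """Run nvidia-smi and return parsed rows, or [] if it cannot be run."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=memory.total,memory.used,memory.free",
        "--format=csv,noheader,nounits",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_memory_csv(out)
```

query_nvidia_smi() returns one dict per GPU, in the order nvidia-smi lists them, or an empty list when the tool is missing or fails.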

If you want to use C++ with CUDA:

#include <iostream>
#include "cuda.h"
#include "cuda_runtime_api.h"

using namespace std;

int main( void ) {
    int num_gpus;
    size_t free, total;
    cudaGetDeviceCount( &num_gpus );
    // Query each visible GPU in turn; both values are reported in bytes.
    for ( int gpu_id = 0; gpu_id < num_gpus; gpu_id++ ) {
        cudaSetDevice( gpu_id );
        int id;
        cudaGetDevice( &id );
        cudaMemGetInfo( &free, &total );
        cout << "GPU " << id << " memory: free=" << free << ", total=" << total << endl;
    }
    return 0;
}

[Comments]:

torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved.

Updated, @Kallzvx. Let me know if anything is wrong.

[Answer 2]:

This works well for me!

import pynvml

def get_memory_free_MiB(gpu_index):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(int(gpu_index))
    mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    # NVML reports bytes; integer-divide to get whole MiB.
    return mem_info.free // 1024 ** 2
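
A possible variation on the function above, sketched under the assumption that you want the figure for every GPU and want the call to degrade gracefully on machines without pynvml or an NVIDIA driver. The names bytes_to_mib and free_mib_per_gpu are hypothetical; the pynvml calls are the same ones used above, plus nvmlDeviceGetCount and nvmlShutdown.

```python
def bytes_to_mib(n_bytes):
    """Convert a byte count to whole MiB (rounding down)."""
    return n_bytes // 1024 ** 2


def free_mib_per_gpu():
    """Return free memory in MiB for each GPU, or [] if NVML is unavailable."""
    try:
        import pynvml
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # e.g. no NVIDIA driver on this machine
    try:
        result = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            result.append(bytes_to_mib(pynvml.nvmlDeviceGetMemoryInfo(handle).free))
        return result
    finally:
        # Release NVML once the queries are done.
        pynvml.nvmlShutdown()
```

The list index matches the NVML device index, so free_mib_per_gpu()[0] corresponds to get_memory_free_MiB(0).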

[Comments]:
