TensorFlow allocating all memory for any program


Posted: 2018-09-22 23:34:04

Question:

OS platform and distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): v1.4.0-rc1
Python version: 3.5.5
CUDA/cuDNN version: CUDA 8.0 / cuDNN 6
GPU model and memory: NVIDIA GTX 1080

I am new to TensorFlow, so this could easily be some silly installation mistake that I cannot see.

I opened Python to test the TF installation:

import tensorflow as tf
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

This results in:

 I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-04-11 21:39:44.830140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 78.94MiB
2018-04-11 21:39:44.830178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-04-11 21:39:44.832231: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 78.94M (82771968 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.834394: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 71.04M (74494976 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.835825: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 63.94M (67045632 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.837560: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 57.55M (60341248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.839233: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 51.79M (54307328 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.841757: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 46.61M (48876800 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.843632: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 41.95M (43989248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.845588: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 37.76M (39590400 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.847229: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 33.98M (35631360 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.849278: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 30.58M (32068352 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.850967: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 27.52M (28861696 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality 

incarnation: 6037705122138393497
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 82771968
locality 
  bus_id: 1

incarnation: 11403601020071115295
physical_device_desc: "device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
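A note on reading the log above: the first failed request (82771968 bytes) equals the reported freeMemory of 78.94MiB, and each later request is about 90% of the previous one. The sketch below reproduces those retry sizes. The 0.9 factor and the 256-byte rounding are inferred from the logged numbers, and the function name `next_retry_size` is hypothetical; this is an illustration of the arithmetic, not TensorFlow's allocator code.

```python
def next_retry_size(failed_bytes, alignment=256):
    """Shrink a failed request to 90% and round up to a multiple of `alignment`.

    Both the 90% factor and the 256-byte alignment are assumptions inferred
    from the sizes printed in the log.
    """
    # Integer form of ceil(failed_bytes * 0.9 / alignment) * alignment,
    # which avoids floating-point rounding surprises.
    units = -(-(failed_bytes * 9) // (10 * alignment))
    return units * alignment


size = 82771968  # first failed request in the log (78.94M)
retry_sizes = []
for _ in range(3):
    size = next_retry_size(size)
    retry_sizes.append(size)

print(retry_sizes)  # [74494976, 67045632, 60341248]
```

These values match the 71.04M, 63.94M, and 57.55M lines in the log, which suggests the repeated errors are one allocation being retried with smaller sizes rather than separate failures.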

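Comments:

What is your actual question? Do you want TensorFlow to allocate the memory, or do you want to limit it so that it does not pre-allocate?

Answer 1: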

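Assuming your question is "Why does TensorFlow allocate all available GPU memory even though my program could run with less?", the answer is that the developers do this to reduce GPU memory fragmentation. You can change this default behavior with settings such as config.gpu_options.allow_growth and config.gpu_options.per_process_gpu_memory_fraction, which reduce TensorFlow's memory consumption at the cost of allowing some potential memory fragmentation. For a detailed explanation, see the "Using GPU" chapter of the TensorFlow Programmer's Guide.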
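As a minimal sketch of how these two settings might be applied, the helper below builds a session config. The helper name `make_gpu_config`, its parameters, and the 0.4 fraction are illustrative assumptions, not part of the original answer. It takes the TensorFlow API module as an argument so that the same code can be given `tensorflow` on 1.x (the asker's version) or `tensorflow.compat.v1` on 2.x.

```python
def make_gpu_config(tf_api, allow_growth=True, memory_fraction=None):
    """Build a session config that limits TensorFlow's up-front GPU allocation.

    tf_api: the `tensorflow` module on 1.x, or `tensorflow.compat.v1` on 2.x.
    This helper is hypothetical; only ConfigProto and the two gpu_options
    fields come from TensorFlow itself.
    """
    config = tf_api.ConfigProto()
    # Start with a small GPU memory region and grow it only as needed.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        # Cap this process at a fraction of each visible GPU's total memory.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


# Usage on TensorFlow 1.x (uncomment where TensorFlow is installed):
# import tensorflow as tf
# sess = tf.Session(config=make_gpu_config(tf, memory_fraction=0.4))
```

The two options can be used separately or together: allow_growth avoids claiming memory up front, while per_process_gpu_memory_fraction sets an upper bound on what the process may take.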
