I want to retrain inception-resnet-v2, but the GPU kernel is not available

Posted: 2019-02-21 05:56:30

Question:

I want to retrain inception-resnet-v2 using the "train_image_classifier.py" Python script.

My environment:

OS: Windows 10 64-bit
GPU: GeForce GTX 1060
Python: 3.6.5
TensorFlow: 1.10.0
CUDA: 9.0

How do I enable the GPU kernel? Please help me get retraining to work.

Here is the code:

https://github.com/tensorflow/models/blob/master/research/slim/train_image_classifier.py

These are the arguments I pass to the script:

--train_dir='C:\Users\stat\Desktop\hgh\retrain' --dataset_name=mnist --dataset_split_name=train --dataset_dir="C:\Users\stat\Desktop\hgh\TFR" --model_name=inception_resnet_v2 --batch_size=50 --max_number_of_steps=3000 --checkpoint_path="C:\Users\stat\Desktop\hgh\inception_resnet_v2_2016_08_30.ckpt" --checkpoint_exclude_scopes=InceptionResnetV2/Logits,InceptionResnetV2/AuxLogits --trainable_scopes=InceptionResnetV2/Logits,InceptionResnetV2/AuxLogits
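Flags with Windows paths and comma-separated scopes are easy to mis-quote on a command line. One possible way to sidestep shell quoting is to build the argument list in Python, one list element per flag. The sketch below is only an illustration: `FLAGS` and `build_command` are hypothetical names, and the values are copied from the arguments in the question.

```python
import subprocess
import sys

# Values copied from the question; adjust the paths for your own machine.
FLAGS = [
    ("train_dir", r"C:\Users\stat\Desktop\hgh\retrain"),
    ("dataset_name", "mnist"),
    ("dataset_split_name", "train"),
    ("dataset_dir", r"C:\Users\stat\Desktop\hgh\TFR"),
    ("model_name", "inception_resnet_v2"),
    ("batch_size", 50),
    ("max_number_of_steps", 3000),
    ("checkpoint_path",
     r"C:\Users\stat\Desktop\hgh\inception_resnet_v2_2016_08_30.ckpt"),
    ("checkpoint_exclude_scopes",
     "InceptionResnetV2/Logits,InceptionResnetV2/AuxLogits"),
    ("trainable_scopes",
     "InceptionResnetV2/Logits,InceptionResnetV2/AuxLogits"),
]


def build_command(script="train_image_classifier.py", flags=FLAGS):
    """Return an argv list: interpreter, script, then one '--name=value' per flag."""
    return [sys.executable, script] + [
        "--{}={}".format(name, value) for name, value in flags
    ]


# Show the command that would be run, quoted the way Windows expects.
print(subprocess.list2cmdline(build_command()))

# To actually start training, run this from the research/slim directory:
# subprocess.run(build_command(), check=True)
```

Because each flag is passed as a separate list element, no quotes around the paths are needed and a stray space after `=` cannot split a value in two.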

The main error message is:

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'InceptionResnetV2/Logits/Predictions': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  [[Node: InceptionResnetV2/Logits/Predictions = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](InceptionResnetV2/Logits/Logits/BiasAdd)]]
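The message above can be read field by field: it names the op that could not be placed, the device it was pinned to, and the devices for which a kernel is registered. The following sketch pulls those fields out with regular expressions. `summarize_placement_error` is a hypothetical helper written for this illustration, not part of TensorFlow, and the patterns assume the message layout shown in this question.

```python
import re

# The error text, copied verbatim from the question.
ERROR_TEXT = """\
Cannot assign a device for operation 'InceptionResnetV2/Logits/Predictions': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  [[Node: InceptionResnetV2/Logits/Predictions = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](InceptionResnetV2/Logits/Logits/BiasAdd)]]
"""


def summarize_placement_error(text):
    """Extract op name, op type, requested device and registered kernel devices."""
    op = re.search(r"for operation '([^']+)'", text)
    requested = re.search(r"device specification '([^']+)'", text)
    op_type = re.search(r"\[\[Node: \S+ = (\w+)\[", text)
    # Each "device='X'; T in [...]" line lists one registered kernel.
    kernels = sorted(set(re.findall(r"^\s*device='(\w+)'", text, flags=re.M)))
    return {
        "op": op.group(1) if op else None,
        "op_type": op_type.group(1) if op_type else None,
        "requested_device": requested.group(1) if requested else None,
        "kernel_devices": kernels,
    }


summary = summarize_placement_error(ERROR_TEXT)
for key, value in summary.items():
    print("{}: {}".format(key, value))
```

For this message the op type is `Softmax`, the requested device is `/device:GPU:0`, and the only registered kernel device is `CPU`. That suggests the installed TensorFlow build has no GPU kernel registered for this op, rather than the GPU itself being undetected.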

Can you help me?

Here is the full log output:

C:\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from "float" to "np.floating" is deprecated. In future, it will be treated as "np.float64 == np.dtype(float).type".
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From train_image_classifier.py:407: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:From train_image_classifier.py:473: softmax_cross_entropy (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.softmax_cross_entropy instead. Note that the order of the logits and labels arguments has been changed.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\losses\python\losses\loss_ops.py:398: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See @tf.nn.softmax_cross_entropy_with_logits_v2.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\losses\python\losses\loss_ops.py:399: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.compute_weighted_loss instead.
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\losses\python\losses\loss_ops.py:147: add_arg_scope.<locals>.func_with_args (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.
INFO:tensorflow:Fine-tuning from C:\Users\stat\Desktop\hgh\inception_resnet_v2_2016_08_30.ckpt
WARNING:tensorflow:From C:\Anaconda3\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py:737: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-09-17 15:37:12.622658: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-17 15:37:13.165705: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7085 pciBusID: 0000:08:00.0 totalMemory: 6.00GiB freeMemory: 4.96GiB
2018-09-17 15:37:13.176595: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-17 15:37:15.911929: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-17 15:37:15.918852: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971]      0
2018-09-17 15:37:15.923879: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:984] 0:   N
2018-09-17 15:37:15.931213: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4722 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:08:00.0, compute capability: 6.1)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Cannot assign a device for operation 'InceptionResnetV2/Logits/Predictions': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  [[Node: InceptionResnetV2/Logits/Predictions = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](InceptionResnetV2/Logits/Logits/BiasAdd)]]
Caused by op 'InceptionResnetV2/Logits/Predictions', defined at:
File "train_image_classifier.py", line 580, in <module>
    tf.app.run()
File "C:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
File "train_image_classifier.py", line 481, in main
    clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "C:\Users\stat\Desktop\hgh\models-master\research\slim\deployment\model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
File "train_image_classifier.py", line 464, in clone_fn
    logits, end_points = network_fn(images)
File "C:\Users\stat\Desktop\hgh\models-master\research\slim\nets\nets_factory.py", line 147, in network_fn
    return func(images, num_classes, is_training=is_training, **kwargs)
File "C:\Users\stat\Desktop\hgh\models-master\research\slim\nets\inception_resnet_v2.py", line 363, in inception_resnet_v2
    end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')
File "C:\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1738, in softmax
    return _softmax(logits, gen_nn_ops.softmax, axis, name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1673, in _softmax
    return compute_op(logits, name=name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 7672, in softmax
    "Softmax", logits=logits, name=name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
    op_def=op_def)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'InceptionResnetV2/Logits/Predictions': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  [[Node: InceptionResnetV2/Logits/Predictions = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](InceptionResnetV2/Logits/Logits/BiasAdd)]]
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1278, in _do_call
    return fn(*args)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1261, in _run_fn
    self._extend_graph()
File "C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1295, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'InceptionResnetV2/Logits/Predictions': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  [[Node: InceptionResnetV2/Logits/Predictions = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](InceptionResnetV2/Logits/Logits/BiasAdd)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train_image_classifier.py", line 580, in <module>
    tf.app.run()
File "C:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
File "train_image_classifier.py", line 576, in main
     sync_optimizer=optimizer if FLAGS.sync_replicas else None)
File "C:\Anaconda3\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 748, in train
    master, start_standard_services=False, config=session_config) as sess:
File "C:\Anaconda3\lib\contextlib.py", line 81, in __enter__
     return next(self.gen)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\training\supervisor.py", line 1005, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\training\supervisor.py", line 833, in stop
    ignore_live_threads=ignore_live_threads)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
     six.reraise(*self._exc_info_to_raise)
File "C:\Anaconda3\lib\site-packages\six.py", line 693, in reraise
     raise value
File "C:\Anaconda3\lib\site-packages\tensorflow\python\training\supervisor.py", line 994, in managed_session
     start_standard_services=start_standard_services)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\training\supervisor.py", line 731, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\training\session_manager.py", line 287, in prepare_session
     sess.run(init_op, feed_dict=init_feed_dict)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 877, in run
    run_metadata_ptr)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1272, in _do_run
     run_metadata)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'InceptionResnetV2/Logits/Predictions': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  [[Node: InceptionResnetV2/Logits/Predictions = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](InceptionResnetV2/Logits/Logits/BiasAdd)]]
Caused by op 'InceptionResnetV2/Logits/Predictions', defined at:
File "train_image_classifier.py", line 580, in <module>
     tf.app.run()
File "C:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
File "train_image_classifier.py", line 481, in main
    clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "C:\Users\stat\Desktop\hgh\models-master\research\slim\deployment\model_deploy.py", line 193, in create_clones
     outputs = model_fn(*args, **kwargs)
File "train_image_classifier.py", line 464, in clone_fn 
     logits, end_points = network_fn(images)
File "C:\Users\stat\Desktop\hgh\models-master\research\slim\nets\nets_factory.py", line 147, in network_fn
    return func(images, num_classes, is_training=is_training, **kwargs)
File "C:\Users\stat\Desktop\hgh\models-master\research\slim\nets\inception_resnet_v2.py", line 363, in inception_resnet_v2
     end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')
File "C:\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1738, in softmax
     return _softmax(logits, gen_nn_ops.softmax, axis, name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1673, in _softmax
    return compute_op(logits, name=name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 7672, in softmax
    "Softmax", logits=logits, name=name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
     op_def=op_def)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
     self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'InceptionResnetV2/Logits/Predictions': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  [[Node: InceptionResnetV2/Logits/Predictions = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](InceptionResnetV2/Logits/Logits/BiasAdd)]]

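Comments:

Please also upload a snippet of your code. My first guess is that something is wrong with the installation: did you install tensorflow-gpu?

I have installed both tensorflow and tensorflow-gpu. My code is "tensorflow\model\research\slim\train_image_classifier.py".

Try removing both and installing only tensorflow-gpu.

I tried removing both and reinstalling tensorflow-gpu, but that did not solve the error.

Answer 1: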

I upgraded TensorFlow to 1.11.0, and that resolved the error!

Thanks!
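Before and after an upgrade like this, it can help to confirm which TensorFlow distributions are actually installed and whether each is at least 1.11.0, since the comments mention having both tensorflow and tensorflow-gpu side by side. The sketch below is an illustration only: the helper names are hypothetical, and it assumes Python 3.8 or newer for `importlib.metadata` (the question used Python 3.6.5, where this module is not in the standard library).

```python
import re
from importlib import metadata  # assumption: Python 3.8+


def installed_tensorflow_distributions(names=("tensorflow", "tensorflow-gpu")):
    """Map each installed distribution name to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this distribution is not installed
    return found


def version_tuple(version):
    """Turn '1.10.0' into (1, 10, 0); a suffix such as 'rc1' is ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def is_at_least(version, minimum="1.11.0"):
    """Compare versions numerically, so that 1.9 sorts before 1.11."""
    return version_tuple(version) >= version_tuple(minimum)


for name, version in installed_tensorflow_distributions().items():
    status = "ok" if is_at_least(version) else "older than 1.11.0"
    print("{} {}: {}".format(name, version, status))
```

Versions are compared as integer tuples rather than strings because a plain string comparison would rank "1.9.0" above "1.11.0".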
