Operation was explicitly assigned to /job:ps/task:0/device:CPU:0 but available devices are [ /job:lo
Posted lixiaolun
The following error is raised when training starts:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'save/RestoreV2_10': Operation was explicitly assigned to /job:ps/task:0/device:CPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: save/RestoreV2_10 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:ps/task:0/device:CPU:0"](save/Const, save/RestoreV2_10/tensor_names, save/RestoreV2_10/shape_and_slices)]]
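Cause: "available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]" means that the tf.Session is not connected to the tf.train.Server. In particular, it seems to be a local (or "direct") session that can only access devices in the local process. The restore op was pinned to /job:ps, but a local session only exposes /job:localhost, so the placement cannot be satisfied.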
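To make the mismatch concrete, here is a minimal pure-Python sketch that compares a requested device string against the list of available ones. The helpers parse_device and is_satisfiable are hypothetical names written for this illustration. They are a simplified stand-in for the idea of device matching and not TensorFlow's actual placement logic.

```python
def parse_device(spec):
    """Split a device string such as '/job:ps/task:0/device:CPU:0' into fields.

    Hypothetical helper for illustration only; not part of TensorFlow.
    """
    fields = {}
    for part in spec.strip("/").split("/"):
        key, _, value = part.partition(":")
        key = key.lower()
        if key in ("cpu", "gpu"):
            # Legacy short form: 'cpu:0' means device type CPU, index 0.
            fields["device"] = (key.upper(), int(value))
        elif key == "device":
            # Full form: 'device:CPU:0'.
            dev_type, _, index = value.partition(":")
            fields["device"] = (dev_type.upper(), int(index))
        else:
            # job, replica, task are kept as plain strings.
            fields[key] = value
    return fields


def is_satisfiable(requested, available_devices):
    """Return True if some available device agrees with every requested field."""
    want = parse_device(requested)
    for dev in available_devices:
        have = parse_device(dev)
        if all(have.get(k) == v for k, v in want.items()):
            return True
    return False


requested = "/job:ps/task:0/device:CPU:0"

# What a local ("direct") session sees, taken from the error message.
local_only = ["/job:localhost/replica:0/task:0/cpu:0"]

# Assumed device list for a session connected to a cluster with a ps job.
with_cluster = ["/job:ps/replica:0/task:0/cpu:0",
                "/job:worker/replica:0/task:0/cpu:0"]

print(is_satisfiable(requested, local_only))
print(is_satisfiable(requested, with_cluster))
```

The first check fails because the job field is "ps" in the request and "localhost" in the only available device, which is the situation the error message describes.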
Fix: pass server.target when creating the session. For example:
    # Creating a session explicitly.
    with tf.Session(server.target) as sess:
        # ...

    # Using a `tf.train.Supervisor` called `sv`.
    with sv.managed_session(server.target):
        # ...

    # Using a `tf.train.MonitoredTrainingSession`.
    with tf.train.MonitoredTrainingSession(server.target):
        # ...
Our code:
    with tf.train.MonitoredTrainingSession(
            is_chief=is_chief,
            checkpoint_dir=checkpoint_dir,
            save_checkpoint_secs=FLAGS.save_interval_secs,
            save_summaries_steps=100,
            save_summaries_secs=None,
            config=sess_config,
            hooks=hooks) as sess:
The MonitoredTrainingSession call above does not specify:
master=server.target
The final code is:
    with tf.train.MonitoredTrainingSession(
            master=server.target,
            is_chief=is_chief,
            checkpoint_dir=checkpoint_dir,
            save_checkpoint_secs=FLAGS.save_interval_secs,
            save_summaries_steps=100,
            save_summaries_secs=None,
            config=sess_config,
            hooks=hooks) as sess:
Reference: https://stackoverflow.com/questions/42397370/distributed-tensorflow-save-fails-no-device