TensorFlow: Blas GEMM launch failed
【Posted】: 2017-10-14 20:40:42
【Question】: When I try to use TensorFlow with Keras on the GPU, I get the following error message:
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\__main__.py:2: UserWarning: Update your `fit_generator` call to the Keras 2 API: `fit_generator(<keras.pre..., 37800, epochs=2, validation_data=<keras.pre..., validation_steps=4200)`
from ipykernel import kernelapp as app
Epoch 1/2
InternalError Traceback (most recent call last)
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1038 try:
-> 1039 return fn(*args)
1040 except errors.OpError as e:
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1020 feed_dict, fetch_list, target_list,
-> 1021 status, run_metadata)
1022
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\contextlib.py in __exit__(self, type, value, traceback)
65 try:
---> 66 next(self.gen)
67 except StopIteration:
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py in raise_exception_on_not_ok_status()
465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466 pywrap_tensorflow.TF_GetCode(status))
467 finally:
InternalError: Blas GEMM launch failed : a.shape=(64, 784), b.shape=(784, 10), m=64, n=10, k=784
[[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](flatten_1/Reshape, dense_1/kernel/read)]]
During handling of the above exception, another exception occurred:
InternalError Traceback (most recent call last)
<ipython-input-13-2a52d1079a66> in <module>()
1 history=model.fit_generator(batches, batches.n, nb_epoch=2,
----> 2 validation_data=val_batches, nb_val_samples=val_batches.n)
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
86 warnings.warn('Update your `' + object_name +
87 '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 88 return func(*args, **kwargs)
89 wrapper._legacy_support_signature = inspect.getargspec(func)
90 return wrapper
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\models.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_q_size, workers, pickle_safe, initial_epoch)
1108 workers=workers,
1109 pickle_safe=pickle_safe,
-> 1110 initial_epoch=initial_epoch)
1111
1112 @interfaces.legacy_generator_methods_support
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
86 warnings.warn('Update your `' + object_name +
87 '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 88 return func(*args, **kwargs)
89 wrapper._legacy_support_signature = inspect.getargspec(func)
90 return wrapper
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_q_size, workers, pickle_safe, initial_epoch)
1888 outs = self.train_on_batch(x, y,
1889 sample_weight=sample_weight,
-> 1890 class_weight=class_weight)
1891
1892 if not isinstance(outs, list):
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight)
1631 ins = x + y + sample_weights
1632 self._make_train_function()
-> 1633 outputs = self.train_function(ins)
1634 if len(outputs) == 1:
1635 return outputs[0]
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
2227 session = get_session()
2228 updated = session.run(self.outputs + [self.updates_op],
-> 2229 feed_dict=feed_dict)
2230 return updated[:len(self.outputs)]
2231
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
776 try:
777 result = self._run(None, fetches, feed_dict, options_ptr,
--> 778 run_metadata_ptr)
779 if run_metadata:
780 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
980 if final_fetches or final_targets:
981 results = self._do_run(handle, final_targets, final_fetches,
--> 982 feed_dict_string, options, run_metadata)
983 else:
984 results = []
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1030 if handle is None:
1031 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1032 target_list, options, run_metadata)
1033 else:
1034 return self._do_call(_prun_fn, self._session, handle, feed_dict,
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1050 except KeyError:
1051 pass
-> 1052 raise type(e)(node_def, op, message)
1053
1054 def _extend_graph(self):
InternalError: Blas GEMM launch failed : a.shape=(64, 784), b.shape=(784, 10), m=64, n=10, k=784
[[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](flatten_1/Reshape, dense_1/kernel/read)]]
Caused by op 'dense_1/MatMul', defined at:
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\__main__.py", line 3, in <module>
app.launch_new_instance()
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
app.start()
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelapp.py", line 477, in start
ioloop.IOLoop.instance().start()
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tornado\ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 235, in dispatch_shell
handler(stream, idents, msg)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2683, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2787, in run_ast_nodes
if self.run_code(code, result):
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2847, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-10-1e7a3b259f23>", line 4, in <module>
model.add(Dense(10, activation='softmax'))
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\models.py", line 466, in add
output_tensor = layer(self.outputs[0])
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\topology.py", line 585, in __call__
output = self.call(inputs, **kwargs)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\layers\core.py", line 840, in call
output = K.dot(inputs, self.kernel)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 936, in dot
out = tf.matmul(x, y)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1801, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 1263, in _mat_mul
transpose_b=transpose_b, name=name)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op
op_def=op_def)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(64, 784), b.shape=(784, 10), m=64, n=10, k=784
[[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](flatten_1/Reshape, dense_1/kernel/read)]]
When I try to use TensorFlow with Keras on the CPU, I get the following error message:
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\__main__.py:5: UserWarning: Update your `fit_generator` call to the Keras 2 API: `fit_generator(<keras.pre..., 37800, validation_steps=4200, validation_data=<keras.pre..., epochs=2)`
Epoch 1/2
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1038 try:
-> 1039 return fn(*args)
1040 except errors.OpError as e:
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1020 feed_dict, fetch_list, target_list,
-> 1021 status, run_metadata)
1022
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\contextlib.py in __exit__(self, type, value, traceback)
65 try:
---> 66 next(self.gen)
67 except StopIteration:
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py in raise_exception_on_not_ok_status()
465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466 pywrap_tensorflow.TF_GetCode(status))
467 finally:
InternalError: Blas GEMM launch failed : a.shape=(64, 784), b.shape=(784, 10), m=64, n=10, k=784
[[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](flatten_1/Reshape, dense_1/kernel/read)]]
[[Node: Assign_3/_84 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_374_Assign_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
During handling of the above exception, another exception occurred:
InternalError Traceback (most recent call last)
<ipython-input-14-f66b4d3d5b88> in <module>()
3 with tf.device('/cpu:0'):
4 history=model.fit_generator(batches, batches.n, nb_epoch=2,
----> 5 validation_data=val_batches, nb_val_samples=val_batches.n)
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
86 warnings.warn('Update your `' + object_name +
87 '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 88 return func(*args, **kwargs)
89 wrapper._legacy_support_signature = inspect.getargspec(func)
90 return wrapper
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\models.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_q_size, workers, pickle_safe, initial_epoch)
1108 workers=workers,
1109 pickle_safe=pickle_safe,
-> 1110 initial_epoch=initial_epoch)
1111
1112 @interfaces.legacy_generator_methods_support
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
86 warnings.warn('Update your `' + object_name +
87 '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 88 return func(*args, **kwargs)
89 wrapper._legacy_support_signature = inspect.getargspec(func)
90 return wrapper
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_q_size, workers, pickle_safe, initial_epoch)
1888 outs = self.train_on_batch(x, y,
1889 sample_weight=sample_weight,
-> 1890 class_weight=class_weight)
1891
1892 if not isinstance(outs, list):
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight)
1631 ins = x + y + sample_weights
1632 self._make_train_function()
-> 1633 outputs = self.train_function(ins)
1634 if len(outputs) == 1:
1635 return outputs[0]
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
2227 session = get_session()
2228 updated = session.run(self.outputs + [self.updates_op],
-> 2229 feed_dict=feed_dict)
2230 return updated[:len(self.outputs)]
2231
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
776 try:
777 result = self._run(None, fetches, feed_dict, options_ptr,
--> 778 run_metadata_ptr)
779 if run_metadata:
780 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
980 if final_fetches or final_targets:
981 results = self._do_run(handle, final_targets, final_fetches,
--> 982 feed_dict_string, options, run_metadata)
983 else:
984 results = []
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1030 if handle is None:
1031 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1032 target_list, options, run_metadata)
1033 else:
1034 return self._do_call(_prun_fn, self._session, handle, feed_dict,
C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1050 except KeyError:
1051 pass
-> 1052 raise type(e)(node_def, op, message)
1053
1054 def _extend_graph(self):
InternalError: Blas GEMM launch failed : a.shape=(64, 784), b.shape=(784, 10), m=64, n=10, k=784
[[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](flatten_1/Reshape, dense_1/kernel/read)]]
[[Node: Assign_3/_84 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_374_Assign_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op 'dense_1/MatMul', defined at:
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\__main__.py", line 3, in <module>
app.launch_new_instance()
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
app.start()
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelapp.py", line 477, in start
ioloop.IOLoop.instance().start()
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tornado\ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 235, in dispatch_shell
handler(stream, idents, msg)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2683, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2787, in run_ast_nodes
if self.run_code(code, result):
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2847, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-12-1e7a3b259f23>", line 4, in <module>
model.add(Dense(10, activation='softmax'))
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\models.py", line 466, in add
output_tensor = layer(self.outputs[0])
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\topology.py", line 585, in __call__
output = self.call(inputs, **kwargs)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\layers\core.py", line 840, in call
output = K.dot(inputs, self.kernel)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 936, in dot
out = tf.matmul(x, y)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1801, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 1263, in _mat_mul
transpose_b=transpose_b, name=name)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op
op_def=op_def)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Users\nicol\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(64, 784), b.shape=(784, 10), m=64, n=10, k=784
[[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](flatten_1/Reshape, dense_1/kernel/read)]]
[[Node: Assign_3/_84 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_374_Assign_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
In both cases the error comes down to InternalError (see above for traceback): Blas GEMM launch failed. Can you tell me how to get Blas GEMM to launch?
I installed tensorflow and keras in a Python 3.5 Anaconda environment, together with all the required modules (numpy, pandas, scipy, scikit-learn). I am on Windows 10 with an NVIDIA GPU that supports CUDA, and I have installed CUDA and cuDNN. I am using a Jupyter notebook in Chrome.
Sometimes, instead of getting this error when I run my code, it starts running and then crashes. After the crash I cannot do anything in my Jupyter notebook, and after a while a popup asks whether I want to kill the page. Here is a screenshot of what I see after the crash: http://www.hostingpics.net/viewer.php?id=647186tensorflowError.png
P.S. I know my problem is similar to this question: Tensorflow Basic Example Error: CUBLAS_STATUS_NOT_INITIALIZED. But it has not been solved there, and I am not sure that question is clear enough or exactly the same as mine, so I am posting it with my own error message. This question also differs from TensorFlow: InternalError: Blas SGEMM launch failed, because my problem is with GEMM rather than SGEMM, it happens on both GPU and CPU, and the answer to that question did not solve it.
【Question comments】:
Same problem on a GPU 3090.
Check your GPU usage. I got this error when my GPU memory was full.
【Answer 1】: This worked for me on TensorFlow 2.1.0 (per https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth):
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
for device in physical_devices:
    tf.config.experimental.set_memory_growth(device, True)
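One caveat worth noting (an assumption based on the linked TensorFlow documentation, not something this answer states): memory growth has to be enabled before any GPU has been initialized, otherwise the call raises a RuntimeError. A defensive sketch:

import tensorflow as tf

# Run this right after importing TensorFlow, before any op touches the GPU.
for device in tf.config.list_physical_devices('GPU'):
    try:
        tf.config.experimental.set_memory_growth(device, True)
    except RuntimeError as err:
        # Raised if the GPU was already initialized; restart the kernel and retry.
        print(err)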
【Comments】:
Thanks, this also seems to work on TensorFlow 2.1.0.
What about TensorFlow 2.0, if I may ask?
@MasterControlProgram it is in the 2.0 API, but I have not really tried it on 2.0.
I had to tweak the code slightly for a multi-GPU machine, but it does work in the end.
Thanks, you saved my day. I was getting the error failed to create cublas handle in R with Keras, and this fixed it.
【Answer 2】: This is a simple fix, but getting everything sorted out was a nightmare.
On Windows I found the Keras installation at Anaconda3\Lib\site-packages\keras.
Sources:
https://www.tensorflow.org/guide/using_gpu
https://github.com/keras-team/keras/blob/master/keras/backend/tensorflow_backend.py
In your keras/tensorflow_backend.py file, find the block below. You will add config.gpu_options.allow_growth = True in two places:
if _SESSION is None:
    if not os.environ.get('OMP_NUM_THREADS'):
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
    else:
        num_thread = int(os.environ.get('OMP_NUM_THREADS'))
        config = tf.ConfigProto(intra_op_parallelism_threads=num_thread,
                                allow_soft_placement=True)
        config.gpu_options.allow_growth = True
    _SESSION = tf.Session(config=config)
session = _SESSION
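An alternative that avoids patching the installed package (a sketch assuming the TF 1.x plus standalone Keras combination used in this question; it is not part of the original answer): build the session yourself and register it with Keras before training.

import tensorflow as tf
from keras import backend as K

# Allocate GPU memory on demand instead of grabbing it all up front.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))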
【Comments】:
It would be nice to read a line on why :)
For me the fix was making sure the GPU was not being used in the background.
The lines above do not exist in TensorFlow 2.0.
【Answer 3】: Make sure you have no other processes running that use the GPU. Run nvidia-smi to check.
Source: An issue brought up by @reedwm.
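If you prefer to run that check from inside the notebook, a small convenience sketch (it assumes the driver's nvidia-smi tool is on the PATH; the command itself is the same one this answer suggests):

import subprocess

# Prints the same table nvidia-smi shows in a terminal; look for other processes
# that are holding GPU memory before you start training.
result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
print(result.stdout)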
【Comments】:
For example when you use PyCharm and Jupyter in parallel!
Thanks. The earlier answers also worked for me, but I wondered why this problem suddenly appeared. After seeing this I killed the running process and the problem resolved itself.
Another example: running multiple kernels in JupyterLab. Just restarting the kernel that hit this bug is not enough; I had to shut down all the other kernels first.
This was my problem. I had to kill a previously running Jupyter Notebook that was using tensorflow-gpu before I could train the code in my second notebook.
【Answer 4】: Adding the following lines after the imports solved the problem:
configuration = tf.compat.v1.ConfigProto()
configuration.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=configuration)
【Comments】:
@sɐunıɔןɐqɐp, this solution works great. Would you mind explaining a bit more about what it does and why it works? Thanks.
@user288609: actually you should ask burhan rashid, not me.
@user288609: this is probably his source: kobkrit.com/… "... without the 'allow_growth' option in their Tensorflow or Keras environment, the graphics card's memory gets fully allocated to that process, when in fact it may only need ..." Just enable the 'allow_growth' setting in Tensorflow or Keras. The code below sets the allow_growth memory option in Tensorflow; it improves graphics-card utilization rather than limiting the number of processes to the number of cards the host has.
【Answer 5】: This answer is specific to TensorFlow:
Sometimes TensorFlow session creation fails on Windows.
In most cases, restarting the notebook that is using the GPU solves the problem.
If not, try restarting the notebook after adding these options to your code:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.9)
tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, allow_soft_placement=True))
I have never hit this kind of error when using Keras, but try restarting your notebook.
【Comments】:
A closing parenthesis is missing at the end.
That would be very helpful..
Unfortunately the next version, 2.x, does not have GPUOptions(). Could you update this? Thank you!
【Answer 6】: Had the same error. It is probably related to the issue of TensorFlow allocating all the GPU memory. The suggested fix did not work for me, and it was also not possible to limit TensorFlow's GPU memory usage through keras.json or the command line. Switching the Keras backend to Theano solved the problem for me (how to do that can be found here).
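For reference, switching the backend in the multi-backend Keras of that era can be done either by setting "backend": "theano" in ~/.keras/keras.json or by exporting an environment variable before Keras is imported. A sketch (it assumes Theano is installed; this is not part of the original answer):

import os

# Must be set before the first import of keras, otherwise the default backend loads.
os.environ["KERAS_BACKEND"] = "theano"

import keras  # should report "Using Theano backend."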
【Comments】:
【Answer 7】: For me, closing and restarting my Python process worked.
I tried a few of the things mentioned here and they did not work. For example, the
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
line. I think that is because I am using newer versions of Keras and TensorFlow. A lot of what I read on the internet, including the official Keras tutorials, did not work because of version conflicts.
But I had seen several posts about having more than one Python process running, so I closed Jupyter, Anaconda and PyCharm and restarted everything. After that the error was gone. It may or may not work for you, but it is worth a try.
【Comments】:
【Answer 8】: I got exactly the same error message. I realized there was something wrong with my CUDA installation, specifically with the cuBLAS library.
You can check whether you have the same problem by running the sample program simpleCUBLAS (it ships with the CUDA installation; you will probably find it in the CUDA home folder under $CUDA_HOME\samples\7_CUDALibraries\simpleCUBLAS).
Try running that program. If the test fails, there is a problem with your CUDA installation and you should try reinstalling it. That is how I solved this same problem.
Renaming cublas64_10.dll to cublas64_100.dll may also be a solution.
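If you cannot build the CUDA samples, a rough stand-in from Python (a sketch written against the TF 1.x API used in this question; it assumes tensorflow-gpu is installed and is not part of the original answer): run one small matrix multiply on the GPU and see whether the same cuBLAS error appears.

import tensorflow as tf

# A minimal GPU matmul; if cuBLAS is broken this fails with "Blas GEMM launch failed".
with tf.device('/gpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    product = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(product))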
【Comments】:
In case anyone stumbles on this: renaming cublas64_10.dll to cublas64_100.dll worked for me.
【Answer 9】: I ran into the same error and, luckily, I solved it.
My mistake: the previous time, I had opened a TensorFlow session with sess = tf.Session() but forgot to close it.
So I opened a terminal and typed:
ps -aux | grep program_name
Then I found the PID and killed it:
kill -9 PID
OK, now the GPU is available again.
【Comments】:
【Answer 10】: I ran into this problem when trying to run several servers that use a model to serve predictions. Since I was not training the model, only using it, the difference between GPU and CPU was minimal. For this specific case, the problem can be avoided by forcing TensorFlow onto the CPU, "hiding" the GPU:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1" # Force TF to use only the CPU
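One detail worth adding (an assumption about how TensorFlow reads this variable, not stated in the answer): it only takes effect if it is set before TensorFlow initializes CUDA, so place it ahead of the first TensorFlow import.

import os

# Hide every GPU from TensorFlow; must run before TensorFlow is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf  # this process now sees no GPUs and falls back to the CPU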
【Comments】:
【Answer 11】: For me it was a runaway ipynb script that I thought I had terminated but was actually still running, so my GPU was in use and this error appeared.
【Comments】:
【Answer 12】: Had the same error (Windows 10, Keras in Visual Studio Code). Even after killing my script, TensorFlow somehow seemed to stay active. Simply closing VS Code and restarting it solved the problem.
【Comments】:
【Answer 13】: This problem bothered me for several days, but in the end I was able to get rid of the error.
I had the wrong versions of TensorFlow and CUDA installed on my PC. Just make sure you install matching versions of TensorFlow, CUDA and cuDNN.
https://i.stack.imgur.com/Laiii.png
Use this as a reference.
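A quick way to see which versions you actually have before consulting that table (a sketch; the nvcc call assumes the CUDA toolkit's bin directory is on the PATH, which may not be the case on every machine):

import subprocess

import tensorflow as tf

print("TensorFlow:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)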
【Comments】:
【Answer 14】: I hit a similar error while running inference with a TensorFlow model and solved it by downgrading TensorFlow from 2.1 to 1.14. When I first checked GPU usage, the process had taken all of the GPU memory, inference could not run, and I found the following exception:
InternalError: 2 root error(s) found. (0) Internal: Blas GEMM launch failed: a.shape=(86494, 257), b.shape=(257, 64), m=86494, n=64, k=257 [[node log_mel_features/MatMul]] (1) Internal: Blas GEMM launch failed: a.shape=(86494, 257), b.shape=(257, 64), m=86494, n=64, k=257 [[node log_mel_features/MatMul]] [[log_mel_features/Log/_769]]
Below are the commands I used:
pip uninstall tensorflow-gpu
pip install tensorflow-gpu==1.14
【Comments】:
Thanks brother, you saved me a lot of time.
【Answer 15】: I had Python open in a DOS window on Windows 10. When I ran from my IDE, it gave the message above. Quitting the DOS instance of Python allowed me to get past this error when running from the IDE.
【Comments】:
I think the same thing happens if you have several terminal windows open and some TF resources are still in use.
【Answer 16】: Try running the sample program simpleCUBLAS (it ships with CUDA) to test your cuBLAS installation and see whether it works.
In my case (I am on Ubuntu) I had to reinstall CUDA to fix the problem. After I did that, simpleCUBLAS passed the test.
For some reason the same problem came back after a while; I found that clearing the .nv directory (in my home folder) fixed it, and the simpleCUBLAS test passed again.
【Comments】:
I am running Ubuntu 16.04 and simpleCUBLAS passes, but I still get the error "Blas GEMM launch failed".
@naisanza you may be running two programs at the same time that both want to use the TensorFlow GPU.
【Answer 17】: I was using JupyterLab, and another TensorFlow program I had run earlier must have been holding the GPU. After killing JupyterLab and restarting it, the error was gone.
【Comments】:
【Answer 18】: I hit the same error on Windows 10 with Keras 2.4.3 and TensorFlow 2.3.0, using PyCharm.
It appears to be a bug related to TensorFlow itself running on Windows. Closing PyCharm and re-running it as administrator resolved the problem.
【Comments】:
【Answer 19】: The error appears when your GPU memory is full, so allow the GPU memory to grow and the problem will be fixed. You can use the code snippet below:
import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')
for device in physical_devices:
    tf.config.experimental.set_memory_growth(device, True)
【Comments】: