Why is TensorFlow matmul() much slower than NumPy multiply()?

Posted 2023-02-16

【Posted】: 2016-12-20 14:20:39 【Question】:

In the Python code below, why does the multiplication take so much less time with numpy than with tensorflow?

import tensorflow as tf
import numpy as np
import time

size = 10000
# TensorFlow 1.x graph API: a placeholder fed at run time, and an op built on it.
x = tf.placeholder(tf.float32, shape=(size, size))
y = tf.matmul(x, x)

with tf.Session() as sess:
  rand_array = np.random.rand(size, size)

  # Time the NumPy call.
  start_time = time.time()
  np.multiply(rand_array, rand_array)
  print("--- %s seconds numpy multiply ---" % (time.time() - start_time))

  # Time the TensorFlow call; feed_dict must be a dict mapping placeholder to value.
  start_time = time.time()
  sess.run(y, feed_dict={x: rand_array})
  print("--- %s seconds tensorflow---" % (time.time() - start_time))

The output is:

--- 0.22089099884 seconds numpy multiply ---
--- 34.3198359013 seconds tensorflow---

【Comments】:

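I don't see such a dramatic difference when I run it, but numpy is still about five times faster than tensorflow. Another observation: when I re-ran sess.run(), the result was almost as fast as numpy.

@martianwars On my computer, re-running sess.run() does not make it any faster.

I had to reduce size a little; maybe that could be a reason.

I got --- 0.8210248947143555 seconds numpy multiply --- and --- 63.973095178604126 seconds tensorflow ---, a factor of 77.9. This is on a slower Intel Core 2 Duo E8400 @ 3.0GHz.

【Answer 1】: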

Well, quoting the documentation:

numpy.multiply(x1, x2[, out]) = Multiply arguments element-wise.

and

tf.matmul(a, b, transpose_a=False, transpose_b=False, a_is_sparse=False, b_is_sparse=False, name=None)

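Multiplies matrix a by matrix b, producing a * b.

The inputs must be two-dimensional matrices, with matching inner dimensions, possibly after transposition.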
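To make the two quoted definitions concrete, here is a minimal NumPy-only sketch on a 2x2 matrix small enough to check by hand. The variable names are illustrative and not part of the original post.

```python
import numpy as np

# Small 2x2 example so both results can be verified by hand.
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Element-wise product: entry (i, j) is a[i, j] * a[i, j].
elementwise = np.multiply(a, a)

# Matrix product: entry (i, j) is the dot product of row i and column j.
matrix_product = np.dot(a, a)

print(elementwise)
print(matrix_product)
```

The two results differ ([[1, 4], [9, 16]] versus [[7, 10], [15, 22]]), so timing one against the other compares different computations.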

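This suggests that you are comparing different operations: an O(n^2) pointwise multiplication versus an O(n^3) matrix multiplication. I corrected the test so that both cases perform two matrix multiplications: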

import tensorflow as tf
import numpy as np
import time

size = 2000
# TensorFlow 1.x graph: z = (x * x) * x, i.e. two chained matrix products.
x = tf.placeholder(tf.float32, shape=(size, size))
y = tf.matmul(x, x)
z = tf.matmul(y, x)

with tf.Session() as sess:
  rand_array = np.random.rand(size, size)

  # NumPy: the same two matrix products, repeated 10 times.
  start_time = time.time()
  for _ in range(10):
      np.dot(np.dot(rand_array, rand_array), rand_array)
  print("--- %s seconds numpy multiply ---" % (time.time() - start_time))

  # TensorFlow: run the graph 10 times, feeding the same array each time.
  start_time = time.time()
  for _ in range(10):
      sess.run(z, feed_dict={x: rand_array})
  print("--- %s seconds tensorflow---" % (time.time() - start_time))

I got the following result:

--- 2.92911195755 seconds numpy multiply ---
--- 0.32932305336 seconds tensorflow---

using a fast GPU (GTX 1070).
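As a rough check of the O(n^2) versus O(n^3) point in this answer, the sketch below counts scalar multiplications for both operations. It assumes the schoolbook matrix multiplication algorithm; the helper names are hypothetical, and real BLAS or GPU kernels are heavily optimized, so this reflects the amount of work rather than wall-clock time.

```python
def elementwise_multiplications(n):
    """Scalar multiplications for an element-wise product of two n x n arrays."""
    return n * n


def matmul_multiplications(n):
    """Scalar multiplications for a schoolbook n x n matrix product (assumed algorithm)."""
    return n * n * n


# The two sizes used in the question (10000) and in this answer (2000).
for n in (2000, 10000):
    ratio = matmul_multiplications(n) // elementwise_multiplications(n)
    print("n=%d: matrix product needs %d times as many multiplications" % (n, ratio))
```

The ratio equals n, so at size=10000 the matrix product in the question does on the order of ten thousand times more arithmetic than the element-wise product it was timed against.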

【Comments】:

numpy (using MKL) dot is 80% faster than tf.matmul (compiled with march=haswell) on my test machine. Tested with a (5000000, 100) matrix times a (100,) vector. (A few extra ops are needed in tf to get the same result: tf.squeeze(tf.matmul(a, tf.expand_dims(b, 0), transpose_b=True)))
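The shape handling in that comment can be illustrated with a NumPy-only analogue. This is a sketch under assumptions: small shapes stand in for (5000000, 100) and (100,), and np.expand_dims, .T, and np.squeeze stand in for the tf.expand_dims, transpose_b=True, and tf.squeeze steps of the TensorFlow expression.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 3))  # stands in for the (5000000, 100) matrix
b = rng.random(3)       # stands in for the (100,) vector

# Direct matrix-vector product; the result has shape (5,).
direct = np.dot(a, b)

# Analogue of the TensorFlow expression: lift b to a (1, 3) row,
# transpose it to a (3, 1) column, multiply, then drop the trailing axis.
row = np.expand_dims(b, 0)
via_matmul = np.squeeze(np.matmul(a, row.T))

print(direct.shape, via_matmul.shape)
print(np.allclose(direct, via_matmul))
```

Both paths give the same values; the extra reshaping is needed only because a matrix product expects two-dimensional operands.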
