Testing GPU with TensorFlow matrix multiplication

Posted 2023-04-15


【Posted】2017-06-07 20:19:06

【Question】

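Since many machine learning algorithms rely on matrix multiplication (or can at least be implemented with it), I plan to test my GPU by creating matrices a and b, multiplying them, and recording how long the computation takes.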

The following code generates two matrices of size 300000 × 20000 and multiplies them:

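import tensorflow as tf
import numpy as np

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)


#a = np.array([[1, 2, 3], [4, 5, 6]])
#b = np.array([1, 2, 3])

# np.random.rand returns float64, so each 300000 x 20000 array needs about 48 GB
a = np.random.rand(300000,20000)
b = np.random.rand(300000,20000)

print("Init complete")

# tf.mul was renamed tf.multiply in TF 1.0; it is element-wise, not a matrix product
result = tf.multiply(a, b)
v = sess.run(result)

print(v)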
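Before timing anything, it helps to check what the question's sizes imply. The sketch below is plain Python arithmetic (no TensorFlow), and the helper names `array_bytes` and `matmul_shape` are hypothetical. It shows that each 300000 × 20000 array needs 48 GB in float64 (what `np.random.rand` returns) or 24 GB in float32, and that two 300000 × 20000 matrices cannot be matrix-multiplied as they are: the inner dimensions must agree, so one operand would have to be transposed, giving a 300000 × 300000 result.

```python
def array_bytes(rows, cols, bytes_per_element):
    """Memory needed for a dense rows x cols array."""
    return rows * cols * bytes_per_element


def matmul_shape(shape_a, shape_b):
    """Result shape of A @ B; raises ValueError if inner dimensions differ."""
    (m, k1), (k2, n) = shape_a, shape_b
    if k1 != k2:
        raise ValueError("inner dimensions differ: %d vs %d" % (k1, k2))
    return (m, n)


a_shape = (300000, 20000)
b_shape = (300000, 20000)

print(array_bytes(*a_shape, 8) / 1e9)  # float64: 48.0 GB per input
print(array_bytes(*a_shape, 4) / 1e9)  # float32: 24.0 GB per input

try:
    matmul_shape(a_shape, b_shape)
except ValueError as err:
    print(err)  # inner dimensions differ: 20000 vs 300000

# Transposing b makes the product valid: (300000 x 20000) @ (20000 x 300000)
out_shape = matmul_shape(a_shape, (b_shape[1], b_shape[0]))
print(out_shape)                            # (300000, 300000)
print(array_bytes(*out_shape, 8) / 1e9)     # float64 result: 720.0 GB
```

Sizes in the low thousands (as in the answer below, n = 8192 in float32, 256 MiB per matrix) fit comfortably on a single GPU.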

Is this an adequate test for comparing GPU performance? What other factors should I consider?


【Answer 1】

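Here is an example of a matmul benchmark that avoids common pitfalls (it warms up before timing, averages over several runs, and keeps TensorFlow from optimizing the work away) and matches the official 11 TFLOP figure for the Titan X Pascal.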
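Where a figure like "11 TFLOP" comes from: a common back-of-the-envelope estimate of a GPU's peak single-precision throughput is CUDA cores × clock × 2, because each core can issue one fused multiply-add (two floating-point operations) per cycle. The sketch below assumes the Titan X Pascal's published 3584 CUDA cores and 1531 MHz boost clock; `peak_flops` is a hypothetical helper, and real sustained clocks vary with GPU Boost, so treat the result as an approximate ceiling rather than a guarantee.

```python
def peak_flops(cuda_cores, clock_hz, flops_per_core_per_cycle=2):
    """Theoretical peak FLOP/s: one FMA (2 FLOPs) per core per cycle."""
    return cuda_cores * clock_hz * flops_per_core_per_cycle


# Assumed Titan X Pascal specs: 3584 CUDA cores, 1531 MHz boost clock.
titan_x_pascal = peak_flops(3584, 1531e6)
print("%.2f TFLOP/s" % (titan_x_pascal / 1e12))  # 10.97 TFLOP/s
```

A measured matmul rate close to this number indicates the benchmark is compute-bound and not dominated by overheads such as host transfers.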

import os
# CUDA_VISIBLE_DEVICES must be set before TensorFlow initializes CUDA.
# "1" exposes only the second physical GPU (it then appears as /gpu:0);
# on a single-GPU machine use "0" or remove this line.
os.environ["CUDA_VISIBLE_DEVICES"]="1"
import tensorflow as tf
import time

n = 8192
dtype = tf.float32
with tf.device("/gpu:0"):
    # Variables rather than constants, so the product is not precomputed by constant folding
    matrix1 = tf.Variable(tf.ones((n, n), dtype=dtype))
    matrix2 = tf.Variable(tf.ones((n, n), dtype=dtype))
    product = tf.matmul(matrix1, matrix2)


# avoid optimizing away redundant nodes
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
sess = tf.Session(config=config)

sess.run(tf.global_variables_initializer())
iters = 10

# pre-warming: the first run pays one-time setup costs, so it is not timed
sess.run(product.op)

start = time.time()
for i in range(iters):
    # run product.op (not product) so the result is not copied back to the host
    sess.run(product.op)
end = time.time()
ops = n**3 + (n-1)*n**2 # n^2*(n-1) additions, n^3 multiplications
elapsed = (end - start)
rate = iters*ops/elapsed/10**9
print('\n %d x %d matmul took: %.2f sec, %.2f G ops/sec' % (n, n,
                                                            elapsed/iters,
                                                            rate,))
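The answer's recipe (one untimed warm-up run, several timed runs, then operations divided by time) does not depend on TensorFlow. Below is a minimal framework-agnostic sketch of the same bookkeeping; `benchmark`, `matmul_ops` and `naive_matmul` are hypothetical names, and `run_once` stands for whatever launches one matrix product (for example `lambda: sess.run(product.op)` with the code above). `run_once` must block until the work is finished, which `sess.run` does. Note that the answer's `n**3 + (n-1)*n**2` simplifies to `2*n**3 - n**2`.

```python
import time


def matmul_ops(n):
    """Operation count for an n x n matmul: n^3 multiplies + (n-1)*n^2 adds."""
    return n**3 + (n - 1) * n**2  # equals 2*n**3 - n**2


def benchmark(run_once, n, iters=10):
    """Time run_once() after one warm-up call; return (sec per iter, G ops/sec)."""
    run_once()  # warm-up: excludes one-time costs such as allocation and library init
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed / iters, iters * matmul_ops(n) / elapsed / 1e9


def naive_matmul(a, b):
    """Pure-Python n x n matrix product, used only to exercise benchmark()."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


n = 64
ones = [[1.0] * n for _ in range(n)]
sec, gops = benchmark(lambda: naive_matmul(ones, ones), n, iters=3)
print("%d x %d matmul took: %.4f sec, %.4f G ops/sec" % (n, n, sec, gops))
```

`time.perf_counter` is used instead of `time.time` because it is a monotonic, high-resolution clock intended for measuring intervals.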

【Comments】

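Cool. I think that, in addition to linking to the code off-site, you should post it in the answer itself. The GPU is not detected unless os.environ["CUDA_VISIBLE_DEVICES"]="1" is commented out. With that change it works on Windows 10 with tensorflow-gpu (1.4), cuda_8.0.61_win10 and cudnn-8.0-windows10-x64-v6.0. The error is: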

Cannot assign a device for operation 'Variable_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.

cuda_8.0.61_win10 was downloaded from developer.nvidia.com/cuda-toolkit-archive, and cudnn-8.0-windows10-x64-v6.0 from developer.nvidia.com/rdp/cudnn-download. This test correctly shows GPU performance: my 1050 Ti gets 2.3 TFLOPs, which is almost exactly right.
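Why the error above appears: CUDA_VISIBLE_DEVICES lists physical GPU indices, and CUDA renumbers the listed devices from 0 and stops enumerating at the first index that does not exist (as described in NVIDIA's CUDA documentation). On a machine with only one GPU, as the commenter's setup appears to be, "1" names a device that does not exist, so TensorFlow sees only the CPU and cannot place ops on /gpu:0. The pure-Python model below is a hypothetical illustration of that mapping (integer indices only; CUDA also accepts GPU UUIDs) and does not query real hardware.

```python
def visible_gpus(cuda_visible_devices, physical_gpu_count):
    """Model of CUDA_VISIBLE_DEVICES: list the physical GPU behind each logical id.

    Listed indices are renumbered from 0 in order; enumeration stops at the
    first index that does not exist on the machine. An empty value hides all GPUs.
    """
    if not cuda_visible_devices.strip():
        return []
    mapping = []
    for token in cuda_visible_devices.split(","):
        index = int(token)
        if not 0 <= index < physical_gpu_count:
            break  # invalid index: it and everything after it are ignored
        mapping.append(index)
    return mapping  # mapping[i] is the physical GPU seen as logical /gpu:i


print(visible_gpus("1", 2))  # [1]: physical GPU 1 appears as /gpu:0
print(visible_gpus("1", 1))  # []: no GPU visible, so /gpu:0 cannot be assigned
print(visible_gpus("0", 1))  # [0]
```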
