TensorFlow CPU optimization and numpy optimization

Posted lunge-blog


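1. TensorFlow CPU optimization

When you use the CPU build of TensorFlow (for example, installed with pip), you may see a warning that your CPU supports the AVX/AVX2 instruction sets but the installed binary was not compiled to use them. If so, download a matching build from the following URL.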

https://github.com/fo40225/tensorflow-windows-wheel

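The GitHub page explains how to use it.

In my tests, after installing a build compiled with AVX support, the affected math operations (matrix multiplication, decompositions, etc.) ran at roughly 3 times their original speed.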
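To check the speedup on your own machine, you could time the same operation before and after swapping the wheel. The following is only a minimal sketch: the helper names `mean_seconds` and `tf_matmul_seconds` are made up for illustration, the matrix size is arbitrary, and the TensorFlow part assumes a TF 2.x install running in eager mode.

```python
import time


def mean_seconds(fn, repeats=5):
    """Average wall-clock seconds per call of fn over `repeats` calls."""
    fn()  # warm-up call so one-time setup cost is not counted
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def tf_matmul_seconds(size=2048, repeats=5):
    """Time a square matmul in TensorFlow; returns None if TF is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    a = tf.random.uniform((size, size))
    b = tf.random.uniform((size, size))
    # .numpy() forces the result to be computed (assumes TF 2.x eager mode)
    return mean_seconds(lambda: tf.matmul(a, b).numpy(), repeats)


if __name__ == "__main__":
    print("seconds per matmul:", tf_matmul_seconds())
```

Run it once with the stock pip wheel and once with the AVX build, then compare the two numbers.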

 

2. numpy optimization

Most current numpy builds use OpenBLAS by default, but I found that builds linked against MKL are faster. Download from:

https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy

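To see which optimized libraries your numpy build is linked against, run np.__config__.show()

The test code and my results are attached below; you can run the test on your own machine.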
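If you want to check the backend from a script rather than read the printed configuration by eye, one option is to capture that output and search it. This is a rough sketch: `detect_blas_backend` is a hypothetical helper, and matching on library names in the text is a heuristic, not an official numpy API for this purpose.

```python
import contextlib
import io

import numpy as np


def detect_blas_backend():
    """Guess the BLAS/LAPACK backend from numpy's printed build configuration."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()  # prints the same information as np.__config__.show()
    text = buf.getvalue().lower()
    # Heuristic: look for well-known library names in the captured text
    for name in ("mkl", "openblas", "accelerate", "blis"):
        if name in text:
            return name
    return "unknown"


if __name__ == "__main__":
    print("numpy", np.__version__, "appears to use:", detect_blas_backend())
```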

'''
default numpy(openblas):
---------
Dotted two 4096x4096 matrices in 1.99 s.
Dotted two vectors of length 524288 in 0.40 ms.
SVD of a 2048x1024 matrix in 1.75 s.
Cholesky decomposition of a 2048x2048 matrix in 0.21 s.
Eigendecomposition of a 2048x2048 matrix in 10.31 s.
------------------------------------------------------
numpy+mkl:
----------
Dotted two 4096x4096 matrices in 1.56 s.
Dotted two vectors of length 524288 in 0.33 ms.
SVD of a 2048x1024 matrix in 1.07 s.
Cholesky decomposition of a 2048x2048 matrix in 0.24 s.
Eigendecomposition of a 2048x2048 matrix in 6.94 s.
'''
import numpy as np
from time import time

# Let's take the randomness out of random numbers (for reproducibility)
np.random.seed(0)

size = 4096
A, B = np.random.random((size, size)), np.random.random((size, size))
C, D = np.random.random((size * 128, )), np.random.random((size * 128, ))
E = np.random.random((int(size / 2), int(size / 4)))
F = np.random.random((int(size / 2), int(size / 2)))
F = np.dot(F, F.T)
G = np.random.random((int(size / 2), int(size / 2)))

# Matrix multiplication
N = 20
t = time()
for i in range(N):
    np.dot(A, B)
delta = time() - t
print("Dotted two %dx%d matrices in %0.2f s." % (size, size, delta / N))
del A, B

# Vector multiplication
N = 5000
t = time()
for i in range(N):
    np.dot(C, D)
delta = time() - t
print("Dotted two vectors of length %d in %0.2f ms." % (size * 128, 1e3 * delta / N))
del C, D

# Singular Value Decomposition (SVD)
N = 3
t = time()
for i in range(N):
    np.linalg.svd(E, full_matrices=False)
delta = time() - t
print("SVD of a %dx%d matrix in %0.2f s." % (size / 2, size / 4, delta / N))
del E

# Cholesky Decomposition
N = 3
t = time()
for i in range(N):
    np.linalg.cholesky(F)
delta = time() - t
print("Cholesky decomposition of a %dx%d matrix in %0.2f s." % (size / 2, size / 2, delta / N))

# Eigendecomposition (N is still 3 from the previous test)
t = time()
for i in range(N):
    np.linalg.eig(G)
delta = time() - t
print("Eigendecomposition of a %dx%d matrix in %0.2f s." % (size / 2, size / 2, delta / N))
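To put the two result sets side by side, you can divide each OpenBLAS time by the matching MKL time. The sketch below only re-uses the timings reported above; the dictionary keys and the `speedups` helper are names chosen for illustration.

```python
# Seconds per operation, copied from the results reported above
# (the vector dot product was reported in ms and is converted to seconds).
openblas = {
    "matmul 4096x4096": 1.99,
    "vector dot 524288": 0.40e-3,
    "svd 2048x1024": 1.75,
    "cholesky 2048x2048": 0.21,
    "eig 2048x2048": 10.31,
}
mkl = {
    "matmul 4096x4096": 1.56,
    "vector dot 524288": 0.33e-3,
    "svd 2048x1024": 1.07,
    "cholesky 2048x2048": 0.24,
    "eig 2048x2048": 6.94,
}


def speedups(baseline, candidate):
    """Return baseline_time / candidate_time per operation (>1 means candidate is faster)."""
    return {name: baseline[name] / candidate[name] for name in baseline}


ratios = speedups(openblas, mkl)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers MKL is faster for four of the five operations; Cholesky is the exception (0.21 s with OpenBLAS against 0.24 s with MKL).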

 
