numpy 的 QR 看起来比 scipy 的快 - 为啥?

Posted

技术标签:

【中文标题】numpy 的 QR 看起来比 scipy 的快 - 为啥?【英文标题】:numpy's QR appears faster than scipy's - why?numpy 的 QR 看起来比 scipy 的快 - 为什么? 【发布时间】:2015-12-15 05:28:47 【问题描述】:

由于某种原因,numpy 的 QR 分解总是比 scipy 的快,这看起来很奇怪,因为 scipy 应该包含所有 numpy 以及更高级的功能。知道为什么吗?

import numpy.linalg as nla
import scipy.linalg as la
A = np.random.randn(4000,4000)

%timeit -n 3 -r 3 Q,R = nla.qr(A)
%timeit -n 3 -r 3 Q,R = la.qr(A)
%timeit -n 3 -r 3 Q,R = nla.qr(A)
%timeit -n 3 -r 3 Q,R = la.qr(A)

3 次循环,最好的 3 次:每个循环 1.17 秒

3 次循环,最好的 3 次:每个循环 1.21 秒

3 次循环,最好的 3 次:每个循环 1.05 秒

3 次循环,最好的 3 次:每个循环 1.21 秒

mode="raw" 的区别更明显。

%timeit -n 3 -r 3 Q = nla.qr(A, mode='raw')
%timeit -n 3 -r 3 Q = la.qr(A, mode='raw')
%timeit -n 3 -r 3 Q = nla.qr(A, mode='raw')
%timeit -n 3 -r 3 Q = la.qr(A, mode='raw')

3 个循环,3 个循环中的最佳值:每个循环 440 毫秒

3 个循环,3 个循环中的最佳值:每个循环 612 毫秒

3 个循环,3 个循环中的最佳值:每个循环 436 毫秒

3 个循环,3 个循环中的最佳值:每个循环 758 毫秒

我同时使用英特尔 MKL,并且我拥有最新的开发版本

numpy 版本:1.11.0.dev0+90a1a9f

scipy 版本:0.17.0.dev0+8003fab

NumPy config:
blas_opt_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
mkl_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
lapack_opt_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
blas_mkl_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']

SciPy config:
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']

【问题讨论】:

如果 NumPy 的版本为您提供了所需的功能,请使用它。如果您需要 SciPy 的功能,请使用它。功能更全的库变慢的情况并不少见。 我查看了两个版本的代码。 scipy 不调用或合并numpy 之一。他们的mode逻辑不同,乍一看调用lapack的方法也不同。答案可能需要详细研究。 【参考方案1】:

非常有趣的帖子!面对类似的问题,我想出了下面这个不依赖timeit,而是探索各种矩阵形状的实验。

它似乎没有在这个特定环境下证实这一假设,但作为其他实验的起点可能是有意义的?

"""
Compares Scipy and Numpy QR execution times for various n x m random matrices
"""
import time

from scipy import linalg
import numpy as np
import matplotlib.pyplot as plt

from timeit import timeit

n_feature_grid = [100, 500, 1000, 2000, 4000, 7500, 10000]
max_timeit_nb = 100
m_ratio_grid = [0.05, 0.1, 0.3, 1.0]

exec_times_scipy = np.empty((len(n_feature_grid), len(m_ratio_grid), max_timeit_nb)) * np.nan
exec_times_numpy = np.empty((len(n_feature_grid), len(m_ratio_grid), max_timeit_nb)) * np.nan
rng = np.random.RandomState(69)

for i, n in enumerate(n_feature_grid):
    for j, m_ratio in enumerate(m_ratio_grid):
        m = int(np.ceil(m_ratio * n))
        if m * n < 100 * 100:
            timeit_nb = max_timeit_nb
        elif m * n < 1000 * 100:
            timeit_nb = 50
        elif m * n < 3000 * 10000:
            timeit_nb = 10
        else:
            timeit_nb = 5

        print("Experiment for matrix %i x %i, repeated %i times" % (n, m, timeit_nb))

        for k, nb in enumerate(range(timeit_nb)):
            X = rng.randn(n, m)

            # numpy
            start_time = time.perf_counter()
            Qn, Rn = np.linalg.qr(X, mode='reduced')
            exec_times_numpy[i, j, k] = time.perf_counter() - start_time

            # scipy
            start_time = time.perf_counter()
            Qs, Rs = linalg.qr(X, mode='economic')
            exec_times_scipy[i, j, k] = time.perf_counter() - start_time

            np.testing.assert_equal(Qs, Qn)
            np.testing.assert_equal(Rs, Rn)


avg_exec_times_scipy = exec_times_scipy.mean(axis=2)
std_exec_times_scipy = exec_times_scipy.std(axis=2)
avg_exec_times_numpy = exec_times_numpy.mean(axis=2)
std_exec_times_numpy = exec_times_numpy.std(axis=2)

avg_exec_times_ratio = np.nanmean((exec_times_scipy / exec_times_numpy), axis=2)
std_exec_times_ratio = np.nanstd((exec_times_scipy / exec_times_numpy), axis=2)
# debug
# np.count_nonzero(np.isnan((exec_times_scipy / exec_times_numpy)), axis=2)

# Plot results
fig, ax = plt.subplots(1, figsize=(16, 12))
for k, m_ratio in enumerate(m_ratio_grid):
    ax.errorbar(n_feature_grid, avg_exec_times_ratio[:, k], yerr=std_exec_times_ratio[:, k],
                marker='x', linestyle='-', # color='r',
                label='scipy/numpy(m=%.2f*n)' % m_ratio)

ax.legend(loc='upper left')
ax.set_ylabel("Ratio scipy/numpy QR exec time.")
ax.set_xlabel("n_features")
ax.set_title("scipy/numpy execution time ratio for QR on a n x m random matrix")

ax.plot([n_feature_grid[0], n_feature_grid[-1]], [1., 1.], linestyle='--', color='k')

plt.show()

生成下图,其中对于小型矩阵,该比​​率显然不经常高于 1,但对于大型非平方矩阵,该比​​率往往(虽然不是很明显)大于 1:

矩阵大小和 nb 重复出现在日志中:

Experiment for matrix 100 x 5, repeated 100 times
Experiment for matrix 100 x 10, repeated 100 times
Experiment for matrix 100 x 30, repeated 100 times
Experiment for matrix 100 x 100, repeated 50 times
Experiment for matrix 500 x 25, repeated 50 times
Experiment for matrix 500 x 50, repeated 50 times
Experiment for matrix 500 x 150, repeated 50 times
Experiment for matrix 500 x 500, repeated 10 times
Experiment for matrix 1000 x 50, repeated 50 times
Experiment for matrix 1000 x 100, repeated 10 times
Experiment for matrix 1000 x 300, repeated 10 times
Experiment for matrix 1000 x 1000, repeated 10 times
Experiment for matrix 2000 x 100, repeated 10 times
Experiment for matrix 2000 x 200, repeated 10 times
Experiment for matrix 2000 x 600, repeated 10 times
Experiment for matrix 2000 x 2000, repeated 10 times
Experiment for matrix 4000 x 200, repeated 10 times
Experiment for matrix 4000 x 400, repeated 10 times
Experiment for matrix 4000 x 1200, repeated 10 times
Experiment for matrix 4000 x 4000, repeated 10 times
Experiment for matrix 7500 x 375, repeated 10 times
Experiment for matrix 7500 x 750, repeated 10 times
Experiment for matrix 7500 x 2250, repeated 10 times
Experiment for matrix 7500 x 7500, repeated 5 times
Experiment for matrix 10000 x 500, repeated 10 times
Experiment for matrix 10000 x 1000, repeated 10 times
Experiment for matrix 10000 x 3000, repeated 5 times
Experiment for matrix 10000 x 10000, repeated 5 times

我使用的环境已经很旧了:

numpy                     1.19.1           py36h5510c5b_0
numpy-base                1.19.1           py36ha3acd2a_0
python                    3.6.8                h9f7ef89_7
scipy                     1.5.0            py36h9439919_0

(Windows 10 上的 conda 4.8.5)

请注意,我没有使用 'raw' 模式,因为如果我理解正确,它会使生成的矩阵不符合 numpy。您可能希望更改它以查看效果。

这项调查来自 scikit-learn 实验,请参阅 this ticket。

【讨论】:

以上是关于numpy 的 QR 看起来比 scipy 的快 - 为啥?的主要内容,如果未能解决你的问题,请参考以下文章

为啥 scipy 和 numpy fft 图看起来不同?

如何使用 python + NumPy / SciPy 计算滚动/移动平均值?

numpy pandas1

时间依赖的1D Schroedinger方程使用Numpy和SciPy solve_ivp

python中是否有t测试表(numpy,scipy等)

numpy和Pandas用法讲解