Python simulation shows slow performance, how to speed array calculation up

Posted 2023-02-23


Original title: Python simulation shows slow performance, how to speed array calculation up (asked 2015-09-17 09:12:43)

Question:

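I am trying to write a simple Python program that calculates the interference pattern produced by two incident laser beams. Everything works, but it is slow. I am using a 400x400 array, and recalculating the intensity after changing a parameter takes about 1.3 seconds. Running the same code in C++ takes about 0.18 seconds. Is there anything I can improve to speed it up?

My code so far: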

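# Imports are assumed; the original post does not show them.
import cmath
import time
from math import sin, cos, sqrt, exp, pi
from numpy import zeros

def calculate_intensity_array():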
    laser1 = zeros((400, 400), dtype=complex)
    laser2 = zeros((400, 400), dtype=complex)
    data = zeros((400, 400), dtype=complex)

    onoff_1 = laser1_onoff_var.get()
    A_1 = laser1_intensity_var.get()
    sigma_1 = laser1_sigma_var.get()
    sin_phi_1 = sin((laser1_phi_var.get() / 180) * pi)
    cos_phi_1 = cos((laser1_phi_var.get() / 180) * pi)
    sin_theta_1 = sin((laser1_theta_var.get() / 180) * pi)
    cos_theta_1 = cos((laser1_theta_var.get() / 180) * pi)
    mu_x_1 = laser1_xpos_var.get()
    mu_y_1 = laser1_ypos_var.get()

    onoff_2 = laser2_onoff_var.get()
    A_2 = laser2_intensity_var.get()
    sigma_2 = laser2_sigma_var.get()
    sin_phi_2 = sin((laser2_phi_var.get() / 180) * pi)
    sin_theta_2 = sin((laser2_theta_var.get() / 180) * pi)
    cos_phi_2 = cos((laser2_phi_var.get() / 180) * pi)
    cos_theta_2 = cos((laser2_theta_var.get() / 180) * pi)
    mu_x_2 = laser2_xpos_var.get()
    mu_y_2 = laser2_ypos_var.get()


    if onoff_1 == 0:
        laser1 = zeros((400, 400), dtype=complex)
    elif onoff_1 == 1:
        for i in range(400):
            for k in range(400):    
                laser1[i][k] = calculate_amplitude(
                    (k - 200) * 10,
                    (i - 200) * 10,
                    A_1, 
                    sigma_1, 
                    sin_phi_1,
                    cos_phi_1,
                    sin_theta_1,
                    cos_theta_1,
                    mu_x_1,
                    mu_y_1)

    if onoff_2 == 0:
        laser2 = zeros((400, 400), dtype=complex)
    elif onoff_2 == 1:
        for i in range(400):
            for k in range(400):         
                laser2[i][k] = calculate_amplitude(
                    (k - 200) * 10,
                    (i - 200) * 10,
                    A_2,
                    sigma_2,
                    sin_phi_2,
                    cos_phi_2,
                    sin_theta_2,
                    cos_theta_2,
                    mu_x_2,
                    mu_y_2)

    data = abs(laser1 + laser2) ** 2

    return data

def calculate_amplitude(x, y, A, sigma, sin_phi, cos_phi, 
                        sin_theta, cos_theta, mu_x, mu_y):

    # The multi-line expression is wrapped in parentheses so that it parses.
    amplitude = (A * (1 / (sqrt(2  * pi * (sigma ** 2 / cos_theta ** 2)))) *
                 exp(-((cos_phi *(x - mu_x) + sin_phi *(y - mu_y)) ** 2 * cos_theta ** 2) /
                     (2 * sigma ** 2)) *
                 (1 / (sqrt(2 * pi * sigma ** 2))) *
                 exp(-((-sin_phi * (x - mu_x) + cos_phi * (y - mu_y)) ** 2) /
                     (2 * sigma ** 2)) *
                 cmath.exp(1j *(2 * pi / 0.650) * sin_theta *
                           (cos_phi * (x - mu_x) + sin_phi * (y - mu_y))))


    return amplitude

start = time.clock()
draw_data = calculate_intensity_array()
print time.clock()-start

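Is there anything that catches your eye and should be done differently? The main computation happens in calculate_amplitude, but I tried to pass in only the sin and cos values so that they do not have to be recomputed on every call.

The equivalent C++ looks like this: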

void calculate_intensity_array()
{
    double intensity_array[400][400];
    static complex<double> laser1[400][400];
    static complex<double> laser2[400][400];

    double A1 = 1;
    double sigma1 = 2000;
    double cos_theta1 = 0.9999619;
    double sin_theta1 = 0.00872654;
    double cos_phi1 = 1;
    double sin_phi1 = 0;
    double mu_x1 = 0.0;
    double mu_y1 = 0.0;

    double A2 = 1;
    double sigma2 = 2000;
    double cos_theta2 = 0.9999619;
    double sin_theta2 = 0.00872654;
    double cos_phi2 = 1;
    double sin_phi2 = 0;
    double mu_x2 = 0.0;
    double mu_y2 = 0.0;

    for (int i=0; i<400; i++)
    {
        for (int j=0; j<400; j++)
        {
            laser1[i][j] = calculate_amplitude((i-200)*10, (j-200)*10, A1, 
                                               sigma1, sin_phi1, cos_phi1, 
                                               sin_theta1, cos_theta1, 
                                               mu_x1, mu_y1);

            laser2[i][j]=calculate_amplitude((i-200)*10, (j-200)*10, A2, 
                                             sigma2, sin_phi2, cos_phi2, 
                                             sin_theta2, cos_theta2, 
                                             mu_x2, mu_y2);

            intensity_array[i][j] = pow(abs(laser1[i][j] + laser2[i][j]), 2);
        }
    }
}

complex<double> calculate_amplitude(double x, double y, double A,
                                    double sigma, double sin_phi,
                                    double cos_phi, double sin_theta,
                                    double cos_theta, double mu_x,
                                    double mu_y)
{
    complex<double> output;
    output = A * (1 / (sqrt(2 * M_PI * pow(sigma / cos_theta, 2)))) *
            exp(-(pow(cos_phi * (x - 200 - mu_x) + sin_phi * (y - 200 - mu_y), 2) * 
            pow(cos_theta, 2)) / (2 * pow(sigma, 2))) * 
            (1 / (sqrt(2 * M_PI * pow(sigma, 2)))) * 
            exp(-(pow(-sin_phi * (x - 200 - mu_x) + cos_phi * 
                (y - 200 - mu_y), 2)) / (2 * pow(sigma, 2))) *
            exp(complex<double>(0, (2 * M_PI / 0.650) * sin_theta *
                (cos_phi * (x - 200 - mu_x) + sin_phi * (y - 200 - mu_y))));

    return output;
}

Comments:

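First thought: C++ probably inlines the function calls, which Python cannot do.

Use the numpy library and pypy...

I think the suggestion above is valid. I would also suggest using Matlab for this kind of thing rather than Python.

If you want to improve performance, start with a profiler or carefully placed instrumentation (interval measurements or timestamp traces) to see which parts take most of the time.

SO is not a code review site, and your question is not specific. Also, what does it have to do with Python? Is it mentioned only for historical interest and as a performance comparison?

If you want C++ speed but still need the data in Python, you can expose the C++ function in a shared library; see Python's C API.

calculate_amplitude is simply my nightmare.

Answer 1: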

Convert your code to C++ automatically!

The Python code below looks like yours:

#pythran export loop(int, float, float, float, float, float, float, float, float)
from math import exp, sqrt, pi
import cmath
from numpy import empty

def loop(n,A, sigma, sin_phi, cos_phi,
                                 sin_theta, cos_theta, mu_x, mu_y):
    out = empty((n,n), dtype=complex)
    for x in range(n):
        for y in range(n):
            out[x,y] = calculate_amplitude(x,y,A, sigma, sin_phi, cos_phi,
                                           sin_theta, cos_theta, mu_x, mu_y)
    return out

def calculate_amplitude(x, y, A, sigma, sin_phi, cos_phi,
                        sin_theta, cos_theta, mu_x, mu_y):

    amplitude = (A * (1 / (sqrt(2  * pi * (sigma ** 2 / cos_theta ** 2)))) *
                 exp(-((cos_phi *(x - mu_x) + sin_phi *(y - mu_y)) ** 2 * cos_theta ** 2) /
                     (2 * sigma ** 2)) *
                 (1 / (sqrt(2 * pi * sigma ** 2))) *
                 exp(-((-sin_phi * (x - mu_x) + cos_phi * (y - mu_y)) ** 2) /
                     (2 * sigma ** 2)) *
                 cmath.exp(1j *(2 * pi / 0.650) * sin_theta *
                           (cos_phi * (x - mu_x) + sin_phi * (y - mu_y))))


    return amplitude

Then compile it with pythran:

$ pythran laser.py

And run it through timeit:

$ python -m timeit -s 'import laser' 'laser.loop(20,1,200,0.9,0.08,1,.5,.5,2)'
10000 loops, best of 3: 84.7 usec per loop

Whereas the original, uncompiled code gives:

$ python -m timeit -s 'import laser' 'laser.loop(20,1,200,0.9,0.08,1,.5,.5,2)'
100 loops, best of 3: 2.65 msec per loop

You would probably get similar results with numba or cython :-)
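For comparison, the numpy route mentioned in the comments needs no compilation step. The following is only a sketch of that idea, not part of the original answer: the function names amplitude_grid and intensity, the n and step parameters, and the WAVELENGTH constant are hypothetical names introduced here. It evaluates the same formula as calculate_amplitude on the whole grid at once through broadcasting, using the question's coordinate convention (x from the column index, y from the row index).

```python
import numpy as np

WAVELENGTH = 0.650  # hypothetical name for the 0.650 constant in the question's formula


def amplitude_grid(n, step, A, sigma, sin_phi, cos_phi,
                   sin_theta, cos_theta, mu_x, mu_y):
    """Complex amplitude on an n x n grid, computed without Python loops."""
    # Same coordinates as the question: (index - n/2) * step
    coords = (np.arange(n) - n // 2) * step
    x = coords[np.newaxis, :] - mu_x   # varies along columns (index k)
    y = coords[:, np.newaxis] - mu_y   # varies along rows (index i)

    # Rotated coordinates; broadcasting produces (n, n) arrays
    u = cos_phi * x + sin_phi * y
    v = -sin_phi * x + cos_phi * y

    # Position-independent prefactor, computed once
    norm = A / (np.sqrt(2 * np.pi * sigma ** 2 / cos_theta ** 2)
                * np.sqrt(2 * np.pi * sigma ** 2))
    envelope = np.exp(-(u ** 2 * cos_theta ** 2 + v ** 2) / (2 * sigma ** 2))
    phase = np.exp(1j * (2 * np.pi / WAVELENGTH) * sin_theta * u)
    return norm * envelope * phase


def intensity(laser1, laser2):
    """Intensity of the superposed fields, as in the question's last step."""
    return np.abs(laser1 + laser2) ** 2
```

With the question's setup this would be called as amplitude_grid(400, 10, A_1, sigma_1, sin_phi_1, cos_phi_1, sin_theta_1, cos_theta_1, mu_x_1, mu_y_1) for each active laser. How it compares in speed with the pythran or numba versions would have to be measured.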

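Comments:

Thanks for the suggestion, that's great! With numba one calculation now takes 0.19 seconds, which is very close to the C++ speed :) I only had to replace calculate_amplitude with calculate_amplitude_jit = numba.jit("c8(f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,)")(calculate_amplitude)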
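A note on that signature: in numba's type shorthand, f8 is float64 but c8 is complex64 (8 bytes in total), so the declared return type is single-precision complex; c16 would be complex128. A hedged alternative is to let numba infer the types lazily on the first call. The sketch below assumes numba is installed and falls back to plain Python if it is not; the intermediate names u, v, norm and envelope are introduced here for readability and are not in the original code.

```python
import cmath
from math import exp, sqrt, pi

try:
    from numba import njit           # optional dependency
except ImportError:                   # assumption: run uncompiled if numba is absent
    def njit(func):
        return func


@njit
def calculate_amplitude_jit(x, y, A, sigma, sin_phi, cos_phi,
                            sin_theta, cos_theta, mu_x, mu_y):
    # Rotated coordinates, shared by the envelope and the phase term
    u = cos_phi * (x - mu_x) + sin_phi * (y - mu_y)
    v = -sin_phi * (x - mu_x) + cos_phi * (y - mu_y)
    # Same normalisation and Gaussian envelope as calculate_amplitude
    norm = A / (sqrt(2 * pi * sigma ** 2 / cos_theta ** 2)
                * sqrt(2 * pi * sigma ** 2))
    envelope = exp(-(u ** 2 * cos_theta ** 2 + v ** 2) / (2 * sigma ** 2))
    # Types are inferred on the first call; the result is double-precision complex
    return norm * envelope * cmath.exp(1j * (2 * pi / 0.650) * sin_theta * u)
```

The first call includes compilation time, so any timing comparison should exclude it.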
