Python simulation shows slow performance, how to speed array calculation up
Posted: 2015-09-17 09:12:43

I'm trying to write a simple Python program that calculates the interference pattern of 2 incident laser beams. Everything works as it should, but it is quite slow. I'm using a 400x400 array, and recalculating the intensity after changing a parameter takes about 1.3 seconds. Running the same code in C++, however, takes about 0.18 seconds. So I'm wondering whether I can improve something to speed it up?
My code so far:
import cmath
import time
from math import sin, cos, sqrt, exp, pi

from numpy import zeros

# laser1_*_var and laser2_*_var are presumably Tkinter variables from the GUI.


def calculate_intensity_array():
    laser1 = zeros((400, 400), dtype=complex)
    laser2 = zeros((400, 400), dtype=complex)
    onoff_1 = laser1_onoff_var.get()
    A_1 = laser1_intensity_var.get()
    sigma_1 = laser1_sigma_var.get()
    # 180.0 avoids Python 2 integer division when an angle is an int
    sin_phi_1 = sin((laser1_phi_var.get() / 180.0) * pi)
    cos_phi_1 = cos((laser1_phi_var.get() / 180.0) * pi)
    sin_theta_1 = sin((laser1_theta_var.get() / 180.0) * pi)
    cos_theta_1 = cos((laser1_theta_var.get() / 180.0) * pi)
    mu_x_1 = laser1_xpos_var.get()
    mu_y_1 = laser1_ypos_var.get()
    onoff_2 = laser2_onoff_var.get()
    A_2 = laser2_intensity_var.get()
    sigma_2 = laser2_sigma_var.get()
    sin_phi_2 = sin((laser2_phi_var.get() / 180.0) * pi)
    sin_theta_2 = sin((laser2_theta_var.get() / 180.0) * pi)
    cos_phi_2 = cos((laser2_phi_var.get() / 180.0) * pi)
    cos_theta_2 = cos((laser2_theta_var.get() / 180.0) * pi)
    mu_x_2 = laser2_xpos_var.get()
    mu_y_2 = laser2_ypos_var.get()
    if onoff_1 == 0:
        laser1 = zeros((400, 400), dtype=complex)
    elif onoff_1 == 1:
        # One Python-level function call per pixel: 160,000 calls per beam
        for i in range(400):
            for k in range(400):
                laser1[i, k] = calculate_amplitude(
                    (k - 200) * 10, (i - 200) * 10,
                    A_1, sigma_1, sin_phi_1, cos_phi_1,
                    sin_theta_1, cos_theta_1, mu_x_1, mu_y_1)
    if onoff_2 == 0:
        laser2 = zeros((400, 400), dtype=complex)
    elif onoff_2 == 1:
        for i in range(400):
            for k in range(400):
                laser2[i, k] = calculate_amplitude(
                    (k - 200) * 10, (i - 200) * 10,
                    A_2, sigma_2, sin_phi_2, cos_phi_2,
                    sin_theta_2, cos_theta_2, mu_x_2, mu_y_2)
    data = abs(laser1 + laser2) ** 2
    return data


def calculate_amplitude(x, y, A, sigma, sin_phi, cos_phi,
                        sin_theta, cos_theta, mu_x, mu_y):
    # The outer parentheses let the expression span several lines
    # (without them the trailing "*" is a SyntaxError).
    amplitude = (A * (1 / (sqrt(2 * pi * (sigma ** 2 / cos_theta ** 2)))) *
                 exp(-((cos_phi * (x - mu_x) + sin_phi * (y - mu_y)) ** 2 * cos_theta ** 2) /
                     (2 * sigma ** 2)) *
                 (1 / (sqrt(2 * pi * sigma ** 2))) *
                 exp(-((-sin_phi * (x - mu_x) + cos_phi * (y - mu_y)) ** 2) /
                     (2 * sigma ** 2)) *
                 cmath.exp(1j * (2 * pi / 0.650) * sin_theta *
                           (cos_phi * (x - mu_x) + sin_phi * (y - mu_y))))
    return amplitude


start = time.perf_counter()  # time.clock() was removed in Python 3.8
draw_data = calculate_intensity_array()
print(time.perf_counter() - start)
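Is there anything that catches your eye that should be done differently? The main calculation happens in calculate_amplitude, but I already tried passing in only the sin and cos values so that they don't have to be recomputed every time.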
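Written out as a formula, calculate_amplitude computes the following for one beam. The rotated coordinates u, v and the symbol k are introduced here only for readability, and reading 0.650 as the wavelength (presumably in µm) is an assumption. The two normalisation factors collapse into a single constant that does not depend on (x, y), so it only needs to be computed once per parameter change rather than once per pixel.

```latex
\begin{aligned}
u &= \cos\varphi\,(x-\mu_x) + \sin\varphi\,(y-\mu_y), \qquad
v = -\sin\varphi\,(x-\mu_x) + \cos\varphi\,(y-\mu_y), \qquad
k = \frac{2\pi}{0.650},\\
E(x,y) &= \frac{A}{\sqrt{2\pi\sigma^2/\cos^2\theta}}
\exp\!\left(-\frac{u^2\cos^2\theta}{2\sigma^2}\right)
\frac{1}{\sqrt{2\pi\sigma^2}}
\exp\!\left(-\frac{v^2}{2\sigma^2}\right)
e^{\,i k u \sin\theta}\\
&= \frac{A\,\lvert\cos\theta\rvert}{2\pi\sigma^2}
\exp\!\left(-\frac{u^2\cos^2\theta + v^2}{2\sigma^2}\right)
e^{\,i k u \sin\theta},\\
I(x,y) &= \lvert E_1(x,y) + E_2(x,y)\rvert^2 .
\end{aligned}
```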
The equivalent C++ looks like this:
#define _USE_MATH_DEFINES  // makes M_PI available on MSVC
#include <cmath>
#include <complex>

using namespace std;

// Defined before calculate_intensity_array so that the calls below compile.
complex<double> calculate_amplitude(double x, double y, double A,
                                    double sigma, double sin_phi,
                                    double cos_phi, double sin_theta,
                                    double cos_theta, double mu_x,
                                    double mu_y)
{
    // Unlike the Python version, x and y are shifted by a further 200 here.
    complex<double> output;
    output = A * (1 / (sqrt(2 * M_PI * pow(sigma / cos_theta, 2)))) *
             exp(-(pow(cos_phi * (x - 200 - mu_x) + sin_phi * (y - 200 - mu_y), 2) *
                   pow(cos_theta, 2)) / (2 * pow(sigma, 2))) *
             (1 / (sqrt(2 * M_PI * pow(sigma, 2)))) *
             exp(-(pow(-sin_phi * (x - 200 - mu_x) + cos_phi *
                       (y - 200 - mu_y), 2)) / (2 * pow(sigma, 2))) *
             exp(complex<double>(0, (2 * M_PI / 0.650) * sin_theta *
                 (cos_phi * (x - 200 - mu_x) + sin_phi * (y - 200 - mu_y))));
    return output;
}

void calculate_intensity_array()
{
    // static: 400x400 arrays are large enough to risk a stack overflow as locals
    static double intensity_array[400][400];
    static complex<double> laser1[400][400];
    static complex<double> laser2[400][400];
    double A1 = 1;
    double sigma1 = 2000;
    double cos_theta1 = 0.9999619;
    double sin_theta1 = 0.00872654;
    double cos_phi1 = 1;
    double sin_phi1 = 0;
    double mu_x1 = 0.0;
    double mu_y1 = 0.0;
    double A2 = 1;
    double sigma2 = 2000;
    double cos_theta2 = 0.9999619;
    double sin_theta2 = 0.00872654;
    double cos_phi2 = 1;
    double sin_phi2 = 0;
    double mu_x2 = 0.0;
    double mu_y2 = 0.0;
    for (int i = 0; i < 400; i++)
    {
        for (int j = 0; j < 400; j++)
        {
            // Here the row index i is passed as x; the Python code passes the column k.
            laser1[i][j] = calculate_amplitude((i - 200) * 10, (j - 200) * 10, A1,
                                               sigma1, sin_phi1, cos_phi1,
                                               sin_theta1, cos_theta1,
                                               mu_x1, mu_y1);
            laser2[i][j] = calculate_amplitude((i - 200) * 10, (j - 200) * 10, A2,
                                               sigma2, sin_phi2, cos_phi2,
                                               sin_theta2, cos_theta2,
                                               mu_x2, mu_y2);
            intensity_array[i][j] = pow(abs(laser1[i][j] + laser2[i][j]), 2);
        }
    }
}
Comments on the question:
First thought: C++ can probably inline the function call, which Python can't. Use the numpy lib and pypy
...
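A minimal sketch of the "use numpy" suggestion, assuming NumPy is available: the whole 400×400 grid is built once with np.meshgrid, and each beam is evaluated with array operations instead of 160,000 Python-level calls to calculate_amplitude. amplitude_grid and intensity are hypothetical names; angles are taken in degrees as in the GUI, passing None for a beam stands in for the on/off switch, and the prefactor uses the simplification |cos θ|/(2πσ²) of the two normalisation factors.

```python
import numpy as np

WAVELENGTH = 0.650  # hard-coded in the question; presumably in micrometres


def amplitude_grid(X, Y, A, sigma, phi_deg, theta_deg, mu_x, mu_y):
    """Complex amplitude of one beam on the whole grid at once (no Python loop)."""
    phi = np.radians(phi_deg)
    theta = np.radians(theta_deg)
    cos_phi, sin_phi = np.cos(phi), np.sin(phi)
    cos_theta, sin_theta = np.cos(theta), np.sin(theta)
    dx = X - mu_x
    dy = Y - mu_y
    u = cos_phi * dx + sin_phi * dy    # coordinate along the beam's tilt direction
    v = -sin_phi * dx + cos_phi * dy   # perpendicular coordinate
    # The two 1/sqrt(...) factors of the original combine to |cos(theta)| / (2 pi sigma^2)
    norm = A * abs(cos_theta) / (2 * np.pi * sigma ** 2)
    envelope = np.exp(-(u ** 2 * cos_theta ** 2 + v ** 2) / (2 * sigma ** 2))
    phase = np.exp(1j * (2 * np.pi / WAVELENGTH) * sin_theta * u)
    return norm * envelope * phase


def intensity(beam1, beam2, n=400, step=10):
    """|E1 + E2|^2 on an n x n grid; pass None for a beam that is switched off."""
    # Same pixel coordinates as the question: x = (k - 200) * 10, y = (i - 200) * 10
    coords = (np.arange(n) - n // 2) * step
    X, Y = np.meshgrid(coords, coords)  # X[i, k] = coords[k], Y[i, k] = coords[i]
    total = np.zeros((n, n), dtype=complex)
    for beam in (beam1, beam2):
        if beam is not None:
            total += amplitude_grid(X, Y, *beam)
    return np.abs(total) ** 2
```

Because the coordinate grid never changes between parameter updates, X and Y could also be computed once outside intensity and reused; they are built inline here only to keep the sketch short.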
I think the previous commenter's suggestions are valid. I'd also suggest using Matlab
for this kind of thing instead of Python.
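If you want to improve performance, start by using a profiler or carefully placed instrumentation (interval measurements or timestamp tracing) to see which parts take up most of the time. SO is not a code review site, and your question is not specific. Also, what does it have to do with Python? Is it only mentioned for historical interest and as a performance comparison? If you want C++ speed but still need the data in Python, you can expose the C++ function in a shared library; see Python's C API.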
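A minimal sketch of the profiler suggestion using the standard library's cProfile and pstats modules. amplitude and slow_intensity are hypothetical stand-ins that have the same shape as the question's code (one Python function call per pixel); they are not the question's actual functions.

```python
import cProfile
import io
import math
import pstats


def amplitude(x, y):
    # Hypothetical stand-in for calculate_amplitude: one scalar Gaussian per call
    return math.exp(-(x * x + y * y) / (2 * 50.0 ** 2))


def slow_intensity(n):
    # Same structure as the question: a Python-level call for every pixel
    return [[amplitude(k - n // 2, i - n // 2) ** 2 for k in range(n)]
            for i in range(n)]


def profile_report(n=100, top=10):
    profiler = cProfile.Profile()
    profiler.enable()
    slow_intensity(n)
    profiler.disable()
    buf = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

For the question's code, the ncalls column would show calculate_amplitude being called 160,000 times per enabled beam (400 × 400), which points at per-call overhead as the likely cost.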
calculate_amplitude
is an absolute nightmare.
Answer 1:
Convert your code to C++ automatically!
The Python code below looks like yours:
#pythran export loop(int, float, float, float, float, float, float, float, float)
from math import exp, sqrt, pi
import cmath
from numpy import empty


def loop(n, A, sigma, sin_phi, cos_phi,
         sin_theta, cos_theta, mu_x, mu_y):
    out = empty((n, n), dtype=complex)
    for x in range(n):
        for y in range(n):
            out[x, y] = calculate_amplitude(x, y, A, sigma, sin_phi, cos_phi,
                                            sin_theta, cos_theta, mu_x, mu_y)
    return out


def calculate_amplitude(x, y, A, sigma, sin_phi, cos_phi,
                        sin_theta, cos_theta, mu_x, mu_y):
    amplitude = (A * (1 / (sqrt(2 * pi * (sigma ** 2 / cos_theta ** 2)))) *
                 exp(-((cos_phi * (x - mu_x) + sin_phi * (y - mu_y)) ** 2 * cos_theta ** 2) /
                     (2 * sigma ** 2)) *
                 (1 / (sqrt(2 * pi * sigma ** 2))) *
                 exp(-((-sin_phi * (x - mu_x) + cos_phi * (y - mu_y)) ** 2) /
                     (2 * sigma ** 2)) *
                 cmath.exp(1j * (2 * pi / 0.650) * sin_theta *
                           (cos_phi * (x - mu_x) + sin_phi * (y - mu_y))))
    return amplitude
Then compile it with pythran:
$ pythran laser.py
and run it through timeit:
$ python -m timeit -s 'import laser' 'laser.loop(20,1,200,0.9,0.08,1,.5,.5,2)'
10000 loops, best of 3: 84.7 usec per loop
whereas the original (uncompiled) code gives:
$ python -m timeit -s 'import laser' 'laser.loop(20,1,200,0.9,0.08,1,.5,.5,2)'
100 loops, best of 3: 2.65 msec per loop
You can probably get similar results with numba or cython :-)
Comments:
Thanks for the suggestion, awesome! With numba a single calculation now takes 0.19 seconds, which is very close to the C++ speed :) I only had to replace calculate_amplitude with calculate_amplitude_jit = numba.jit("c8(f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,)")(calculate_amplitude)
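A minimal sketch of the numba route, assuming numba is installed (a no-op fallback decorator keeps the sketch runnable without numba, just slowly). Unlike the one-line change in the comment above, it compiles both calculate_amplitude and the double loop with numba.njit, so the loop itself no longer runs in the interpreter; laser_field, n and step are hypothetical names. No signature is given, so numba infers the types on the first call, which therefore includes compilation time. If an explicit signature is used, note that c8 is numba's single-precision complex64, whereas c16 (complex128) matches Python's complex.

```python
import cmath
import math

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) where numba is not installed
    def njit(func):
        return func


@njit
def calculate_amplitude(x, y, A, sigma, sin_phi, cos_phi,
                        sin_theta, cos_theta, mu_x, mu_y):
    # Same formula as the question; u and v are the rotated coordinates
    u = cos_phi * (x - mu_x) + sin_phi * (y - mu_y)
    v = -sin_phi * (x - mu_x) + cos_phi * (y - mu_y)
    return (A * (1 / math.sqrt(2 * math.pi * (sigma ** 2 / cos_theta ** 2)))
            * math.exp(-(u ** 2 * cos_theta ** 2) / (2 * sigma ** 2))
            * (1 / math.sqrt(2 * math.pi * sigma ** 2))
            * math.exp(-(v ** 2) / (2 * sigma ** 2))
            * cmath.exp(1j * (2 * math.pi / 0.650) * sin_theta * u))


@njit
def laser_field(n, step, A, sigma, sin_phi, cos_phi,
                sin_theta, cos_theta, mu_x, mu_y):
    # The double loop is compiled too, so no Python-level call happens per pixel
    out = np.empty((n, n), dtype=np.complex128)
    for i in range(n):
        for k in range(n):
            out[i, k] = calculate_amplitude((k - n // 2) * step, (i - n // 2) * step,
                                            A, sigma, sin_phi, cos_phi,
                                            sin_theta, cos_theta, mu_x, mu_y)
    return out
```

For one beam on the question's grid this would be called as laser_field(400, 10, A_1, sigma_1, sin_phi_1, cos_phi_1, sin_theta_1, cos_theta_1, mu_x_1, mu_y_1).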