Memory leak (cython + numpy)

Posted: 2019-12-28 19:10:54

I am struggling to find where this code leaks memory:
kullback.pyx
```cython
import numpy as np
cimport numpy as np
from libcpp.vector cimport vector
import scipy.stats as st
import matplotlib.pyplot as plt

cdef vector[double] minmax(double i, dict a):
    cdef double minmax
    cdef vector[double] out
    try:
        minmax = min(list(filter(lambda x: x > i, a.keys())))
    except ValueError:
        minmax = min(a.keys())
    cdef double maxmin
    try:
        maxmin = max(list(filter(lambda x: x < i, a.keys())))
    except ValueError:
        maxmin = max(a.keys())
    out.push_back(minmax)
    out.push_back(maxmin)
    return out

def KullbackLeibler(args):
    cdef np.ndarray[np.double_t, ndim = 1] psample = args[0]
    cdef np.ndarray[np.double_t, ndim = 1] qsample = args[1]
    cdef int n = args[2]

    a = plt.hist(psample, bins = n)
    cdef np.ndarray[np.double_t, ndim = 1] ax = a[1]
    cdef np.ndarray[np.double_t, ndim = 1] ay = a[0]
    b = plt.hist(qsample, bins = ax)

    adict = dict(zip(ax, ay))
    ax = ax[:-1]

    cdef np.ndarray[np.double_t, ndim = 1] bx = b[1]
    cdef np.ndarray[np.double_t, ndim = 1] by = b[0]
    bdict = dict(zip(bx, by))
    bx = bx[:-1]

    cdef vector[double] kl
    cdef int N = np.sum(ay)
    cdef int i
    cdef double p_minmax, p_maxmin, q_minmax, q_maxmin
    cdef double KL

    for i in range(len(psample)):
        ptmp = minmax(psample[i], adict)
        p_minmax = ptmp[0]
        p_maxmin = ptmp[1]

        qtmp = minmax(psample[i], bdict)
        q_minmax = qtmp[0]
        q_maxmin = qtmp[1]

        pdensity = adict[p_maxmin] / N
        qdensity = np.max([bdict[q_maxmin] / N, 10e-20])

        KL = pdensity * np.log(pdensity / qdensity)
        kl.push_back(KL)

    cdef double res = np.sum(kl)
    del args, psample, qsample, ax, ay, bx, by, adict, bdict
    return res
```
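For context, the histogram-based KL(P||Q) estimate that this function computes can be sketched in plain NumPy. This is a simplified illustration of the same idea, not the author's exact code (the function name and the shared-edges detail are mine; following the post, both histograms are normalized by the p-sample count, and q is floored to avoid log(0)):

```python
import numpy as np

def kl_histogram(psample, qsample, bins):
    """Rough KL(P||Q) estimate from histograms of two samples (sketch)."""
    p_counts, edges = np.histogram(psample, bins=bins)
    q_counts, _ = np.histogram(qsample, bins=edges)  # reuse p's bin edges
    n = p_counts.sum()
    p = p_counts / n
    q = np.maximum(q_counts / n, 1e-19)  # floor q, as 10e-20 does in the post
    mask = p > 0                         # only bins where p has mass contribute
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

x = np.random.default_rng(0).normal(size=2000)
y = np.random.default_rng(1).normal(size=2000)
kl = kl_histogram(x, y, bins=50)  # two same-distribution samples: small value
```

Note that `np.histogram` returns the same (counts, edges) pair as `plt.hist` without ever creating a figure.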
And here is the main script I launch it from:
main.py
```python
import kullback as klcy  # @UnresolvedImport
import datetime
import numpy as np
import pathos.pools as pp
import objgraph

np.random.seed(10)
ncore = 4
pool = pp.ProcessPool(ncore)
KL = []

for i in range(2500):
    time1 = datetime.datetime.now()
    n = 500
    x = [np.random.normal(size = n, scale = 1) for j in range(ncore)]
    y = [np.random.normal(size = n, scale = 1) for j in range(ncore)]
    data = np.array(list(zip(x, y, [n/10]*ncore)))

    kl = pool.map(klcy.KullbackLeibler, data)

    time2 = datetime.datetime.now()
    print(i, time2 - time1, sep = " ")
    print(objgraph.show_growth())
    KL.append(kl)
```
The function KullbackLeibler takes two arrays and an integer as input.

What I have already tried:

- Using objgraph to identify the growing objects. Unfortunately it does not seem to work for C-defined arrays (it only identifies the list I append the results to as growing). See also: Why can't objgraph capture the growth of np.array()?
- Deleting all the arrays at the end of the pyx function
- Calling gc.collect() both in the pyx file and in the main file

None of this changed anything. Memory consumption grows linearly with the number of iterations, and so does the time per iteration (from 0.6 s to more than 4 s). This is my first attempt with cython, so any suggestion would be useful.
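Besides objgraph, the standard-library tracemalloc module can attribute memory growth to individual source lines, and recent NumPy versions report their buffer allocations through it. A generic diagnostic sketch (not from the original post; the simulated-growth list is illustrative):

```python
import tracemalloc
import numpy as np

tracemalloc.start()
snap1 = tracemalloc.take_snapshot()

# Simulate the kind of growth seen in the loop: keep appending large arrays
hoard = [np.random.normal(size=10_000) for _ in range(50)]

snap2 = tracemalloc.take_snapshot()
stats = snap2.compare_to(snap1, "lineno")

# The entries with the largest positive size_diff point at the allocating lines
print(stats[0])
```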
Comments:

- You should provide a minimal reproducible example (the "minimal" part is very important). I would start by removing the plt.hist-stuff and reducing the example further. See also meta.***.com/q/388123/5769463
- I would be very suspicious of plt. If you don't close the plots, they can stay open forever.
- @DavidW you nailed it. Following @ead's comment I was stripping that stuff out, and re-running without it there is no leak. After finally adding plt.close() the leak is gone. I was looking in the wrong place, focused on the arrays. Thanks.
Answer 1:

The problem had nothing to do with the arrays. I was not closing the matplotlib figures:

```python
a = plt.hist(psample, bins = n)
b = plt.hist(qsample, bins = ax)
```

Even though I never displayed them, the figures were still drawn, consuming memory that was never released afterwards. Thanks to @DavidW in the comments for pointing me to it.
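For anyone hitting the same issue, a minimal sketch of the two ways out (the variable names and the Agg backend are illustrative assumptions, not from the original post):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumed non-interactive backend for headless runs
import matplotlib.pyplot as plt

psample = np.random.default_rng(0).normal(size=500)

# Option 1: keep plt.hist, but release the figure it silently creates
counts, edges, _ = plt.hist(psample, bins=50)
plt.close("all")

# Option 2: np.histogram computes the same counts/edges with no figure at all
counts2, edges2 = np.histogram(psample, bins=50)
assert np.allclose(counts, counts2) and np.allclose(edges, edges2)
```

Option 2 is the cleaner fix here, since the figures were never displayed in the first place.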