在 numba 中使用 numpy.vstack

Posted

技术标签:

【中文标题】在 numba 中使用 numpy.vstack【英文标题】:Using numpy.vstack in numba 【发布时间】:2019-01-16 03:59:35 【问题描述】:

所以我一直在尝试优化一些从一些数组数据计算统计误差度量的代码。该指标称为连续等级概率得分(CRPS)。

我一直在使用 Numba 来尝试加快此计算中所需的双循环,但我遇到了 numpy.vstack 函数的问题。根据我从文档here 中了解到的情况,应该支持vstack() 函数,但是当我运行以下代码时出现错误。

def crps_hersbach_numba(obs, fcst_ens, remove_neg=False, remove_zero=False):
    """Calculate the the continuous ranked probability score (CRPS) as per equation 25-27 in
    Hersbach et al. (2000)

    Parameters
    ----------
    obs: 1D ndarry
        Array of observations for each start date
    fcst_ens: 2D ndarray
        Array of ensemble forecast of dimension n x M, where n = number of start dates and
        M = number of ensemble members.

    remove_neg: bool
        If True, when a negative value is found at the i-th position in the observed OR ensemble
        array, the i-th value of the observed AND ensemble array are removed before the
        computation.

    remove_zero: bool
        If true, when a zero value is found at the i-th position in the observed OR ensemble
        array, the i-th value of the observed AND ensemble array are removed before the
        computation.

    Returns
    -------
    dict
        Dictionary contains a number of *experimental* outputs including:
            - ["crps"] 1D ndarray of crps values per n start dates.
            - ["crpsMean1"] arithmetic mean of crps values.
            - ["crpsMean2"] mean crps using eqn. 28 in Hersbach (2000).

    Notes
    -----
    **NaN and inf treatment:** If any value in obs or fcst_ens is NaN or inf, then the
    corresponding row in both fcst_ens (for all ensemble members) and in obs will be deleted.

    References
    ----------
    - Hersbach, H. (2000) Decomposition of the Continuous Ranked Porbability Score
      for Ensemble Prediction Systems, Weather and Forecasting, 15, 559-570.
    """
    # Treating the Data
    obs, fcst_ens = treat_data(obs, fcst_ens, remove_neg=remove_neg, remove_zero=remove_zero)

    # Set parameters
    n = fcst_ens.shape[0]  # number of forecast start dates
    m = fcst_ens.shape[1]  # number of ensemble members

    # Create vector of pi's
    p = np.linspace(0, m, m + 1)
    pi = p / m

    crps_numba = np.zeros(n)

    @njit
    def calculate_crps():
        # Loop fcst start times
        for i in prange(n):

            # Initialise vectors for storing output
            a = np.zeros(m - 1)
            b = np.zeros(m - 1)

            # Verifying analysis (or obs)
            xa = obs[i]

            # Ensemble fcst CDF
            x = np.sort(fcst_ens[i, :])

            # Deal with 0 < i < m [So, will loop 50 times for m = 51]
            for j in prange(m - 1):

                # Rule 1
                if xa > x[j + 1]:
                    a[j] = x[j + 1] - x[j]
                    b[j] = 0

                # Rule 2
                if x[j] < xa < x[j + 1]:
                    a[j] = xa - x[j]
                    b[j] = x[j + 1] - xa

                # Rule 3
                if xa < x[j]:
                    a[j] = 0
                    b[j] = x[j + 1] - x[j]

            # Deal with outliers for i = 0, and i = m,
            # else a & b are 0 for non-outliers
            if xa < x[0]:
                a1 = 0
                b1 = x[0] - xa
            else:
                a1 = 0
                b1 = 0

            # Upper outlier (rem m-1 is for last member m, but python is 0-based indexing)
            if xa > x[m - 1]:
                am = xa - x[m - 1]
                bm = 0
            else:
                am = 0
                bm = 0

            # Combine full a & b vectors including outlier
            a = np.concatenate((np.array([0]), a, np.array([am])))
            # a = np.insert(a, 0, a1)
            # a = np.append(a, am)
            a = np.concatenate((np.array([0]), a, np.array([bm])))
            # b = np.insert(b, 0, b1)
            # b = np.append(b, bm)

            # Populate a_mat and b_mat
            if i == 0:
                a_mat = a
                b_mat = b
            else:
                a_mat = np.vstack((a_mat, a))
                b_mat = np.vstack((b_mat, b))

            # Calc crps for individual start times
            crps_numba[i] = ((a * pi ** 2) + (b * (1 - pi) ** 2)).sum()

        return crps_numba, a_mat, b_mat

    crps, a_mat, b_mat = calculate_crps()
    print(crps)
    # Calc mean crps as simple mean across crps[i]
    crps_mean_method1 = np.mean(crps)

    # Calc mean crps across all start times from eqn. 28 in Hersbach (2000)
    abar = np.mean(a_mat, 0)
    bbar = np.mean(b_mat, 0)
    crps_mean_method2 = ((abar * pi ** 2) + (bbar * (1 - pi) ** 2)).sum()

    # Output array as a dictionary
    output = 'crps': crps, 'crpsMean1': crps_mean_method1,
              'crpsMean2': crps_mean_method2

    return output

我得到的错误是这样的:

Cannot unify array(float64, 1d, C) and array(float64, 2d, C) for 'a_mat', defined at *path

File "test.py", line 86:
    def calculate_crps():
        <source elided>
            if i == 0:
                a_mat = a
                ^

[1] During: typing of assignment at *path

File "test.py", line 89:
    def calculate_crps():
        <source elided>
            else:
                a_mat = np.vstack((a_mat, a))
                ^

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

我只是想知道我哪里出错了。似乎vstack 函数应该可以工作,但也许我遗漏了一些东西。

【问题讨论】:

如果性能有任何问题,请避免使用 np.concatenate 和 np.vstack(预先分配数组并在之后填充它们)。即使对于非常小的 (n,m) 来说,大部分运行时都花费在不必要的内存分配和复制上。 (每个连接相当于分配一个新数组并将所有内容复制到其中) 【参考方案1】:

我只是想知道我哪里出错了。似乎vstack 函数应该可以工作,但也许我遗漏了一些东西。

TL;DR:问题不在于vstack。问题是您的代码路径试图将不同类型的数组分配给同一个变量(这会引发统一异常)。

问题出在这里:

# Populate a_mat and b_mat
if i == 0:
    a_mat = a
    b_mat = b
else:
    a_mat = np.vstack((a_mat, a))
    b_mat = np.vstack((b_mat, b))

在第一个代码路径中,您将 1d c 连续 float64 数组分配给 a_matb_mat,在 else 中,它是 2d c 连续 float64 数组。这些类型不兼容,因此 numba 会引发错误。 numba 代码不像 Python 代码那样工作有时会很棘手,当你为变量赋值时,你有什么类型并不重要。然而,在最近的版本中,numba 异常消息变得更好了,所以如果您知道异常提示什么,通常可以快速找出问题所在。

更长的解释

问题在于 numba 隐式推断变量的类型。例如:

from numba import njit

@njit
def func(arr):
    a = arr
    return a

这里我没有输入函数,所以我需要运行一次:

>>> import numpy as np
>>> func(np.zeros(5))
array([0., 0., 0., 0., 0.])

然后你可以检查类型:

>>> func.inspect_types()
func (array(float64, 1d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-4-02470248b065>
# --- LINE 3 --- 
# label 0

@njit

# --- LINE 4 --- 

def func(arr):

    # --- LINE 5 --- 
    #   arr = arg(0, name=arr)  :: array(float64, 1d, C)
    #   a = arr  :: array(float64, 1d, C)
    #   del arr

    a = arr

    # --- LINE 6 --- 
    #   $0.3 = cast(value=a)  :: array(float64, 1d, C)
    #   del a
    #   return $0.3

    return a

如您所见,变量a 是为array(float64, 1d, C) 类型的输入键入的array(float64, 1d, C)

现在,让我们改用np.vstack

from numba import njit
import numpy as np

@njit
def func(arr):
    a = np.vstack((arr, arr))
    return a

并且强制第一次调用来编译它:

>>> func(np.zeros(5))
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

然后再次检查类型:

func (array(float64, 1d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-11-f0214d5181c6>
# --- LINE 4 --- 
# label 0

@njit

# --- LINE 5 --- 

def func(arr):

    # --- LINE 6 --- 
    #   arr = arg(0, name=arr)  :: array(float64, 1d, C)
    #   $0.1 = global(np: <module 'numpy'>)  :: Module(<module 'numpy'>)
    #   $0.2 = getattr(value=$0.1, attr=vstack)  :: Function(<function vstack at 0x000001DB7082A400>)
    #   del $0.1
    #   $0.5 = build_tuple(items=[Var(arr, <ipython-input-11-f0214d5181c6> (6)), Var(arr, <ipython-input-11-f0214d5181c6> (6))])  :: tuple(array(float64, 1d, C) x 2)
    #   del arr
    #   $0.6 = call $0.2($0.5, func=$0.2, args=[Var($0.5, <ipython-input-11-f0214d5181c6> (6))], kws=(), vararg=None)  :: (tuple(array(float64, 1d, C) x 2),) -> array(float64, 2d, C)
    #   del $0.5
    #   del $0.2
    #   a = $0.6  :: array(float64, 2d, C)
    #   del $0.6

    a = np.vstack((arr, arr))

    # --- LINE 7 --- 
    #   $0.8 = cast(value=a)  :: array(float64, 2d, C)
    #   del a
    #   return $0.8

    return a

这次a 被输入为array(float64, 2d, C) 以输入array(float64, 1d, C)

您可能已经问过自己我为什么要谈论这个问题。让我们看看如果您尝试有条件地分配给a 会发生什么:

from numba import njit
import numpy as np

@njit
def func(arr, condition):
    if condition:
        a = np.vstack((arr, arr))
    else:
        a = arr
    return a

如果你现在运行代码:

>>> func(np.zeros(5), True)
TypingError: Failed at nopython (nopython frontend)
Cannot unify array(float64, 2d, C) and array(float64, 1d, C) for 'a', defined at <ipython-input-16-f4bd9a4f377a> (7)

File "<ipython-input-16-f4bd9a4f377a>", line 7:
def func(arr, condition):
    <source elided>
    if condition:
        a = np.vstack((arr, arr))
        ^

[1] During: typing of assignment at <ipython-input-16-f4bd9a4f377a> (9)

File "<ipython-input-16-f4bd9a4f377a>", line 9:
def func(arr, condition):
    <source elided>
    else:
        a = arr
        ^

这正是您遇到的问题,因为对于一组固定的输入类型,变量在 numba 中需要有一个且只有一个类型。而且因为 dtype、秩(维数)和连续属性都是类型的一部分,所以您不能将具有不同维数的数组分配给同一个变量。

请注意,您可以扩展尺寸以使其正常工作并从结果中再次压缩不必要的尺寸(可能不是很好,但它应该以最少的“更改”解决问题:

from numba import njit
import numpy as np

@njit
def func(arr, condition):
    if condition:
        a = np.vstack((arr, arr))
    else:
        a = np.expand_dims(arr, 0)
    return a

>>> func(np.zeros(5), False)
array([[0., 0., 0., 0., 0.]])  # <-- 2d array!
>>> func(np.zeros(5), True)
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

【讨论】:

很好的解释!有时我会忘记编译语言具有更严格的类型结构,这有助于我更好地理解它!

以上是关于在 numba 中使用 numpy.vstack的主要内容,如果未能解决你的问题,请参考以下文章

如何使 numba @jit 使用所有 cpu 内核(并行化 numba @jit)

numpy vstack 空初始化

数组的拼接

如何使用 numba 在 GPU 上泛化快速矩阵乘法

如何在 Numba 中使用指针包装 CFFI 函数

如何使用 python 和 numba 在 RTX GPU 中对 NVIDIA 的张量核心进行编程?