How to vectorize a function which contains an if statement?

Posted 2023-02-23

Asked 2014-08-30 01:16:55

Question:

Suppose we have the following function:

def f(x, y):
    if y == 0:
        return 0
    return x/y

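This works for scalar values. Unfortunately, when I try to use numpy arrays for x and y, the comparison y == 0 is treated as an array operation, which causes an error: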

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-9884e2c3d1cd> in <module>()
----> 1 f(np.arange(1,10), np.arange(10,20))

<ipython-input-10-fbd24f17ea07> in f(x, y)
      1 def f(x, y):
----> 2     if y == 0:
      3         return 0
      4     return x/y

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
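The error comes from the if statement itself: on arrays, y == 0 is evaluated elementwise and produces a boolean array, while if needs a single truth value. A small illustration (the array values are arbitrary):

```python
import numpy as np

y = np.array([0, 1, 2])
mask = y == 0          # elementwise comparison: [True, False, False]

raised = False
try:
    if mask:           # `if` calls bool(mask), which is ambiguous for 3 elements
        pass
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# The reductions suggested by the message collapse the array to one bool
print(mask.any(), mask.all())   # True False
```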

~~I tried np.vectorize, but it made no difference; the code still fails with the same error.~~ np.vectorize is an option; it gives the result I expect.

The only solution I can think of is to use np.where on the y array, for example:

def f(x, y):
    return np.where(y == 0, 0, x/y)

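This doesn't work for scalars: if y is a plain Python 0, x/y raises ZeroDivisionError before np.where is even called.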

Is there a better way to write a function that contains an if statement? It should work with both scalars and arrays.

Comments:

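Are you saying you want to pass a numpy array for y but a number for x? Or vice versa, or both? If you wrap y and x in np.asarray, the where version will work. But note that x/y is evaluated everywhere, so if any y==0 you may get a warning or an exception (depending on your floating-point flags).

@BrenBarn Both x and y are arrays in the second case. I edited my question to make this more explicit.

np.vectorize works fine here :)

@moarningsun Could you post an answer with code?

Answer 1: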

One way is to convert x and y to numpy arrays inside the function:

def f(x, y):
    x = np.array(x)
    y = np.array(y)
    return np.where(y == 0, 0, x/y)

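This works when one of x or y is a scalar and the other is a numpy array. It also works if both are arrays that can be broadcast together. It won't work if they are arrays of incompatible shapes (e.g., 1-D arrays of different lengths), but it isn't clear what the desired behavior would be in that case anyway.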
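A sketch of the same np.where approach using np.asarray (which avoids copying inputs that are already arrays) and np.errstate to silence the warnings that x/y raises where y == 0. Because x/y is still computed everywhere, both the divide-by-zero (1/0) and invalid (0/0) warnings are suppressed here; the function name f and the sample inputs are only illustrative.

```python
import numpy as np

def f(x, y):
    # Scalars become 0-d arrays; existing arrays are passed through without a copy
    x = np.asarray(x)
    y = np.asarray(y)
    # x / y is evaluated for every element, so silence the 1/0 ("divide")
    # and 0/0 ("invalid") warnings; np.where then replaces those entries with 0
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(y == 0, 0, x / y)

print(f(3, 0))                 # scalar inputs give a 0-d array holding 0.0
print(f([1, 2, 3], [0, 2, 4])) # elementwise: 0, 1.0, 0.75
print(f(6, [0, 2, 3]))         # scalar x broadcast against array y: 0, 3.0, 2.0
```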

Comments:

Change np.array(x) to np.asarray(x) (and likewise for y) and you're done. Add with np.errstate(divide='ignore'): before the return (and indent the return) to silence the warning.

Answer 2:

I wonder what problem you ran into with np.vectorize. It works fine on my system:

In [145]: def f(x, y):
     ...:     if y == 0:
     ...:         return 0
     ...:     return x/y

In [146]: vf = np.vectorize(f)

In [147]: vf([[3],[10]], [0,1,2,0])
Out[147]: 
array([[ 0,  3,  1,  0],
       [ 0, 10,  5,  0]])

Note that the result dtype is determined by the result for the first element. You can also set the desired output type yourself:

In [148]: vf = np.vectorize(f, otypes=[float])

In [149]: vf([[3],[10]], [0,1,2,0])
Out[149]: 
array([[  0. ,   3. ,   1.5,   0. ],
       [  0. ,  10. ,   5. ,   0. ]])

There are more examples in the docs.
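A self-contained Python 3 version of the session above, comparing np.vectorize with and without otypes. As the NumPy documentation describes, when otypes is omitted the output dtype is found by calling the function on the first elements; here f(3, 0) returns the int 0, so the 1.5 result is truncated. The names vf_int and vf_float are only for illustration.

```python
import numpy as np

def f(x, y):
    if y == 0:
        return 0
    return x / y

# Without otypes, the output dtype comes from f(3, 0), which is the int 0,
# so every result is cast to an integer array.
vf_int = np.vectorize(f)
# Declaring the output type up front keeps the fractional results.
vf_float = np.vectorize(f, otypes=[float])

x = [[3], [10]]
y = [0, 1, 2, 0]
print(vf_int(x, y))    # 1.5 is truncated to 1
print(vf_float(x, y))  # 1.5 is kept
print(vf_float(3, 0))  # plain scalars work too (0-d array holding 0.0)
```

Keep in mind that np.vectorize still calls the Python function once per element; its documentation describes it as a convenience rather than a performance tool.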

Comments:

otypes=[np.float] was the part I was missing.

Answer 3:

You can use a masked array, so that the division is only performed where y != 0:

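import numpy as np

def f(x, y):
    x = np.atleast_1d(np.asarray(x))
    # Convert y first so that y == 0 is elementwise even if y was passed as a list
    y = np.atleast_1d(np.asarray(y))
    y = np.ma.array(y, mask=(y == 0))  # mask the zero divisors
    ans = x / y
    ans[ans.mask] = 0                  # assigning unmasks those entries and sets them to 0
    return np.asarray(ans)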
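The same masked-array idea can be written more compactly with np.ma.masked_equal and MaskedArray.filled; this is a sketch rather than the answerer's code, and the name f_masked is hypothetical. Dividing a plain array by a masked array yields a masked result, and filled(0) returns an ordinary ndarray with 0 in the masked positions.

```python
import numpy as np

def f_masked(x, y):
    # Hypothetical name; mask every position where y == 0
    y_masked = np.ma.masked_equal(np.asarray(y), 0)
    ans = np.asarray(x) / y_masked   # result is masked wherever y == 0
    # filled() returns a plain ndarray, substituting 0 for masked entries
    return ans.filled(0)

print(f_masked([1.0, 2.0, 3.0], [0.0, 2.0, 0.5]))   # values 0, 1, 6
print(f_masked(6, [0, 2, 3]))                       # values 0, 3, 2
```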

Comments:

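Assuming you meant for the last line to return x/y, this sets all values where y==0 to 1.

@PokeyMcPokerson Thanks for the comment... I wrote it in a hurry earlier; it's fixed now.

If you mask the x array instead of y, i.e. x = np.ma.array(x, mask=(y==0)) and y = np.array(y), it runs about twice as fast. It also gets rid of the warning.

Answer 4: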

A clunky but effective way is to essentially preprocess the data:

import numpy as np

def f(x, y):
    # np.array copies its input, so the in-place edits below never touch the caller's arrays
    x = np.array(x)
    y = np.array(y)

    # Change scalars to arrays of the other argument's shape
    if x.ndim == 0: x = np.full(y.shape, x)
    if y.ndim == 0: y = np.full(x.shape, y)

    # Change all divide-by-zero operations to 0/1
    div_zero_idx = (y == 0)
    x[div_zero_idx] = 0
    y[div_zero_idx] = 1

    return x/y

I timed all the different approaches:

import timeit

import numpy as np

def f_mask(x, y):
    x = np.ma.array(x, mask=(y==0))
    y = np.array(y)
    ans = x/y
    ans[ans.mask]=0
    return np.asarray(ans)

def f_where(x, y):
    x = np.array(x)
    y = np.array(y)
    return np.where(y == 0, 0, x/y)

def f_vect(x, y):
    if y == 0:
        return 0
    return x/y

vf = np.vectorize(f_vect)

# f is the preprocessing function defined above; each call divides two random length-1000 int arrays
print(timeit.timeit('f(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f; import numpy as np; array_length=1000"))
print(timeit.timeit('f_mask(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f_mask; import numpy as np; array_length=1000"))
print(timeit.timeit('f_where(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f_where; import numpy as np; array_length=1000"))
print(timeit.timeit('vf(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import vf; import numpy as np; array_length=1000"))

# f
# 0.760189056396

# f_mask
# 2.24414896965

# f_where
# RuntimeWarning: divide by zero encountered in divide return np.where(y == 0, 0, x/y)
# 1.08176398277

# f_vect
# 3.45374488831

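The first function is the fastest and produces no warnings. The time ratios are similar if x or y is a scalar. For higher-dimensional arrays, the masked-array approach is relatively faster (although it is still the slowest).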
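Another option, which is not one of the timed approaches above (it is an addition here, not from the thread): NumPy ufuncs such as np.divide accept out= and where= arguments, so the division is only carried out where y != 0 and the other entries keep the zeros that out was initialized with. Because no division by zero ever happens, no warning is emitted. A sketch, assuming a float result is acceptable; the name f_divide is hypothetical and no timing is claimed for it.

```python
import numpy as np

def f_divide(x, y):
    # Hypothetical name; work in float so the output can hold fractions
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pre-fill the broadcast output shape with zeros; entries where y == 0 are never written
    out = np.zeros(np.broadcast(x, y).shape)
    np.divide(x, y, out=out, where=(y != 0))
    return out

print(f_divide([1, 2, 3], [0, 2, 4]))  # values 0, 1, 0.75
print(f_divide(3, 0))                  # 0-d array holding 0.0
```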

Answer 5:

Suppose you have a vector/np array of predictions, [0,1,0,1,1,0], and you want to convert it to the sequence ['N', 'Y', 'N', 'Y', 'Y', 'N']:

import numpy as np

y_pred = np.array([0,1,0,1,1,0])

def toYN(x):
    if x > 0:
        return "Y"
    else:
        return "N"

vf_YN = np.vectorize(toYN)
Loan_Status = vf_YN(y_pred)

Loan_Status will contain ['N', 'Y', 'N', 'Y', 'Y', 'N'].
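For a simple two-way branch like toYN, np.where can express the same if/else on the whole array without calling a Python function per element. A sketch; the lowercase name loan_status is only for illustration.

```python
import numpy as np

y_pred = np.array([0, 1, 0, 1, 1, 0])
# Same rule as toYN (x > 0 -> "Y", otherwise "N"), evaluated on the whole array at once
loan_status = np.where(y_pred > 0, "Y", "N")
print(loan_status)   # ['N' 'Y' 'N' 'Y' 'Y' 'N']
```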
