How to make Default Choice for np.select() a Previous Value of an Array, Series, or DataFrame

Posted 2023-03-11

【Title】: How to make Default Choice for np.select() a Previous Value of an Array, Series, or DataFrame
【Posted】: 2020-09-11 09:39:52
【Question】:

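I am using np.select() to build an ndarray whose values are 1, -1, or 0 depending on certain conditions. It is possible that none of the conditions will be met, so I need a default value. If it makes sense, I would like that default to be the value the array holds at the previous index. My naive code, which runs on some columns of a DataFrame called "total" and raises an error, looks like this: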

condlist = [total.ratios > total.s_entry, total.ratios < total.b_entry, (total.ratios > total.b_entry) & (total.ratios < total.s_entry)]
choicelist = (-1, 1, 0)
pos1 = pd.Series(np.select(condlist, choicelist, pos1))

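Is there a way to do what I am asking? For example, say the array begins

and then the sixth element satisfies none of the conditions, so its value defaults to -1 because that is the most recent value in the array?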

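【Question comments】:

np.select iterates over the two lists, zip(choicelist, condlist), and assigns res[cond]=value. It first puts default into res, then iterates starting from the end, so the first condition takes priority. It does not iterate over the "rows" of the array. For behavior that iterates over the elements of a series, select is not the tool you want.

So is there another way to do what I am asking without using a for loop?

【Answer 1】: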

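I ran into the same problem, but did not want to solve the default-value issue with a complicated mechanism like the ones in the other replies here (given that I already had a working version using .loc).

I simply tried passing the DataFrame column/Series as the default, so that the value is kept wherever it is already populated, and in my case it worked: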

    # e.g. if task_type is not NaN, it already has a value of "C"
    # that I want to keep

    conditions = [
        result_df["task_type"].isna() & result_df["maintenance_task"],
        result_df["task_type"].isna(),
    ]

    choices = ["A", "B"]

    result_df["task_type"] = np.select(conditions, choices, default=result_df["task_type"])

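I noticed that this approach performs slightly better than my .loc version, and if more conditions come along it will scale and read better in the long run.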
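To make the snippet above easier to try, here is a minimal self-contained sketch. The contents of `result_df` are hypothetical (the answer does not show its data); only the `conditions`, `choices`, and `np.select` call come from the answer. Note that this keeps the value already stored at the same position, which is not the same as taking the value from the previous row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 already has task_type "C", rows 1 and 2 are empty.
result_df = pd.DataFrame({
    "task_type": ["C", None, None],
    "maintenance_task": [False, True, False],
})

conditions = [
    result_df["task_type"].isna() & result_df["maintenance_task"],
    result_df["task_type"].isna(),
]
choices = ["A", "B"]

# Rows matching no condition fall back to the existing column value.
result_df["task_type"] = np.select(
    conditions, choices, default=result_df["task_type"]
)

print(result_df["task_type"].tolist())  # ['C', 'A', 'B']
```

Because `default` may be an array as well as a scalar, each unmatched row picks up the element of `default` at its own position.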

【Comments】:

【Answer 2】:

I am not sure you will be happy with this solution, but you can assign a placeholder default and then change it to what you want while iterating:

import numpy as np
import pandas as pd

x = np.arange(20)

# The final catch-all condition marks unmatched rows with None
condlist = [x < 4, np.logical_and(x > 8, x < 15), x > 15, True]
choicelist = (-1, 1, 0, None)
pos1 = pd.Series(np.select(condlist, choicelist, x))

for index, row in pos1.items():
    if row is None and index == 0:
        pass # Not sure what you want to do here
    elif row is None:
        pos1.at[index] = pos1.at[index-1]

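【Comments】:

Is there a vectorized way to do this?

The biggest problem is when there are two None values in a row: then a single operation cannot do it in vectorized form. But perhaps you are confident that will not happen?

In numpy, "vectorize" means using compiled methods that operate on whole arrays. They do iterate, but at a low level that you cannot control. This is a serial operation that works on one row at a time, so it cannot be "vectorized".

【Answer 3】: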

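Try leaving None as the default in np.select.

You can then fill those entries with the .fillna() method, which accepts a pd.Series as its argument and fills by index.

In your case, that argument is the same series with a shifted index (which can be done with the deque .rotate() method). Hope this works for you: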

import numpy as np
import pandas as pd
from collections import deque

condlist = [total.ratios > total.s_entry, total.ratios < total.b_entry, (total.ratios > total.b_entry) & (total.ratios < total.s_entry)]
choicelist = (-1, 1, 0)

# Rows that match no condition get None
pos1 = pd.Series(np.select(condlist, choicelist, None))

pos1_index_shift = deque(pos1.index) # [0, 1, 2, ...]
pos1_index_shift.rotate(-1) # [1, 2, ..., n-1, 0] - done in place

# Relabel a copy so that label i holds the value from position i-1
# (label 0 wraps around and holds the last value)
pos1_prev = pos1.copy()
pos1_prev.index = pos1_index_shift

pos1 = pos1.fillna(pos1_prev)
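As a possible alternative to relabelling the index by hand, the sketch below uses `np.nan` as the default and then calls pandas' `Series.ffill()`, which carries the last valid value forward and so also covers runs of several unmatched rows. This is a swapped-in technique, not the method from the answer above, and the `total` DataFrame here is hypothetical sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `total` DataFrame.
# Rows 3 and 4 (ratios == s_entry) and row 6 (ratios == b_entry)
# satisfy none of the three conditions.
total = pd.DataFrame({
    "ratios":  [1.5, 0.2, 0.5, 1.0, 1.0, 0.1, 0.3],
    "s_entry": [1.0] * 7,
    "b_entry": [0.3] * 7,
})

condlist = [
    total.ratios > total.s_entry,
    total.ratios < total.b_entry,
    (total.ratios > total.b_entry) & (total.ratios < total.s_entry),
]
choicelist = (-1, 1, 0)

# Unmatched rows become NaN, then ffill() copies the previous valid value down.
raw = pd.Series(np.select(condlist, choicelist, default=np.nan), index=total.index)
pos1 = raw.ffill()

print(pos1.tolist())  # [-1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

The result is a float Series because NaN forces a float dtype. If the very first row matches no condition, it stays NaN, since there is no earlier value to carry forward.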

【Comments】:
