How to find since how many days the value of a column is True?

Posted: 2021-04-06 19:25:59

Question:

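Consider the DataFrame built from the following dictionary (code below):

I want to create a new column in this DataFrame that tells us for how many consecutive days the value has been True, computed separately for each stock, i.e. groupby(ticker).

The values expected in the new column are written as comments in the code below (for the first few rows). If anything about the desired output is unclear, please leave a comment: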

'DaysWithGain': (Timestamp('2019-10-01 04:00:00+0000', tz='UTC'),
      'AAPL'): True,
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'AAPL'): True, #1
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'FSLY'): False, #0
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'LVGO'): False, #0
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'SHOP'): True, #1
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'UPLD'): False, #0
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'ZM'): True, #1
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'AAPL'): True, #1
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'FSLY'): False,#0
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'LVGO'): False, #0
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'SHOP'): True, #1
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'UPLD'): False, #0
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'ZM'): True, #1
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'AAPL'): True, #1
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'FSLY'): False, #0
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'LVGO'): False, #0
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'SHOP'): True, # 1
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'UPLD'): False, #0
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'ZM'): True,#1
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'AAPL'): True,#2
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'FSLY'): False,#0
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'LVGO'): False,#0
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'SHOP'): True,#2
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'UPLD'): False, #0
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'ZM'): True, #2
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'AAPL'): True, #2
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'FSLY'): False, #0
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'LVGO'): False,#0
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'SHOP'): True, #0
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'UPLD'): False, #0
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'ZM'): True,#2
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'AAPL'): True,#2
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'FSLY'): False,#0
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'LVGO'): False,#0
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'SHOP'): True,#2
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'UPLD'): False,#0
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'ZM'): True, # 2
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'AAPL'): True, #2
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'FSLY'): False,#0
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'LVGO'): False, #0
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'SHOP'): True, #2
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'UPLD'): False,#0
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'ZM'): True,#2
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'AAPL'): False,
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'AAPL'): False,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'AAPL'): False,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'AAPL'): False,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'ZM'): False,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'ZM'): False,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'UPLD'): True,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'ZM'): False,
     (Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'LVGO'): False

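Answer 1:

This calls for a custom aggregation function. The function below takes a list-like series of True/False values and returns the number of consecutive True values at its end: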

from functools import reduce
def num_last_true_vals(series):
    return reduce(lambda c,v: (c+1)*v, series, 0)

Let's check it. The following calls

(
num_last_true_vals([True,True,True,True]), 
num_last_true_vals([True,False,True,True]),
num_last_true_vals([True,False,False,True]),
num_last_true_vals([True,False,True,False]),
)

return

(4, 2, 1, 0)

as expected.
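The reduce call is compact but easy to misread. A minimal sketch of the same logic as an explicit loop may help; the name num_last_true_vals_loop is made up here for illustration and is not part of the answer. Each True adds one to the running count and each False resets it to zero, which is what (c+1)*v does because True and False behave as 1 and 0 in arithmetic.

```python
from functools import reduce


def num_last_true_vals(series):
    # Same function as in the answer: (c + 1) * v is c + 1 for True, 0 for False.
    return reduce(lambda c, v: (c + 1) * v, series, 0)


def num_last_true_vals_loop(values):
    # Hypothetical equivalent written as a plain loop.
    count = 0
    for v in values:
        count = count + 1 if v else 0  # extend the streak or reset it
    return count


samples = [
    [True, True, True, True],
    [True, False, True, True],
    [True, False, False, True],
    [True, False, True, False],
]
for s in samples:
    print(num_last_true_vals(s), num_last_true_vals_loop(s))
```

Both versions return 0 for an empty input, since the running count starts at 0.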

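Now on to your dataset. It is not very interesting, in the sense that for each ticker the DaysWithGain values are either all True or all False. So I modified it slightly to make sure the solution works as promised. Here dd is the dictionary you provided, and we do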

import pandas as pd
from pandas import Timestamp
df = pd.DataFrame(dd)
df.loc[(Timestamp('2019-10-03 10:00:00+00:00'),'AAPL'), 'DaysWithGain'] = False
df = df.sort_index(level=0)

Note that we set one of the 'AAPL' entries to False. Also note that we sort by timestamp for good measure.
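For readers who have not built a frame this way before, here is a small sketch of how a dictionary keyed by (timestamp, ticker) tuples becomes a DataFrame with a two-level index. The three entries and the names small_dd, small_df, t1 and t2 are made up for illustration; the real dd is the full dictionary from the question.

```python
import pandas as pd
from pandas import Timestamp

# Hypothetical miniature of the question's dictionary.
t1 = Timestamp("2019-10-01 04:00", tz="UTC")
t2 = Timestamp("2019-10-01 10:00", tz="UTC")
small_dd = {
    "DaysWithGain": {
        (t1, "AAPL"): True,
        (t2, "AAPL"): True,
        (t2, "ZM"): True,
    }
}

# Tuple keys in the inner dict become a MultiIndex: level 0 is the timestamp,
# level 1 is the ticker, which is why the answer groups with level=1.
small_df = pd.DataFrame(small_dd).sort_index(level=0)

# Overwrite a single cell by addressing it with the full (timestamp, ticker) key.
small_df.loc[(t1, "AAPL"), "DaysWithGain"] = False

print(small_df)
print(small_df.groupby(level=1).size())
```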

Now for the main act, using our custom num_last_true_vals function:

df.groupby(level=1).agg({'DaysWithGain': num_last_true_vals})

produces


        DaysWithGain
AAPL    8
FSLY    0
LVGO    0
SHOP    16
UPLD    0
ZM      16

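The DaysWithGain column gives, for each ticker, the number of consecutive timestamps at the end of the series for which the value is True.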

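Edit following the OP's comments

You can use the same function with rolling.

So the following should work. Note that the last few steps are only for pretty printing, and you can skip them.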

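# group_keys=False keeps the original (timestamp, ticker) index on recent pandas,
# so reset_index() yields the level_0 / level_1 columns shown below.
df2 = (df.groupby(level=1, group_keys=False)
        .apply(lambda d: d.assign(ts_since_true = d['DaysWithGain']
            # a window larger than any group acts as an expanding window
            .rolling(window=1000, min_periods=1)
            .apply(num_last_true_vals)
        ))
        .reset_index()
        .sort_values(['level_1','level_0'])
    )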

This produces (using your most recently edited dictionary)

    level_0                    level_1    DaysWithGain      ts_since_true
--  -------------------------  ---------  --------------  ---------------
 0  2019-10-01 04:00:00+00:00  AAPL       True                          1
 1  2019-10-01 10:00:00+00:00  AAPL       True                          2
 7  2019-10-01 16:00:00+00:00  AAPL       True                          3
13  2019-10-01 22:00:00+00:00  AAPL       True                          4
19  2019-10-02 04:00:00+00:00  AAPL       True                          5
25  2019-10-02 10:00:00+00:00  AAPL       True                          6
31  2019-10-02 16:00:00+00:00  AAPL       True                          7
37  2019-10-02 22:00:00+00:00  AAPL       True                          8
43  2019-10-03 04:00:00+00:00  AAPL       False                         0
49  2019-10-03 10:00:00+00:00  AAPL       False                         0
55  2019-10-03 16:00:00+00:00  AAPL       False                         0
61  2019-10-03 22:00:00+00:00  AAPL       False                         0
67  2019-10-04 04:00:00+00:00  AAPL       True                          1
73  2019-10-04 10:00:00+00:00  AAPL       True                          2
79  2019-10-04 16:00:00+00:00  AAPL       True                          3
85  2019-10-04 22:00:00+00:00  AAPL       True                          4
91  2019-10-07 04:00:00+00:00  AAPL       True                          5
97  2019-10-07 10:00:00+00:00  AAPL       True                          6
 2  2019-10-01 10:00:00+00:00  FSLY       False                         0
 8  2019-10-01 16:00:00+00:00  FSLY       False                         0
14  2019-10-01 22:00:00+00:00  FSLY       False                         0
20  2019-10-02 04:00:00+00:00  FSLY       False                         0
26  2019-10-02 10:00:00+00:00  FSLY       False                         0
32  2019-10-02 16:00:00+00:00  FSLY       False                         0
38  2019-10-02 22:00:00+00:00  FSLY       False                         0
44  2019-10-03 04:00:00+00:00  FSLY       False                         0
50  2019-10-03 10:00:00+00:00  FSLY       False                         0
56  2019-10-03 16:00:00+00:00  FSLY       False                         0
62  2019-10-03 22:00:00+00:00  FSLY       False                         0
68  2019-10-04 04:00:00+00:00  FSLY       False                         0
74  2019-10-04 10:00:00+00:00  FSLY       False                         0
80  2019-10-04 16:00:00+00:00  FSLY       False                         0
86  2019-10-04 22:00:00+00:00  FSLY       False                         0
92  2019-10-07 04:00:00+00:00  FSLY       False                         0
98  2019-10-07 10:00:00+00:00  FSLY       False                         0
 3  2019-10-01 10:00:00+00:00  LVGO       False                         0
 9  2019-10-01 16:00:00+00:00  LVGO       False                         0
15  2019-10-01 22:00:00+00:00  LVGO       False                         0
21  2019-10-02 04:00:00+00:00  LVGO       False                         0
27  2019-10-02 10:00:00+00:00  LVGO       False                         0
33  2019-10-02 16:00:00+00:00  LVGO       False                         0
39  2019-10-02 22:00:00+00:00  LVGO       False                         0
45  2019-10-03 04:00:00+00:00  LVGO       False                         0
51  2019-10-03 10:00:00+00:00  LVGO       False                         0
57  2019-10-03 16:00:00+00:00  LVGO       False                         0
63  2019-10-03 22:00:00+00:00  LVGO       False                         0
69  2019-10-04 04:00:00+00:00  LVGO       False                         0
75  2019-10-04 10:00:00+00:00  LVGO       False                         0
81  2019-10-04 16:00:00+00:00  LVGO       False                         0
87  2019-10-04 22:00:00+00:00  LVGO       False                         0
93  2019-10-07 04:00:00+00:00  LVGO       False                         0
99  2019-10-07 10:00:00+00:00  LVGO       False                         0
 4  2019-10-01 10:00:00+00:00  SHOP       True                          1
10  2019-10-01 16:00:00+00:00  SHOP       True                          2
16  2019-10-01 22:00:00+00:00  SHOP       True                          3
22  2019-10-02 04:00:00+00:00  SHOP       True                          4
28  2019-10-02 10:00:00+00:00  SHOP       True                          5
34  2019-10-02 16:00:00+00:00  SHOP       True                          6
40  2019-10-02 22:00:00+00:00  SHOP       True                          7
46  2019-10-03 04:00:00+00:00  SHOP       True                          8
52  2019-10-03 10:00:00+00:00  SHOP       True                          9
58  2019-10-03 16:00:00+00:00  SHOP       True                         10
64  2019-10-03 22:00:00+00:00  SHOP       True                         11
70  2019-10-04 04:00:00+00:00  SHOP       True                         12
76  2019-10-04 10:00:00+00:00  SHOP       True                         13
82  2019-10-04 16:00:00+00:00  SHOP       True                         14
88  2019-10-04 22:00:00+00:00  SHOP       True                         15
94  2019-10-07 04:00:00+00:00  SHOP       True                         16
 5  2019-10-01 10:00:00+00:00  UPLD       False                         0
11  2019-10-01 16:00:00+00:00  UPLD       False                         0
17  2019-10-01 22:00:00+00:00  UPLD       False                         0
23  2019-10-02 04:00:00+00:00  UPLD       False                         0
29  2019-10-02 10:00:00+00:00  UPLD       False                         0
35  2019-10-02 16:00:00+00:00  UPLD       False                         0
41  2019-10-02 22:00:00+00:00  UPLD       False                         0
47  2019-10-03 04:00:00+00:00  UPLD       False                         0
53  2019-10-03 10:00:00+00:00  UPLD       False                         0
59  2019-10-03 16:00:00+00:00  UPLD       False                         0
65  2019-10-03 22:00:00+00:00  UPLD       False                         0
71  2019-10-04 04:00:00+00:00  UPLD       False                         0
77  2019-10-04 10:00:00+00:00  UPLD       False                         0
83  2019-10-04 16:00:00+00:00  UPLD       False                         0
89  2019-10-04 22:00:00+00:00  UPLD       False                         0
95  2019-10-07 04:00:00+00:00  UPLD       True                          1
 6  2019-10-01 10:00:00+00:00  ZM         True                          1
12  2019-10-01 16:00:00+00:00  ZM         True                          2
18  2019-10-01 22:00:00+00:00  ZM         True                          3
24  2019-10-02 04:00:00+00:00  ZM         True                          4
30  2019-10-02 10:00:00+00:00  ZM         True                          5
36  2019-10-02 16:00:00+00:00  ZM         True                          6
42  2019-10-02 22:00:00+00:00  ZM         True                          7
48  2019-10-03 04:00:00+00:00  ZM         True                          8
54  2019-10-03 10:00:00+00:00  ZM         True                          9
60  2019-10-03 16:00:00+00:00  ZM         True                         10
66  2019-10-03 22:00:00+00:00  ZM         True                         11
72  2019-10-04 04:00:00+00:00  ZM         True                         12
78  2019-10-04 10:00:00+00:00  ZM         True                         13
84  2019-10-04 16:00:00+00:00  ZM         False                         0
90  2019-10-04 22:00:00+00:00  ZM         False                         0
96  2019-10-07 04:00:00+00:00  ZM         False                         0
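The rolling approach re-runs the Python reduce over a growing window for every row. A vectorized alternative is sketched below on a small made-up frame: each False starts a new block, and a cumulative sum inside each block gives the running streak. The index names time and ticker, the helper running_true_streak and the toy data are assumptions for illustration, not part of the answer.

```python
import pandas as pd

# Hypothetical toy data: six timestamps for two tickers.
times = pd.to_datetime(
    ["2019-10-01 04:00", "2019-10-01 10:00", "2019-10-01 16:00",
     "2019-10-01 22:00", "2019-10-02 04:00", "2019-10-02 10:00"],
    utc=True,
)
index = pd.MultiIndex.from_product([times, ["AAPL", "ZM"]], names=["time", "ticker"])
gains = [True, False, True, True, False, True,
         True, True, True, False, True, True]  # rows alternate AAPL, ZM
toy = pd.DataFrame({"DaysWithGain": gains}, index=index)


def running_true_streak(flags):
    # Every False bumps the block id, so each block is one False followed by Trues.
    flags = flags.astype(bool)
    block_id = (~flags).cumsum()
    # Summing the flags within a block counts the Trues seen since the last False.
    return flags.astype(int).groupby(block_id).cumsum()


toy["ts_since_true"] = (
    toy.groupby(level="ticker")["DaysWithGain"].transform(running_true_streak)
)
print(toy.sort_index(level=["ticker", "time"]))
```

As in the answer, this counts timestamps rather than days.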

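Comments:

Thanks for the answer, @piterberg. However, the requirement is to create a new column so that in each row of the dataframe (grouped by ticker) we see since how many days the value has been True (as I showed in the example output in the question). I think this can be done by finding how many days before each row a False value occurred (we can add 1 to that value to get the desired result). Assuming the dataframe is grouped by ticker, can you suggest something for this? (You are right that the data is not that interesting, so I have tweaked the dataframe a bit.)

One more thing to note: for a given stock, the value on a single date is the same for all times on that day. This will always be the case.

@UmangGarg Sure, have a look at the edit. I left it at the granularity of timestamps so as not to mess things up. But if all timestamps for a given date carry the same data, you should preprocess your df with pd.Grouper to collapse them to a daily frequency, as explained here, which makes it easier to work with.

Thank you very much @piterbarg. I am actually doing feature engineering on the dataframe, and it needs to stay this way because there are other columns. But this showed me the way to do what needs to be done. Thanks again, I really appreciate your help! :)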
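One possible way to get a day-level count while keeping every row, as the OP needs, is sketched below. It is not the pd.Grouper preprocessing suggested in the comments: it swaps in DatetimeIndex.normalize() to derive the calendar day, collapses to one flag per (ticker, day), counts the streak in days, and maps the result back onto the original rows. The names time, ticker, days_since_true and running_true_streak, and the toy data, are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data: two timestamps per day, one ticker, four days.
times = pd.to_datetime(
    ["2019-10-01 10:00", "2019-10-01 16:00",
     "2019-10-02 10:00", "2019-10-02 16:00",
     "2019-10-03 10:00", "2019-10-03 16:00",
     "2019-10-04 10:00", "2019-10-04 16:00"],
    utc=True,
)
index = pd.MultiIndex.from_product([times, ["AAPL"]], names=["time", "ticker"])
toy = pd.DataFrame(
    {"DaysWithGain": [True, True, True, True, False, False, True, True]},
    index=index,
)


def running_true_streak(flags):
    # Count consecutive Trues, restarting after every False.
    flags = flags.astype(bool)
    block_id = (~flags).cumsum()
    return flags.astype(int).groupby(block_id).cumsum()


# Calendar day and ticker of every row.
day = toy.index.get_level_values("time").normalize()
ticker = toy.index.get_level_values("ticker")

# One flag per (ticker, day); the OP states all rows of a day agree.
daily = toy["DaysWithGain"].groupby([ticker, day]).first()

# Streak measured in observed days, computed separately per ticker.
daily_streak = daily.groupby(level="ticker").transform(running_true_streak)

# Broadcast the daily value back to every original row.
row_keys = pd.MultiIndex.from_arrays([ticker, day], names=["ticker", "time"])
toy["days_since_true"] = daily_streak.reindex(row_keys).to_numpy()
print(toy)
```

Note that this counts days that actually appear in the data, so a weekend gap does not break a streak.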
