How to find since how many days the value of a column is True?
【Posted】: 2021-04-06 19:25:59
【Question】: Consider the following DataFrame, built from the dictionary below:
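I want to create a new column in this DataFrame that tells, for each row, for how many consecutive days the value has been True (per ticker, i.e. groupby(ticker)).
The values expected in the new column are written as comments in the code below (for the first few rows). If anything about the desired output is unclear, please leave a comment: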
{'DaysWithGain': {(Timestamp('2019-10-01 04:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'AAPL'): True, #1
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'FSLY'): False, #0
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'LVGO'): False, #0
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'SHOP'): True, #1
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'UPLD'): False, #0
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'ZM'): True, #1
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'AAPL'): True, #1
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'FSLY'): False,#0
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'LVGO'): False, #0
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'SHOP'): True, #1
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'UPLD'): False, #0
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'ZM'): True, #1
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'AAPL'): True, #1
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'FSLY'): False, #0
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'LVGO'): False, #0
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'SHOP'): True, # 1
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'UPLD'): False, #0
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'ZM'): True,#1
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'AAPL'): True,#2
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'FSLY'): False,#0
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'LVGO'): False,#0
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'SHOP'): True,#2
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'UPLD'): False, #0
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'ZM'): True, #2
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'AAPL'): True, #2
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'FSLY'): False, #0
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'LVGO'): False,#0
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'SHOP'): True, #2
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'UPLD'): False, #0
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'ZM'): True,#2
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'AAPL'): True,#2
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'FSLY'): False,#0
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'LVGO'): False,#0
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'SHOP'): True,#2
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'UPLD'): False,#0
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'ZM'): True, # 2
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'AAPL'): True, #2
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'FSLY'): False,#0
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'LVGO'): False, #0
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'SHOP'): True, #2
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'UPLD'): False,#0
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'ZM'): True,#2
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'AAPL'): False,
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'AAPL'): False,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'AAPL'): False,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'AAPL'): False,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'ZM'): False,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'ZM'): False,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'UPLD'): True,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'ZM'): False,
(Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'LVGO'): False}}
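【Answer 1】: This calls for a custom aggregation function. The function below takes a list-like series of True/False values and returns the length of the run of True values at its end: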
from functools import reduce

def num_last_true_vals(series):
    # Running count: +1 on each True, reset to 0 on each False.
    return reduce(lambda c, v: (c + 1) * v, series, 0)
Let's check it. The following calls
(
num_last_true_vals([True,True,True,True]),
num_last_true_vals([True,False,True,True]),
num_last_true_vals([True,False,False,True]),
num_last_true_vals([True,False,True,False]),
)
return
(4, 2, 1, 0)
as expected.
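The fold above is compact, so it may help to see it written out as an explicit loop. This is only an illustrative sketch: `num_last_true_vals_loop` is a hypothetical name, not part of the answer, and the sample lists are the same ones checked above plus an empty input.

```python
from functools import reduce


def num_last_true_vals(series):
    # Fold form from the answer: multiplying by v (True == 1, False == 0)
    # resets the running count whenever a False is encountered.
    return reduce(lambda c, v: (c + 1) * v, series, 0)


def num_last_true_vals_loop(series):
    # Same logic spelled out step by step (hypothetical helper, for illustration).
    count = 0
    for v in series:
        count = count + 1 if v else 0
    return count


samples = [
    [True, True, True, True],
    [True, False, True, True],
    [True, False, False, True],
    [True, False, True, False],
    [],
]

# Both forms agree on every sample, including the empty input.
for s in samples:
    assert num_last_true_vals(s) == num_last_true_vals_loop(s)
```

Because the count is carried from left to right, the result depends on the order of the values, which is why the data should be sorted by timestamp before the function is applied.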
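Now on to your dataset. For each ticker, the DaysWithGain values are either all True or all False, which is not very interesting, so I modified it a little to make sure the solution works as promised. With dd being the dictionary you provided, we do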
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame(dd)
df.loc[(Timestamp('2019-10-03 10:00:00+00:00'),'AAPL'), 'DaysWithGain'] = False
df = df.sort_index(level=0)
Note that we set one of the 'AAPL' entries to False. Also note that we sort by timestamp for good measure.
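To see why grouping on level 1 groups by ticker: a dictionary whose inner keys are (timestamp, ticker) tuples becomes a DataFrame with a two-level MultiIndex. A minimal sketch, using a three-entry stand-in for dd (the entries are illustrative, not the question's full data):

```python
import pandas as pd

# Three-entry stand-in for the question's dictionary (illustrative values only).
t1 = pd.Timestamp('2019-10-01 04:00', tz='UTC')
t2 = pd.Timestamp('2019-10-01 10:00', tz='UTC')
dd = {'DaysWithGain': {
    (t1, 'AAPL'): True,
    (t2, 'AAPL'): True,
    (t2, 'ZM'): True,
}}

# Tuple keys in the inner dict become a MultiIndex: level 0 = timestamp, level 1 = ticker.
df = pd.DataFrame(dd)

# Flip a single (timestamp, ticker) entry, then sort by timestamp.
df.loc[(t2, 'AAPL'), 'DaysWithGain'] = False
df = df.sort_index(level=0)

print(df)
```

Since neither index level is named, the ticker has to be addressed by position (level=1) rather than by name.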
Now for the main event, using our custom num_last_true_vals function:
df.groupby(level=1).agg({'DaysWithGain': num_last_true_vals})
which produces
DaysWithGain
AAPL 8
FSLY 0
LVGO 0
SHOP 16
UPLD 0
ZM 16
The DaysWithGain column gives, for each ticker, the number of most recent consecutive timestamps at which the value was True.
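Edit, following the OP's comments
You can apply the same function over a rolling window, so the following should work. The window of 1000 is simply larger than any group here, so each row sees all earlier rows of its ticker. Note that the last few steps are only for pretty printing; you may decide to skip them.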
df2 = (df.groupby(level=1)
         .apply(lambda d: d.assign(ts_since_true=d['DaysWithGain']
                                   .rolling(window=1000, min_periods=1)
                                   .apply(num_last_true_vals)))
         .reset_index()
         .sort_values(['level_1', 'level_0'])
      )
This produces (using your most recently edited dictionary)
level_0 level_1 DaysWithGain ts_since_true
-- ------------------------- --------- -------------- ---------------
0 2019-10-01 04:00:00+00:00 AAPL True 1
1 2019-10-01 10:00:00+00:00 AAPL True 2
7 2019-10-01 16:00:00+00:00 AAPL True 3
13 2019-10-01 22:00:00+00:00 AAPL True 4
19 2019-10-02 04:00:00+00:00 AAPL True 5
25 2019-10-02 10:00:00+00:00 AAPL True 6
31 2019-10-02 16:00:00+00:00 AAPL True 7
37 2019-10-02 22:00:00+00:00 AAPL True 8
43 2019-10-03 04:00:00+00:00 AAPL False 0
49 2019-10-03 10:00:00+00:00 AAPL False 0
55 2019-10-03 16:00:00+00:00 AAPL False 0
61 2019-10-03 22:00:00+00:00 AAPL False 0
67 2019-10-04 04:00:00+00:00 AAPL True 1
73 2019-10-04 10:00:00+00:00 AAPL True 2
79 2019-10-04 16:00:00+00:00 AAPL True 3
85 2019-10-04 22:00:00+00:00 AAPL True 4
91 2019-10-07 04:00:00+00:00 AAPL True 5
97 2019-10-07 10:00:00+00:00 AAPL True 6
2 2019-10-01 10:00:00+00:00 FSLY False 0
8 2019-10-01 16:00:00+00:00 FSLY False 0
14 2019-10-01 22:00:00+00:00 FSLY False 0
20 2019-10-02 04:00:00+00:00 FSLY False 0
26 2019-10-02 10:00:00+00:00 FSLY False 0
32 2019-10-02 16:00:00+00:00 FSLY False 0
38 2019-10-02 22:00:00+00:00 FSLY False 0
44 2019-10-03 04:00:00+00:00 FSLY False 0
50 2019-10-03 10:00:00+00:00 FSLY False 0
56 2019-10-03 16:00:00+00:00 FSLY False 0
62 2019-10-03 22:00:00+00:00 FSLY False 0
68 2019-10-04 04:00:00+00:00 FSLY False 0
74 2019-10-04 10:00:00+00:00 FSLY False 0
80 2019-10-04 16:00:00+00:00 FSLY False 0
86 2019-10-04 22:00:00+00:00 FSLY False 0
92 2019-10-07 04:00:00+00:00 FSLY False 0
98 2019-10-07 10:00:00+00:00 FSLY False 0
3 2019-10-01 10:00:00+00:00 LVGO False 0
9 2019-10-01 16:00:00+00:00 LVGO False 0
15 2019-10-01 22:00:00+00:00 LVGO False 0
21 2019-10-02 04:00:00+00:00 LVGO False 0
27 2019-10-02 10:00:00+00:00 LVGO False 0
33 2019-10-02 16:00:00+00:00 LVGO False 0
39 2019-10-02 22:00:00+00:00 LVGO False 0
45 2019-10-03 04:00:00+00:00 LVGO False 0
51 2019-10-03 10:00:00+00:00 LVGO False 0
57 2019-10-03 16:00:00+00:00 LVGO False 0
63 2019-10-03 22:00:00+00:00 LVGO False 0
69 2019-10-04 04:00:00+00:00 LVGO False 0
75 2019-10-04 10:00:00+00:00 LVGO False 0
81 2019-10-04 16:00:00+00:00 LVGO False 0
87 2019-10-04 22:00:00+00:00 LVGO False 0
93 2019-10-07 04:00:00+00:00 LVGO False 0
99 2019-10-07 10:00:00+00:00 LVGO False 0
4 2019-10-01 10:00:00+00:00 SHOP True 1
10 2019-10-01 16:00:00+00:00 SHOP True 2
16 2019-10-01 22:00:00+00:00 SHOP True 3
22 2019-10-02 04:00:00+00:00 SHOP True 4
28 2019-10-02 10:00:00+00:00 SHOP True 5
34 2019-10-02 16:00:00+00:00 SHOP True 6
40 2019-10-02 22:00:00+00:00 SHOP True 7
46 2019-10-03 04:00:00+00:00 SHOP True 8
52 2019-10-03 10:00:00+00:00 SHOP True 9
58 2019-10-03 16:00:00+00:00 SHOP True 10
64 2019-10-03 22:00:00+00:00 SHOP True 11
70 2019-10-04 04:00:00+00:00 SHOP True 12
76 2019-10-04 10:00:00+00:00 SHOP True 13
82 2019-10-04 16:00:00+00:00 SHOP True 14
88 2019-10-04 22:00:00+00:00 SHOP True 15
94 2019-10-07 04:00:00+00:00 SHOP True 16
5 2019-10-01 10:00:00+00:00 UPLD False 0
11 2019-10-01 16:00:00+00:00 UPLD False 0
17 2019-10-01 22:00:00+00:00 UPLD False 0
23 2019-10-02 04:00:00+00:00 UPLD False 0
29 2019-10-02 10:00:00+00:00 UPLD False 0
35 2019-10-02 16:00:00+00:00 UPLD False 0
41 2019-10-02 22:00:00+00:00 UPLD False 0
47 2019-10-03 04:00:00+00:00 UPLD False 0
53 2019-10-03 10:00:00+00:00 UPLD False 0
59 2019-10-03 16:00:00+00:00 UPLD False 0
65 2019-10-03 22:00:00+00:00 UPLD False 0
71 2019-10-04 04:00:00+00:00 UPLD False 0
77 2019-10-04 10:00:00+00:00 UPLD False 0
83 2019-10-04 16:00:00+00:00 UPLD False 0
89 2019-10-04 22:00:00+00:00 UPLD False 0
95 2019-10-07 04:00:00+00:00 UPLD True 1
6 2019-10-01 10:00:00+00:00 ZM True 1
12 2019-10-01 16:00:00+00:00 ZM True 2
18 2019-10-01 22:00:00+00:00 ZM True 3
24 2019-10-02 04:00:00+00:00 ZM True 4
30 2019-10-02 10:00:00+00:00 ZM True 5
36 2019-10-02 16:00:00+00:00 ZM True 6
42 2019-10-02 22:00:00+00:00 ZM True 7
48 2019-10-03 04:00:00+00:00 ZM True 8
54 2019-10-03 10:00:00+00:00 ZM True 9
60 2019-10-03 16:00:00+00:00 ZM True 10
66 2019-10-03 22:00:00+00:00 ZM True 11
72 2019-10-04 04:00:00+00:00 ZM True 12
78 2019-10-04 10:00:00+00:00 ZM True 13
84 2019-10-04 16:00:00+00:00 ZM False 0
90 2019-10-04 22:00:00+00:00 ZM False 0
96 2019-10-07 04:00:00+00:00 ZM False 0
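The running count shown in the table above can also be obtained without calling a Python function on every window. The following is a sketch of a common cumulative-sum idiom, offered as an alternative and not as the answerer's method. It assumes the same (timestamp, ticker) index layout; the small frame and its values are made up for illustration.

```python
import pandas as pd

# Small illustrative frame with a (timestamp, ticker) MultiIndex.
start = pd.Timestamp('2019-10-01 04:00', tz='UTC')
times = [start + pd.Timedelta(hours=6 * i) for i in range(6)]
flags = {
    'AAPL': [True, True, False, True, True, True],
    'ZM': [True, False, False, True, False, True],
}
df = pd.DataFrame(
    {'DaysWithGain': [flags[t][i] for i in range(6) for t in flags]},
    index=pd.MultiIndex.from_tuples([(ts, t) for ts in times for t in flags]),
).sort_index(level=0)

flag = df['DaysWithGain']
ticker = df.index.get_level_values(1).to_numpy()

# Count the False rows seen so far within each ticker. The value is constant
# across a run of True rows, so it labels each run.
run_id = (~flag).astype(int).groupby(level=1).cumsum().to_numpy()

# Cumulative number of True rows inside each (ticker, run) block.
df['ts_since_true'] = flag.astype(int).groupby([ticker, run_id]).cumsum()

print(df.sort_index(level=1))
```

Each False row starts a new block and contributes 0 to the sum, so the count restarts exactly where the rolling version resets.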
【Discussion】:
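Thanks for the answer, @piterberg. However, the requirement is to create a new column so that every row of the DataFrame (grouped by ticker) shows for how many days up to that row the value has been True (as in the sample output in my question). I think this could be done by finding, for each row, how many days ago a False value last occurred (and adding 1 to that value to get the desired result), assuming the DataFrame is grouped by ticker. Could you suggest something for this? (You're right that the data isn't very interesting, so I'll tweak the DataFrame a little.) One more thing to note: for a given ticker, the value is the same at every timestamp within a single date. This will always be the case.
@UmangGarg Sure, take a look at the edit. I left it at timestamp granularity so as not to complicate things. But if all timestamps on a given date carry the same data, you should preprocess your df with pd.Grouper to collapse them to daily frequency, as explained here; that would make it easier to work with.
Thanks a lot, @piterbarg. I'm actually doing feature engineering on the DataFrame, and it needs to stay this way because there are other columns. But this shows me how to do what needs to be done. Thanks again, I really appreciate the help! :)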
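For the day-level count requested in the question, one possible approach is to collapse the data to one row per ticker and day, count the streak there, and merge the result back onto the intraday rows so that other columns are preserved. This is only a sketch under stated assumptions: it groups on a normalized UTC date instead of pd.Grouper, it relies on the flag being constant within a day, the column names ts, ticker, day and days_since_true are made up, and the data is illustrative.

```python
import pandas as pd

# Illustrative intraday rows; the flag is constant within each day.
rows = [
    ('2019-10-01 04:00', 'AAPL', True),
    ('2019-10-01 10:00', 'AAPL', True),
    ('2019-10-02 04:00', 'AAPL', True),
    ('2019-10-02 10:00', 'AAPL', True),
    ('2019-10-03 04:00', 'AAPL', False),
    ('2019-10-04 04:00', 'AAPL', True),
]
df = pd.DataFrame(rows, columns=['ts', 'ticker', 'DaysWithGain'])
df['ts'] = pd.to_datetime(df['ts'], utc=True)

# One row per (ticker, calendar day in UTC), sorted by ticker and day.
df['day'] = df['ts'].dt.normalize()
daily = df.groupby(['ticker', 'day'])['DaysWithGain'].all().reset_index()

# Label each run by the number of False days seen so far for that ticker,
# then count True days cumulatively inside each (ticker, run) block.
run_id = (~daily['DaysWithGain']).astype(int).groupby(daily['ticker']).cumsum()
daily['days_since_true'] = (
    daily['DaysWithGain'].astype(int)
    .groupby([daily['ticker'].to_numpy(), run_id.to_numpy()])
    .cumsum()
)

# Broadcast the per-day streak back onto every intraday row.
out = df.merge(daily[['ticker', 'day', 'days_since_true']],
               on=['ticker', 'day'], how='left')
print(out[['ts', 'ticker', 'DaysWithGain', 'days_since_true']])
```

Note that this counts consecutive days present in the data, so a weekend gap such as 2019-10-04 to 2019-10-07 does not break a streak.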