pandas: add new column based on datetime index lookup of same dataframe

Posted 2023-03-11


【Title】: pandas: add new column based on datetime index lookup of same dataframe 【Posted】: 2022-01-22 13:47:14 【Question】:

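I have the following data, and I want to add a new column holding the current month-over-month percentage change. The date is the index of my dataframe.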

    date    close
1/26/1990   421.2999878
1/29/1990   418.1000061
1/30/1990   410.7000122
1/31/1990   415.7999878
2/23/1990   419.5
2/26/1990   421
2/27/1990   422.6000061
2/28/1990   425.7999878
3/26/1990   438.7999878
3/27/1990   439.5
3/28/1990   436.7000122
3/29/1990   435.3999939
3/30/1990   435.5

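The simplest approach I can think of is to add one column containing the previous month's end date and, for convenience, another containing the 'close' value at the previous month end. From those I can calculate the current month-over-month change. So in the end I would have a table like the following: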

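I was able to add the previous month end without trouble, but I am now stuck trying to look up the previous month-end close from that date. In the code below, the first line correctly adds the previous month's end date as a new column. The second line does not work: the idea was to use the prev_month_end date to look up the month-end close value and add it as a column.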

df['prev_month_end'] = df.index + pd.offsets.BMonthEnd(-1)
df['prev_month_close'] = df[df.index == df['prev_month_end']]

Any help or suggestions would be greatly appreciated.


【Answer 1】:

You can get prev_month_close by merging the dataframe with itself, as follows:

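# Move the 'date' index into a regular column so it can be used as a merge key
df.reset_index(inplace=True)
# Self-merge: each row's prev_month_end is matched against the 'date' of another row,
# and that row's 'close' is brought in as 'prev_month_close'
df = df[['date', 'close', 'prev_month_end']].merge(df[['date', 'close']].rename(columns={'close': 'prev_month_close',
                                                                                         'date': 'prev_month_end'}),
                                                    how='left', on='prev_month_end')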

OUTPUT

             date       close prev_month_end  prev_month_close
    0  1990-01-26  421.299988     1989-12-29               NaN
    1  1990-01-29  418.100006     1989-12-29               NaN
    2  1990-01-30  410.700012     1989-12-29               NaN
    3  1990-01-31  415.799988     1989-12-29               NaN
    4  1990-02-23  419.500000     1990-01-31        415.799988
    5  1990-02-26  421.000000     1990-01-31        415.799988
    6  1990-02-27  422.600006     1990-01-31        415.799988
    7  1990-02-28  425.799988     1990-01-31        415.799988
    8  1990-03-26  438.799988     1990-02-28        425.799988
    9  1990-03-27  439.500000     1990-02-28        425.799988
    10 1990-03-28  436.700012     1990-02-28        425.799988
    11 1990-03-29  435.399994     1990-02-28        425.799988
    12 1990-03-30  435.500000     1990-02-28        425.799988

Or, without using reset_index:

# Match the prev_month_end column on the left against the date index on the right
df = df[['close', 'prev_month_end']].merge(df[['close']].rename(columns={'close': 'prev_month_close'}),
                                                    how='left', left_on='prev_month_end', right_index=True)

OUTPUT

                 close prev_month_end  prev_month_close
date                                                   
1990-01-26  421.299988     1989-12-29               NaN
1990-01-29  418.100006     1989-12-29               NaN
1990-01-30  410.700012     1989-12-29               NaN
1990-01-31  415.799988     1989-12-29               NaN
1990-02-23  419.500000     1990-01-31        415.799988
1990-02-26  421.000000     1990-01-31        415.799988
1990-02-27  422.600006     1990-01-31        415.799988
1990-02-28  425.799988     1990-01-31        415.799988
1990-03-26  438.799988     1990-02-28        425.799988
1990-03-27  439.500000     1990-02-28        425.799988
1990-03-28  436.700012     1990-02-28        425.799988
1990-03-29  435.399994     1990-02-28        425.799988
1990-03-30  435.500000     1990-02-28        425.799988
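
Once prev_month_close is available, the month-over-month change the question asks for is one more line. The following is a minimal, self-contained sketch rather than part of the original answer: it rebuilds a five-row subset of the question's data, repeats the index-based merge, and derives the percentage. The names `prev_close`, `out` and `mom_pct_change` are chosen here for illustration.

```python
import pandas as pd

# A subset of the rows from the question, with 'date' as a DatetimeIndex.
df = pd.DataFrame(
    {"close": [410.7000122, 415.7999878, 422.6000061, 425.7999878, 435.5]},
    index=pd.to_datetime(
        ["1/30/1990", "1/31/1990", "2/27/1990", "2/28/1990", "3/30/1990"]
    ),
)
df.index.name = "date"

# Previous business month end for every row, as in the question.
df["prev_month_end"] = df.index + pd.offsets.BMonthEnd(-1)

# Look up the close on prev_month_end by merging against the date index.
prev_close = df[["close"]].rename(columns={"close": "prev_month_close"})
out = df.merge(prev_close, how="left", left_on="prev_month_end", right_index=True)

# Month-over-month change in percent; rows with no previous month stay NaN.
out["mom_pct_change"] = (out["close"] / out["prev_month_close"] - 1) * 100
print(out)
```

The January rows have no match because December 1989 is not in the data, so both prev_month_close and the percentage are NaN for them.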

【Comments】:

So in this case we have to treat them as two separate dataframes and then merge them, right?

【Answer 2】:

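We can convert the index to a period index, group the dataframe by period and aggregate close with last to get each month's final close, then shift the period index back by one month and map it to those month-end closes, and finally calculate the percentage change.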

i = pd.to_datetime(df.index).to_period('M')
s = i.shift(-1).map(df.groupby(i)['close'].last())
df['mom_pct_change'] = df['close'].sub(s).div(s).mul(100)

                close  mom_pct_change
date                                 
1/26/1990  421.299988             NaN
1/29/1990  418.100006             NaN
1/30/1990  410.700012             NaN
1/31/1990  415.799988             NaN
2/23/1990  419.500000        0.889854
2/26/1990  421.000000        1.250604
2/27/1990  422.600006        1.635406
2/28/1990  425.799988        2.405002
3/26/1990  438.799988        3.053077
3/27/1990  439.500000        3.217476
3/28/1990  436.700012        2.559893
3/29/1990  435.399994        2.254581
3/30/1990  435.500000        2.278068

【Comments】:

shift(-1) or shift(1)? It should be -1.
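
To see why -1 is the direction wanted here, it helps to look at the intermediate values. The sketch below is an illustration on four rows from the question, not the answerer's own code; the variable names are made up for the example. Shifting a monthly PeriodIndex by -1 moves every period one month back, so a February row looks up January's last close.

```python
import pandas as pd

idx = pd.to_datetime(["1/31/1990", "2/27/1990", "2/28/1990", "3/30/1990"])
close = pd.Series([415.7999878, 422.6000061, 425.7999878, 435.5], index=idx)

months = idx.to_period("M")      # 1990-01, 1990-02, 1990-02, 1990-03
prev_months = months.shift(-1)   # 1989-12, 1990-01, 1990-01, 1990-02

# Last close of each month, indexed by monthly period.
month_end_close = close.groupby(months).last()

# For each row, the last close of the month before it (NaN if that month is absent).
prev_close = prev_months.map(month_end_close)

print(list(prev_months.astype(str)))
print(list(prev_close))
```

With shift(1) the periods would move forward instead, and each row would be compared against the following month's close.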
