Create a brand new dataframe using values and date ranges within another dataframe
Posted: 2021-10-26 22:24:15
I have a dataframe of recurring charges, each with a start date (purchase_date), an end date (date_terminated), an ID and a price:
id price purchase_date date_terminated
0 AA11 100 2019-03-29 NaT
1 AA12 10750.0 2020-02-28 NaT
2 AA13 2500.0 2020-06-01 NaT
3 BB11 600.0 2020-06-01 2021-08-01
4 BB12 600.0 2020-06-01 2021-06-17
5 BB13 6692.0 2020-07-08 2021-04-01
6 CC11 6692.0 2020-08-12 NaT
7 CC12 6692.0 2020-08-12 NaT
8 CC13 600.0 2020-09-01 2021-04-01
9 DD11 600.0 2020-09-01 NaT
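If date_terminated is NaT, the charge is still recurring.
I also have a list of dates that starts at the start date of my earliest recurring charge and runs to whatever date the user selects: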
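For experimenting, the sample frame can be rebuilt directly (values copied from the table above). Passing None to pd.to_datetime yields NaT, which is how an open-ended, still-recurring charge is represented. The still_recurring variable is only an illustration of filtering on NaT.

```python
import pandas as pd

# Sample recurring charges, copied from the table in the question.
# None becomes NaT in date_terminated: the charge is still recurring.
df = pd.DataFrame(
    {
        "id": ["AA11", "AA12", "AA13", "BB11", "BB12",
               "BB13", "CC11", "CC12", "CC13", "DD11"],
        "price": [100, 10750.0, 2500.0, 600.0, 600.0,
                  6692.0, 6692.0, 6692.0, 600.0, 600.0],
        "purchase_date": pd.to_datetime([
            "2019-03-29", "2020-02-28", "2020-06-01", "2020-06-01", "2020-06-01",
            "2020-07-08", "2020-08-12", "2020-08-12", "2020-09-01", "2020-09-01",
        ]),
        "date_terminated": pd.to_datetime([
            None, None, None, "2021-08-01", "2021-06-17",
            "2021-04-01", None, None, "2021-04-01", None,
        ]),
    }
)

# IDs of the charges that have no termination date (NaT)
still_recurring = df.loc[df["date_terminated"].isna(), "id"].tolist()
```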
[datetime.datetime(2019, 3, 15, 0, 0),
datetime.datetime(2019, 4, 15, 0, 0),
...
datetime.datetime(2021, 6, 15, 0, 0),
datetime.datetime(2021, 7, 15, 0, 0),
datetime.datetime(2021, 8, 15, 0, 0)]
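If the date list has to be generated, one possible sketch is below. It assumes, judging only from the list shown, that the dates fall on the 15th of each month, starting in the month of the earliest purchase_date and ending at a user-chosen date. monthly_dates is a hypothetical helper name, not something from the question.

```python
import datetime

import pandas as pd


def monthly_dates(start, end, day=15):
    """Hypothetical helper: one datetime per month on `day`,
    from the month of `start` up to and including `end`."""
    first = pd.Timestamp(start.year, start.month, day)
    # Step forward one calendar month at a time
    stamps = pd.date_range(first, end, freq=pd.DateOffset(months=1))
    return [ts.to_pydatetime() for ts in stamps]


# Earliest purchase_date in the sample data is 2019-03-29
dates_list = monthly_dates(datetime.datetime(2019, 3, 29),
                           datetime.datetime(2021, 8, 15))
```

This keeps the day fixed at 15, so it avoids month-end clipping issues that a day such as 31 would run into.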
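I want to build a dataframe with dates_list as the index and the charge IDs as the columns, distributing each charge across the rows using purchase_date and date_terminated as the reference points.
The end result should look something like this: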
AA11 AA12 AA13 BB11 BB12 BB13 CC11 CC12 CC13 DD11
2019-03-15 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2019-04-15 100 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2019-05-15 100 NaN NaN NaN NaN NaN NaN NaN NaN NaN
...
2021-06-15 100 10750 2500 600 600 NaN 6692 6692 NaN 600
2021-07-15 100 10750 2500 600 NaN NaN 6692 6692 NaN 600
2021-08-15 100 10750 2500 NaN NaN NaN 6692 6692 NaN 600
Answer 1: Try:
import datetime

import pandas as pd

# df is the charges dataframe from the question
df["purchase_date"] = pd.to_datetime(df["purchase_date"])
df["date_terminated"] = pd.to_datetime(df["date_terminated"])

lst = [
    datetime.datetime(2019, 3, 15, 0, 0),
    datetime.datetime(2019, 4, 15, 0, 0),
    datetime.datetime(2021, 6, 15, 0, 0),
    datetime.datetime(2021, 7, 15, 0, 0),
    datetime.datetime(2021, 8, 15, 0, 0),
]

df_tmp = pd.DataFrame({"dates": lst})
mx = df_tmp.dates.max()

# Replace each start date with the daily range over which the charge is active;
# charges with no termination date (NaT) run up to the last requested date
df["purchase_date"] = df.apply(
    lambda x: pd.date_range(
        x["purchase_date"],
        x["date_terminated"] if pd.notna(x["date_terminated"]) else mx,
    ),
    axis=1,
)

cols = df.id.unique()  # remember the original column order
df = df.explode("purchase_date")  # one row per charge per active day
df = df[df.purchase_date.isin(df_tmp.dates)]  # keep only the requested dates

print(
    df.pivot(index="purchase_date", columns="id", values="price")
    .reindex(df_tmp["dates"])
    .reindex(cols, axis=1)
)
Prints:
id AA11 AA12 AA13 BB11 BB12 BB13 CC11 CC12 CC13 DD11
dates
2019-03-15 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2019-04-15 100.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2021-06-15 100.0 10750.0 2500.0 600.0 600.0 NaN 6692.0 6692.0 NaN 600.0
2021-07-15 100.0 10750.0 2500.0 600.0 NaN NaN 6692.0 6692.0 NaN 600.0
2021-08-15 100.0 10750.0 2500.0 NaN NaN NaN 6692.0 6692.0 NaN 600.0
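The answer builds a daily date_range for every charge and then filters it down to the requested dates. As an alternative sketch (not part of the original answer), the same table can be computed by broadcasting the requested dates against each charge's start and end dates. It assumes the same inclusive end rule as the answer's date_range and treats NaT as running through the last requested date. spread_charges is a hypothetical name, and the small charges frame is a subset of the question's data.

```python
import numpy as np
import pandas as pd


def spread_charges(df, dates):
    """Return a (dates x id) frame holding each charge's price while it is active."""
    idx = pd.DatetimeIndex(dates)
    start = df["purchase_date"].to_numpy()
    # Open-ended charges (NaT) are treated as running through the last requested date
    end = df["date_terminated"].fillna(idx.max()).to_numpy()
    d = idx.to_numpy()[:, None]              # shape (n_dates, 1)
    active = (d >= start) & (d <= end)       # shape (n_dates, n_charges)
    values = np.where(active, df["price"].to_numpy(dtype=float), np.nan)
    return pd.DataFrame(values, index=idx, columns=df["id"])


charges = pd.DataFrame({
    "id": ["AA11", "BB12", "BB13"],
    "price": [100.0, 600.0, 6692.0],
    "purchase_date": pd.to_datetime(["2019-03-29", "2020-06-01", "2020-07-08"]),
    "date_terminated": pd.to_datetime([None, "2021-06-17", "2021-04-01"]),
})
dates = pd.to_datetime(["2019-03-15", "2019-04-15", "2021-06-15", "2021-07-15"])
out = spread_charges(charges, dates)
```

Memory grows with (number of dates) x (number of charges) rather than with the length of each charge's daily range, which may matter for long-running charges.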