Add the last week of every year to the next week
【Posted】: 2021-08-04 19:52:09
【Question】: How can I remove the last week of each year and add its values to the following week, using generic code that works for all numeric columns?
df
date value
2019-12-20 0
2019-12-27 3
2020-01-03 7
...
2020-12-18 0
2020-12-25 4
2021-01-01 7
Expected output
date value
2019-12-20 0
2020-01-03 10
...
2020-12-18 0
2021-01-01 11
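【Comments】:
For transparency, you could add a computed "fiscal year" column.
【Answer 1】:
Based on your question, I assume your DataFrame contains exactly one row per week (it looks like you only have Fridays here). I also assume that no week is missing (i.e. no Friday is skipped) and that the rows are sorted chronologically (if they are not, call df = df.sort_values("date") first).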
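The two assumptions above (chronological order, exactly one row per week) can be checked before running anything else. This is only a minimal sketch; the helper name `check_weekly_assumptions` is hypothetical and not part of the original answer.

```python
import pandas as pd


def check_weekly_assumptions(dates: pd.Series) -> dict:
    """Report whether the dates are sorted and spaced exactly one week apart.

    Hypothetical helper, not part of the original answer.
    """
    dates = pd.to_datetime(dates)
    # Difference between consecutive rows; the first row has no predecessor.
    gaps = dates.diff().dropna()
    return {
        "sorted": bool(dates.is_monotonic_increasing),
        "one_row_per_week": bool((gaps == pd.Timedelta(weeks=1)).all()),
    }


print(check_weekly_assumptions(pd.Series(["2019-12-20", "2019-12-27", "2020-01-03"])))
```

If either check fails, sort the frame or fill in the missing weeks before relying on row shifting.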
The following snippet should solve your problem (explanations are in the code comments):
import pandas as pd
df = pd.DataFrame(
    {
        "date": [
            "2019-12-20", "2019-12-27",
            "2020-01-03", "2020-12-18",
            "2020-12-25", "2021-01-01",
        ],
        "value": [0, 3, 7, 0, 4, 7],
    }
)
numeric_columns = ["value"]
# Compute whether a row is the last week of a year
year = df["date"].str[:4]
is_last_week = year != year.shift(-1).fillna(year.iloc[-1])
print(is_last_week)
0 False
1 True
2 False
3 False
4 True
5 False
Name: date, dtype: bool
# Take the value from those rows
values_on_last_week = df[numeric_columns].where(is_last_week)
print(values_on_last_week)
value
0 NaN
1 3.0
2 NaN
3 NaN
4 4.0
5 NaN
# Shift values one row down
shifted_values_on_last_week = values_on_last_week.shift()
print(shifted_values_on_last_week)
value
0 NaN
1 NaN
2 3.0
3 NaN
4 NaN
5 4.0
# Put zeroes instead of NaNs
shifted_values_on_last_week = shifted_values_on_last_week.fillna(0)
print(shifted_values_on_last_week)
value
0 0.0
1 0.0
2 3.0
3 0.0
4 0.0
5 4.0
# Add this to df
df[numeric_columns] = df[numeric_columns] + shifted_values_on_last_week
print(df)
date value
0 2019-12-20 0.0
1 2019-12-27 3.0
2 2020-01-03 10.0
3 2020-12-18 0.0
4 2020-12-25 4.0
5 2021-01-01 11.0
# Drop the rows we don't want anymore
df = df[~is_last_week]
print(df)
date value
0 2019-12-20 0.0
2 2020-01-03 10.0
3 2020-12-18 0.0
5 2021-01-01 11.0
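The steps above can also be collected into one reusable function. The sketch below is an assumption-laden variant rather than the answer's own code: the name `fold_last_week_into_next` is hypothetical, numeric columns are detected with `select_dtypes` instead of being listed by hand, and the year is taken from parsed dates rather than from the first four characters of a string.

```python
import pandas as pd


def fold_last_week_into_next(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Add each year's last row to the row after it, then drop that last row.

    Hypothetical helper. Assumes one row per week with no gaps.
    """
    out = df.sort_values(date_col).reset_index(drop=True)
    numeric_columns = out.select_dtypes("number").columns
    year = pd.to_datetime(out[date_col]).dt.year
    next_year = year.shift(-1)
    # A row is the last week of its year when the next row is in another year.
    # The final row has no successor, so it is never flagged.
    is_last_week = year.ne(next_year) & next_year.notna()
    # Keep only the flagged values, move them one row down, and zero-fill the rest.
    carried = out[numeric_columns].where(is_last_week).shift().fillna(0)
    out[numeric_columns] = out[numeric_columns] + carried
    return out[~is_last_week].reset_index(drop=True)


df = pd.DataFrame(
    {
        "date": ["2019-12-20", "2019-12-27", "2020-01-03",
                 "2020-12-18", "2020-12-25", "2021-01-01"],
        "value": [0, 3, 7, 0, 4, 7],
    }
)
result = fold_last_week_into_next(df)
print(result)
```

As in the step-by-step version, the numeric columns come back as floats because of the intermediate NaN values.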
【Answer 2】:
Another approach is to look at the minimum and maximum dates of each year in the dataset.
import io

import pandas as pd

data = '''date value
2019-12-20 0
2019-12-27 3
2020-01-03 7
2020-12-18 0
2020-12-25 4
2021-01-01 7'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')
df['date'] = pd.to_datetime(df['date'])
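# Get the last December row and the first January row of each year.
# Caveat: max()/min() act on each column separately, so 'value' here is the
# group's largest/smallest value, not necessarily the one on the max/min date.
dfmax = df[df['date'].dt.month==12].groupby(df['date'].dt.year).max().reset_index(drop=True)
dfmin = df[df['date'].dt.month==1].groupby(df['date'].dt.year).min().reset_index(drop=True)
# Add the values. Rows are paired by position, so every December in the data
# needs a following January.
dfh = dfmin  # same object as dfmin, so dfmin is updated as well
dfh['value'] = dfmax['value'] + dfmin['value']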
# remove unwanted rows from initial df
dfidx = dfh['date'].tolist()
df = df[~df['date'].isin(dfidx)].copy()
dfidx = dfmax['date'].tolist()
df = df[~df['date'].isin(dfidx)].copy()
# piece it back together with the recalculated values
dfnew = pd.concat([dfmin, df]).sort_values('date')
dfnew
Output
date value
0 2019-12-20 0
0 2020-01-03 10
3 2020-12-18 0
1 2021-01-01 11
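The "fiscal year" column suggested in the comments on the question can be combined with a groupby, which avoids shifting values by hand. This is a rough sketch under the same one-row-per-week assumption; the column names `fiscal_year` and `report_date` are made up for illustration and appear in neither answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2019-12-20", "2019-12-27", "2020-01-03",
                                "2020-12-18", "2020-12-25", "2021-01-01"]),
        "value": [0, 3, 7, 0, 4, 7],
    }
)

next_date = df["date"].shift(-1)
# True where the following row falls in a later calendar year.
# The final row has no successor (NaT), which compares as False.
is_last_week = next_date.dt.year > df["date"].dt.year

# Hypothetical "fiscal_year" column: the last week counts toward the next year.
df["fiscal_year"] = df["date"].dt.year + is_last_week.astype(int)
# Hypothetical "report_date" column: the date each row is folded into.
df["report_date"] = df["date"].where(~is_last_week, next_date)

# Summing per report_date merges each last week into the week after it.
result = (
    df.drop(columns=["date", "fiscal_year"])
    .groupby("report_date", as_index=False)
    .sum()
    .rename(columns={"report_date": "date"})
)
print(result)
```

Because the sum runs over every remaining column, this applies to all numeric columns at once, and the helper columns make it visible which rows were merged.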