Fill down a column date value until another date value is reached, then continue filling with the newly reached value
Posted: 2018-06-10 10:28:35
Question: I have the following dataframe:
         Date                Team 1                Team 2  Score1  Score2
0    1-Oct-17                     1                   NaN       2     NaN
1       21:20          Chicago Cubs       Cincinnati Reds       1     3.0
2       21:15    Kansas City Royals  Arizona Diamondbacks       2    14.0
3       21:15    St.Louis Cardinals     Milwaukee Brewers       1     6.0
4   30-Sep-17                     1                   NaN       2     NaN
5       22:15    St.Louis Cardinals     Milwaukee Brewers       7     6.0
6       22:05          Chicago Cubs       Cincinnati Reds       9     0.0
7       22:05  San Francisco Giants      San Diego Padres       2     3.0
8       19:05        Boston Red Sox        Houston Astros       6     3.0
9   29-Sep-17                     1                   NaN       2     NaN
10      20:20          Chicago Cubs       Cincinnati Reds       5     4.0
11      19:05      New York Yankees     Toronto Blue Jays       4     0.0
12       2:15    Kansas City Royals        Detroit Tigers       1     4.0
13       2:10     Chicago White Sox    Los Angeles Angels       5     4.0
I need to fill the date values down, overwriting the time values, to get this result:
         Date                Team 1                Team 2  Score1  Score2
0    1-Oct-17                     1                   NaN       2     NaN
1    1-Oct-17          Chicago Cubs       Cincinnati Reds       1     3.0
2    1-Oct-17    Kansas City Royals  Arizona Diamondbacks       2    14.0
3    1-Oct-17    St.Louis Cardinals     Milwaukee Brewers       1     6.0
4   30-Sep-17                     1                   NaN       2     NaN
5   30-Sep-17    St.Louis Cardinals     Milwaukee Brewers       7     6.0
6   30-Sep-17          Chicago Cubs       Cincinnati Reds       9     0.0
7   30-Sep-17  San Francisco Giants      San Diego Padres       2     3.0
8   30-Sep-17        Boston Red Sox        Houston Astros       6     3.0
9   29-Sep-17                     1                   NaN       2     NaN
10  29-Sep-17          Chicago Cubs       Cincinnati Reds       5     4.0
11  29-Sep-17      New York Yankees     Toronto Blue Jays       4     0.0
12  29-Sep-17    Kansas City Royals        Detroit Tigers       1     4.0
13  29-Sep-17     Chicago White Sox    Los Angeles Angels       5     4.0
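Answer 1: You can check the length of the values in the Date column. Values longer than 7 characters are dates; where replaces everything else (the times) with NaN, and ffill then forward-fills the missing values (the same as fillna with method='ffill', which newer pandas versions deprecate):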
df['Date'] = df['Date'].where(df['Date'].str.len() > 7).ffill()
#similar idea
#df['Date'] = df['Date'].mask(df['Date'].str.len().isin([4,5])).ffill()
print (df)
         Date                Team 1                Team 2  Score1  Score2
0    1-Oct-17                     1                   NaN       2     NaN
1    1-Oct-17          Chicago Cubs       Cincinnati Reds       1     3.0
2    1-Oct-17    Kansas City Royals  Arizona Diamondbacks       2    14.0
3    1-Oct-17    St.Louis Cardinals     Milwaukee Brewers       1     6.0
4   30-Sep-17                     1                   NaN       2     NaN
5   30-Sep-17    St.Louis Cardinals     Milwaukee Brewers       7     6.0
6   30-Sep-17          Chicago Cubs       Cincinnati Reds       9     0.0
7   30-Sep-17  San Francisco Giants      San Diego Padres       2     3.0
8   30-Sep-17        Boston Red Sox        Houston Astros       6     3.0
9   29-Sep-17                     1                   NaN       2     NaN
10  29-Sep-17          Chicago Cubs       Cincinnati Reds       5     4.0
11  29-Sep-17      New York Yankees     Toronto Blue Jays       4     0.0
12  29-Sep-17    Kansas City Royals        Detroit Tigers       1     4.0
13  29-Sep-17     Chicago White Sox    Los Angeles Angels       5     4.0
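A self-contained sketch of the length-based approach, using a few rows copied from the question's table (keeping only Date and Team 1, stored as plain strings, is an assumption made for the demo). It also checks that the where and mask variants produce the same column:

```python
import pandas as pd

# A subset of the question's rows; only the Date column matters here.
df = pd.DataFrame({
    'Date': ['1-Oct-17', '21:20', '21:15', '30-Sep-17', '22:15',
             '29-Sep-17', '20:20', '2:15'],
    'Team 1': ['1', 'Chicago Cubs', 'Kansas City Royals', '1',
               'St.Louis Cardinals', '1', 'Chicago Cubs', 'Kansas City Royals'],
})

# Dates such as '1-Oct-17' have 8 or 9 characters; times such as '21:20' or
# '2:15' have 4 or 5. Keep only the long values, then forward-fill the NaN gaps.
filled_where = df['Date'].where(df['Date'].str.len() > 7).ffill()

# The inverted form: blank out the 4-5 character time strings, then forward-fill.
filled_mask = df['Date'].mask(df['Date'].str.len().isin([4, 5])).ffill()

print(filled_where.tolist())
# ['1-Oct-17', '1-Oct-17', '1-Oct-17', '30-Sep-17', '30-Sep-17',
#  '29-Sep-17', '29-Sep-17', '29-Sep-17']
```

The length test works because every date in this data is at least 8 characters and every time at most 5; if the real column can hold other shapes, a stricter check (for example matching the time pattern) would be safer.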
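Another idea is to convert the values to datetimes and keep only those whose time is 0:00 (the rows that held just a date), then forward-fill: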
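import pandas as pd
from datetime import time
# format='mixed' (pandas >= 2.0) parses each value on its own; without it,
# pandas 2.x infers one format from '1-Oct-17' and can fail on '21:20'
df['Date'] = pd.to_datetime(df['Date'], format='mixed')
df['Date'] = df['Date'].where(df['Date'].dt.time == time(0, 0)).ffill()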
print (df)
          Date                Team 1                Team 2  Score1  Score2
0   2017-10-01                     1                   NaN       2     NaN
1   2017-10-01          Chicago Cubs       Cincinnati Reds       1     3.0
2   2017-10-01    Kansas City Royals  Arizona Diamondbacks       2    14.0
3   2017-10-01    St.Louis Cardinals     Milwaukee Brewers       1     6.0
4   2017-09-30                     1                   NaN       2     NaN
5   2017-09-30    St.Louis Cardinals     Milwaukee Brewers       7     6.0
6   2017-09-30          Chicago Cubs       Cincinnati Reds       9     0.0
7   2017-09-30  San Francisco Giants      San Diego Padres       2     3.0
8   2017-09-30        Boston Red Sox        Houston Astros       6     3.0
9   2017-09-29                     1                   NaN       2     NaN
10  2017-09-29          Chicago Cubs       Cincinnati Reds       5     4.0
11  2017-09-29      New York Yankees     Toronto Blue Jays       4     0.0
12  2017-09-29    Kansas City Royals        Detroit Tigers       1     4.0
13  2017-09-29     Chicago White Sox    Los Angeles Angels       5     4.0
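A variant of the datetime idea that is not part of the original answer: instead of letting pandas guess each string's format, parse with an explicit format and let errors='coerce' turn every non-matching value (the times) into NaT. This sketch assumes all date rows follow the '1-Oct-17' pattern ('%d-%b-%y' with English month abbreviations); the rows are copied from the question.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['1-Oct-17', '21:20', '21:15', '30-Sep-17',
                            '22:15', '29-Sep-17', '20:20', '2:15']})

# Only strings shaped like 'day-Mon-yy' parse; time strings such as '21:20'
# do not match the format, so errors='coerce' makes them NaT.
dates = pd.to_datetime(df['Date'], format='%d-%b-%y', errors='coerce')

# Carry each parsed date down over the NaT rows that held a time.
df['Date'] = dates.ffill()

print(df['Date'].dt.strftime('%Y-%m-%d').tolist())
```

Because the check is on the string's shape rather than on the parsed time, a game listed at 0:00 would not be mistaken for a date row. As with ffill generally, if the first row were a time it would stay NaT, since there is no earlier date to copy.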