Pandas Dataframe replace Nan from a row when a column value matches
[Posted]: 2019-08-18 19:02:08
[Question]: I have a dataframe, i.e.,
Input Dataframe
class section sub marks school city
0 I A Eng 80 jghss salem
1 I A Mat 90 jghss salem
2 I A Eng 50 Nan salem
3 III A Eng 80 gphss Nan
4 III A Mat 45 Nan salem
5 III A Eng 40 gphss Nan
6 III A Eng 20 gphss salem
7 III A Mat 55 gphss Nan
I need to replace the "Nan" in "school" and "city" when the values in the "class" and "section" columns match. The result is supposed to be:
Output Dataframe
class section sub marks school city
0 I A Eng 80 jghss salem
1 I A Mat 90 jghss salem
2 I A Eng 50 jghss salem
3 III A Eng 80 gphss salem
4 III A Mat 45 gphss salem
5 III A Eng 40 gphss salem
6 III A Eng 20 gphss salem
7 III A Mat 55 gphss salem
Can anyone help me with this?
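To make the question reproducible, here is a minimal sketch that builds the input shown above. It assumes the "Nan" entries are genuine missing values (np.nan) rather than the literal string "Nan"; the question does not say which one applies.

```python
import numpy as np
import pandas as pd

# Assumption: "Nan" in the printed table stands for a real missing value (np.nan).
df = pd.DataFrame({
    "class":   ["I", "I", "I", "III", "III", "III", "III", "III"],
    "section": ["A"] * 8,
    "sub":     ["Eng", "Mat", "Eng", "Eng", "Mat", "Eng", "Eng", "Mat"],
    "marks":   [80, 90, 50, 80, 45, 40, 20, 55],
    "school":  ["jghss", "jghss", np.nan, "gphss", np.nan, "gphss", "gphss", "gphss"],
    "city":    ["salem", "salem", "salem", np.nan, "salem", np.nan, "salem", np.nan],
})

print(df)
```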
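[Answer 1]: Use a lambda function
Use DataFrame.groupby and select the columns listed in cols, then forward fill and back fill the missing values within each group. This relies on every class/section combination having the same school and city values: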
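cols = ['school','city']
# group_keys=False keeps the original row index on newer pandas versions,
# so the filled values align with df on assignment
df[cols] = df.groupby(['class','section'], group_keys=False)[cols].apply(lambda x: x.ffill().bfill())
print(df)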
class section sub marks school city
0 I A Eng 80 jghss salem
1 I A Mat 90 jghss salem
2 I A Eng 50 jghss salem
3 III A Eng 80 gphss salem
4 III A Mat 45 gphss salem
5 III A Eng 40 gphss salem
6 III A Eng 20 gphss salem
7 III A Mat 55 gphss salem
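The same idea can also be written with transform, which always returns a result indexed like the original frame. This is a self-contained sketch on a smaller, made-up frame, not the answer author's exact code:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values are assumptions chosen for the example)
df = pd.DataFrame({
    "class":   ["I", "I", "III", "III", "III"],
    "section": ["A", "A", "A", "A", "A"],
    "school":  ["jghss", np.nan, np.nan, "gphss", "gphss"],
    "city":    [np.nan, "salem", "salem", np.nan, "salem"],
})

cols = ["school", "city"]

# Within each (class, section) group, fill forward and then backward so that
# a gap at the start of a group is filled as well.
df[cols] = (
    df.groupby(["class", "section"])[cols]
      .transform(lambda s: s.ffill().bfill())
)

print(df)
```

Filling in both directions matters here: ffill alone would leave the first row of a group empty whenever the gap comes before the first known value.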
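[Comments]:
I tried your suggestion, but I could not get the result.
@MahamuthaM - Not sure I understand. Is this the solution applied to the DataFrame you created? Or what is the problem?
@MahamuthaM - Can you explain more? Are the values not replaced? Before my solution, try:
df = df.replace(['Nan', 'NaN'], np.nan)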
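The suggestion in the comment above matters when the gaps are stored as the text "Nan" rather than as real missing values, because ffill and bfill only act on real missing values. A minimal sketch, assuming string placeholders and a made-up four-row frame:

```python
import numpy as np
import pandas as pd

# Assumption for illustration: the gaps are stored as the literal string "Nan"
df = pd.DataFrame({
    "class":   ["I", "I", "III", "III"],
    "section": ["A", "A", "A", "A"],
    "school":  ["jghss", "Nan", "gphss", "Nan"],
    "city":    ["salem", "salem", "Nan", "salem"],
})
cols = ["school", "city"]

# The string "Nan" is not treated as missing, so nothing would be filled yet
missing_before = int(df[cols].isna().sum().sum())

# Convert the placeholder strings into real missing values first
df = df.replace(["Nan", "NaN"], np.nan)
missing_after = int(df[cols].isna().sum().sum())

# Now the group-wise fill has something to work on
df[cols] = (
    df.groupby(["class", "section"])[cols]
      .transform(lambda s: s.ffill().bfill())
)

print(missing_before, missing_after)
print(df)
```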
[Answer 2]:
Assuming each (class, section) pair corresponds to a unique (school, city) pair, we can use groupby:
# create a dictionary mapping (class, section) to school and city
# here we assume that for each class/section pair there's a row with both school and city
# if that's not the case, we can build the two series separately
school_city_dict = (
    df[['class', 'section', 'school', 'city']]
    .dropna()
    .groupby(['class', 'section'])[['school', 'city']]
    .max()
    .to_dict()
)
# school_city_dict = {'school': {('I', 'A'): 'jghss', ('III', 'A'): 'gphss'},
#                     'city': {('I', 'A'): 'salem', ('III', 'A'): 'salem'}}
# set index, prepare for map function
df.set_index(['class','section'], inplace=True)
df.loc[:,'school'] = df.index.map(school_city_dict['school'])
df.loc[:,'city'] = df.index.map(school_city_dict['city'])
# reset index to the original (reset_index returns a new DataFrame, so assign it back)
df = df.reset_index()
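The code comment above notes that the two series can be handled separately when no single row has both school and city. One possible sketch of that variant looks up the first known value per group for each column on its own. It swaps the dictionary and map step for transform("first"), and the sample data is made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data in which no row has both school and city,
# so dropna() on both columns would discard every row.
df = pd.DataFrame({
    "class":   ["I", "I", "III", "III"],
    "section": ["A", "A", "A", "A"],
    "school":  ["jghss", np.nan, np.nan, "gphss"],
    "city":    [np.nan, "salem", "salem", np.nan],
})

keys = ["class", "section"]
for col in ["school", "city"]:
    # "first" returns the first non-missing value in each group,
    # broadcast back to every row of that group
    first_known = df.groupby(keys)[col].transform("first")
    df[col] = df[col].fillna(first_known)

print(df)
```

Like the answer itself, this assumes each (class, section) pair maps to exactly one school and one city. If a group held conflicting values, the existing entries would be left untouched and only the gaps would take the first known value.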
[Comments]:
AttributeError: 'list' object has no attribute 'dropna'