How to merge duplicate rows into one when the DataFrame rows hold different values
I have a DataFrame like this:
ID  NAME     TEL_1   TEL_2   TEL_3
1   John     123456  754987  465317
1   John     465987          465987
1   John             546783
2   Robert   264687
2   Robert           462531
3   William  432645  765346  875137
I need to merge the rows that share the same ID while keeping all of the phone values, like this:
ID  NAME     TEL_1   TEL_2   TEL_3   TEL_4   TEL_5   TEL_6
1   John     123456  754987  465317  465987  465987  546783
2   Robert   264687  462531
3   William  432645  765346  875137
Answer
You can set the ID and NAME columns as the index, group by them with groupby, and then concat the rows of each group end to end to get the output you want:
import pandas as pd

persons = df.set_index(['ID', 'NAME']).groupby(level=['ID', 'NAME'])
merged_rows = {}
for details, phones in persons:
    # details is the (ID, NAME) tuple; lay the group's rows end to end
    person_phones = pd.concat([row for _, row in phones.iterrows()])
    person_phones.index = ['TEL_{}'.format(i) for i in range(len(person_phones))]
    merged_rows[details] = person_phones
# One column per person, then transpose so that each person becomes a row
new_df = pd.DataFrame(merged_rows).transpose()
new_df = new_df.rename_axis(['ID', 'NAME']).reset_index()
This gives:
   ID     NAME   TEL_0   TEL_1   TEL_2   TEL_3  TEL_4   TEL_5  TEL_6   TEL_7  TEL_8
0   1     John  123456  754987  465317  465987    NaN  465987    NaN  546783    NaN
1   2   Robert  264687     NaN     NaN     NaN 462531     NaN    NaN     NaN    NaN
2   3  William  432645  765346  875137     NaN    NaN     NaN    NaN     NaN    NaN
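The output above still carries the NaN gaps from the original rows and numbers the columns from TEL_0, whereas the question asks for gap-free columns starting at TEL_1. A minimal sketch of one way to close the gaps, assuming the phone columns are numeric and all start with the prefix TEL_ (the names tel_cols, merged_rows and compact are made up for this illustration and are not part of the answer):

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing phone numbers are NaN
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "NAME": ["John", "John", "John", "Robert", "Robert", "William"],
    "TEL_1": [123456, 465987, np.nan, 264687, np.nan, 432645],
    "TEL_2": [754987, np.nan, 546783, np.nan, 462531, 765346],
    "TEL_3": [465317, 465987, np.nan, np.nan, np.nan, 875137],
})

# Assumption: every phone column starts with "TEL_"
tel_cols = [c for c in df.columns if c.startswith("TEL_")]

merged_rows = {}
for (person_id, name), phones in df.groupby(["ID", "NAME"]):
    # Flatten the group row by row, then drop the missing values
    flat = phones[tel_cols].to_numpy().ravel()
    flat = flat[~np.isnan(flat)]
    # Renumber the surviving values from TEL_1 upwards
    merged_rows[(person_id, name)] = pd.Series(
        flat, index=["TEL_{}".format(i) for i in range(1, len(flat) + 1)]
    )

# One column per person, transposed so that each person becomes a row
compact = (
    pd.DataFrame(merged_rows)
    .transpose()
    .rename_axis(["ID", "NAME"])
    .reset_index()
)
print(compact)
```

Because the values are read row by row, John's numbers come out in the order shown in the question. Duplicated numbers such as 465987 are kept, as in the requested output, and the values are stored as floats because the shorter rows are padded with NaN.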
Another answer
You can try:
import pandas as pd
import numpy as np

data = {'ID': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
        'NAME': {0: 'John', 1: 'John', 2: 'John', 3: 'Robert',
                 4: 'Robert', 5: 'William'},
        'TEL_1': {0: 123456, 1: 465987, 2: None, 3: 264687, 4: None,
                  5: 432645},
        'TEL_2': {0: 754987, 1: None, 2: 546783, 3: None, 4: 462531,
                  5: 765346},
        'TEL_3': {0: 465317, 1: 465987, 2: None, 3: None, 4: None,
                  5: 875137}}
df = pd.DataFrame(data)
grouped = df.groupby(['ID', 'NAME'])

def merger(group):
    # Collect the phone columns and walk through them column by column
    nr_cols = [col for col in group.columns if 'TEL_' in col]
    values = [group[col].values for col in nr_cols]
    new_row = pd.Series(dtype='float64')
    i = 1
    for column_values in values:
        for nr in column_values:
            # Skip missing numbers so the merged row has no gaps
            if not np.isnan(nr):
                new_row['TEL_{}'.format(i)] = nr
                i += 1
    return new_row

merged = grouped.apply(merger).unstack().reset_index()
merged
The resulting DataFrame looks like this:
ID  NAME     TEL_1   TEL_2   TEL_3   TEL_4   TEL_5   TEL_6
1   John     123456  465987  754987  546783  465317  465987
2   Robert   264687  462531  NaN     NaN     NaN     NaN
3   William  432645  765346  875137  NaN     NaN     NaN
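The same column-by-column result can probably also be reached without a Python-level loop. The sketch below is an alternative technique, not the method used in either answer: it reshapes the frame to long form with melt, numbers each person's phone values with cumcount, and reshapes back with pivot. It assumes a pandas version in which pivot accepts a list for index (1.1 or later); the names long_df, slot and wide are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing phone numbers are NaN
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "NAME": ["John", "John", "John", "Robert", "Robert", "William"],
    "TEL_1": [123456, 465987, np.nan, 264687, np.nan, 432645],
    "TEL_2": [754987, np.nan, 546783, np.nan, 462531, 765346],
    "TEL_3": [465317, 465987, np.nan, np.nan, np.nan, 875137],
})

# Long form: one row per (person, phone value), missing values dropped.
# melt stacks the columns in order, so values are visited column by column.
long_df = df.melt(id_vars=["ID", "NAME"], value_name="TEL").dropna(subset=["TEL"])

# Running position of each phone value within its person, starting at 1
long_df["slot"] = long_df.groupby(["ID", "NAME"]).cumcount() + 1

# Back to wide form: one row per person, one column per position.
# Integer slots keep the columns in numeric order before the prefix is added.
wide = (
    long_df.pivot(index=["ID", "NAME"], columns="slot", values="TEL")
    .add_prefix("TEL_")
    .rename_axis(columns=None)
    .reset_index()
)
print(wide)
```

Keeping slot as an integer until after the pivot is a deliberate choice: string labels would sort TEL_10 ahead of TEL_2 once a person has ten or more numbers.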