Aggregate and group data, then sort by a column
【Title】: aggregate and groupby data then sort according to a column 【Posted】: 2021-04-06 05:19:08 【Question】: Given a dataset like the following:
import pandas as pd

# Build the sample frame from a dict of equal-length columns
data = pd.DataFrame({
    'AuthorName': ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett", "Walter C.",
                   "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    'seniorityLevel': [10, 45, 13, 89, 3, 8, 19, 22, 10, 59],
    'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering", "Microbiology",
                     "Cardiovascular System & Hematology", "Biomedical Engineering"],
    'NumberOfPapers': [109, 284, 34, 109, 78, 90, 109, 54, 32, 109],
})
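I need to compute the minimum, mean, median, and maximum of the seniority level and of the number of papers for each subject field, then display the top 10 and bottom 10 rows of the table when it is sorted by mean seniority level. I tried this code:
d = data.groupby(["SubjectField"]).agg({'seniorityLevel': ['min', 'mean', 'median', 'max'], 'NumberOfPapers': ['min', 'mean', 'median', 'max']})
But I cannot sort the resulting table by seniority level.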
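【Answer 1】: The aggregated frame has MultiIndex column headers, so pass the column to sort by as a tuple of (top-level name, statistic).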
d_sort = d.sort_values(('seniorityLevel', 'mean'))
pd.concat([d_sort.head(2), d_sort.tail(2)])
Output (only the top 2 and bottom 2 rows are shown here):
                                           seniorityLevel       NumberOfPapers
                                     min  mean median max  min mean median max
SubjectField
Developmental Biology                  3  3.00      3   3   78   78     78  78
Mechanical Engineering & Transports    8  8.00      8   8   90   90     90  90
Biomedical Engineering                10 44.25     39  89  109  109    109 109
Inorganic & Nuclear Chemistry         45 45.00     45  45  284  284    284 284
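The same approach can be sketched end to end for an arbitrary number of rows. This is a minimal example under stated assumptions: the small dataset and the `top_and_bottom` helper are hypothetical and not part of the original answer. Only the tuple passed to `sort_values` comes from the answer above.

```python
import pandas as pd

# Illustrative stand-in data (not the asker's dataset)
data = pd.DataFrame({
    "SubjectField": ["A", "A", "B", "C", "C", "D"],
    "seniorityLevel": [10, 30, 5, 40, 50, 25],
    "NumberOfPapers": [100, 120, 30, 80, 60, 55],
})

stats = ["min", "mean", "median", "max"]
d = data.groupby("SubjectField").agg(
    {"seniorityLevel": stats, "NumberOfPapers": stats}
)

# The columns form a two-level MultiIndex, so one column is named by a tuple
d_sort = d.sort_values(("seniorityLevel", "mean"))


def top_and_bottom(frame, n):
    """Hypothetical helper: first n and last n rows of an already sorted frame.

    Returns the frame unchanged when it has at most 2 * n rows, so that
    no row appears twice in the result.
    """
    if len(frame) <= 2 * n:
        return frame
    return pd.concat([frame.head(n), frame.tail(n)])


result = top_and_bottom(d_sort, 1)
print(result)
```

For the question as asked, `top_and_bottom(d_sort, 10)` would give the top 10 and bottom 10 subject fields. Passing `ascending=False` to `sort_values` reverses the order if the highest mean should come first.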