SQL Select and Group By in Pandas

Posted 2023-03-11


【Title】: SQL Select and Group by in Pandas 【Published】: 2021-07-09 02:47:57 【Question】:

Track   Actor                  Movie
1       Katherine Hepburn      Guess Who's Coming to Dinner
2       Katherine Hepburn      Guess Who's Coming to Dinner
3       Katherine Hepburn      On Golden Pond
4       Katherine Hepburn      The Lion in Winter
5       Bette Davis            What Ever Happened to Baby Jane?
6       Bette Davis            The Letter
7       Bette Davis            The Letter
...
100     Omar Shariff           Lawrence of Arabia

I need to write Python code that selects every actor who has appeared in more than one movie and appends their names to a list.

I am looking for the Python equivalent of the following SQL query:

SELECT Actor, count(DISTINCT Movie)
FROM table
GROUP BY Actor
HAVING count(DISTINCT Movie) > 1


【Answer 1】:

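You can use the drop_duplicates() method to keep only the distinct (Actor, Movie) pairs, which plays the role of DISTINCT Movie in the query: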

df = df.drop_duplicates(subset=['Actor', 'Movie'])
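
As a minimal sketch of this step, the snippet below rebuilds the first seven rows of the question's table by hand (the DataFrame literal and the name `deduped` are assumptions made for illustration) and shows which rows survive de-duplication.

```python
import pandas as pd

# Hand-built sample taken from the first seven rows of the question's table.
df = pd.DataFrame({
    "Track": [1, 2, 3, 4, 5, 6, 7],
    "Actor": ["Katherine Hepburn"] * 4 + ["Bette Davis"] * 3,
    "Movie": [
        "Guess Who's Coming to Dinner",
        "Guess Who's Coming to Dinner",
        "On Golden Pond",
        "The Lion in Winter",
        "What Ever Happened to Baby Jane?",
        "The Letter",
        "The Letter",
    ],
})

# Keep the first occurrence of each (Actor, Movie) pair; other columns are retained.
deduped = df.drop_duplicates(subset=["Actor", "Movie"])
print(deduped)
```

Tracks 2 and 7 repeat an existing (Actor, Movie) pair, so they are dropped. Assigning to a new name rather than overwriting `df` keeps the original rows available.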

Next, group and aggregate with the groupby() method, chaining agg() onto it:

result = df.groupby('Actor').agg(count=('Movie', 'count'))
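
The grouping step can be sketched as follows, again on a hand-built sample (the data literal is an assumption drawn from the rows shown in the question, including the final Omar Shariff row). The `count=('Movie', 'count')` form is pandas named aggregation: the keyword becomes the output column name, and the tuple gives the source column and the aggregation function.

```python
import pandas as pd

# Hand-built sample based on the rows visible in the question.
df = pd.DataFrame({
    "Actor": ["Katherine Hepburn"] * 4 + ["Bette Davis"] * 3 + ["Omar Shariff"],
    "Movie": [
        "Guess Who's Coming to Dinner",
        "Guess Who's Coming to Dinner",
        "On Golden Pond",
        "The Lion in Winter",
        "What Ever Happened to Baby Jane?",
        "The Letter",
        "The Letter",
        "Lawrence of Arabia",
    ],
})

# De-duplicate first so that counting rows per actor counts distinct movies.
deduped = df.drop_duplicates(subset=["Actor", "Movie"])

# Named aggregation: output column "count" = number of non-null Movie values per actor.
result = deduped.groupby("Actor").agg(count=("Movie", "count"))
print(result)
```

Note that `count` tallies non-null values, so it only matches `count(DISTINCT Movie)` because the duplicates were removed beforehand.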

Finally, apply a boolean mask to enforce the HAVING condition (count > 1):

result = result[result['count'] > 1]
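
The question also asks for the names as a list, which the steps above stop short of. One possible way to get there, shown here as a sketch rather than the answer's own method, is to replace the de-duplicate-then-count steps with `nunique()`, which counts distinct movies per actor directly, and then read the names off the index. The sample data and the variable names `movies_per_actor` and `actors` are assumptions for illustration.

```python
import pandas as pd

# Hand-built sample based on the rows visible in the question.
df = pd.DataFrame({
    "Actor": ["Katherine Hepburn"] * 4 + ["Bette Davis"] * 3 + ["Omar Shariff"],
    "Movie": [
        "Guess Who's Coming to Dinner",
        "Guess Who's Coming to Dinner",
        "On Golden Pond",
        "The Lion in Winter",
        "What Ever Happened to Baby Jane?",
        "The Letter",
        "The Letter",
        "Lawrence of Arabia",
    ],
})

# Equivalent of: SELECT Actor, count(DISTINCT Movie) ... GROUP BY Actor
movies_per_actor = df.groupby("Actor")["Movie"].nunique()

# Equivalent of: HAVING count(DISTINCT Movie) > 1, then collect the actor names.
actors = movies_per_actor[movies_per_actor > 1].index.tolist()
print(actors)  # ['Bette Davis', 'Katherine Hepburn']
```

With the three-step approach above, the same list is available as `result.index.tolist()`. In both cases groupby() sorts the group keys by default, so the names come back in alphabetical order.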

