How to find the top N minimum values from a DataFrame, Python-3

Posted 2023-03-11


Asked: 2020-04-16 01:39:33

Question:

I have the following DataFrame with an 'Age' field and need to find the top 3 minimum ages in it:

import pandas as pd

DF = pd.DataFrame.from_dict({'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], 'Age': [18, 45, 35, 70, 23, 24, 50, 65, 18, 23]})

DF['Age'].min()

I want the first two ages in a list, i.e. 18, 23. How can I do this?

Note: the DataFrame DF contains duplicate ages (18 and 23 each appear twice), and I need unique values.


Answer 1:

You can use DataFrame.nsmallest(..) [pandas-doc]:

df.nsmallest(2, 'Age')

For the given sample data, this gives us the following (18 occurs twice, so both rows with age 18 are returned):

>>> df.nsmallest(2, 'Age')
  Name  Age
0    A   18
8    I   18

Or, if you only need the values of the Age column:

>>> df['Age'].nsmallest(2)
0    18
8    18
Name: Age, dtype: int64

Or you can wrap it in a list:

>>> df['Age'].nsmallest(2).to_list()
[18, 18]
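Because 18 and 23 each occur twice, the keep parameter of DataFrame.nsmallest decides which rows win a tie. A minimal sketch on the question's sample data; n=3 is used only so that the tie at 23 becomes visible:

```python
import pandas as pd

# Sample data from the question: 18 and 23 each appear twice.
df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'Age': [18, 45, 35, 70, 23, 24, 50, 65, 18, 23]})

# keep='first' (the default): among tied values, prefer earlier rows.
first = df.nsmallest(3, 'Age', keep='first')   # rows A, I, E

# keep='last': among tied values, prefer later rows.
last = df.nsmallest(3, 'Age', keep='last')     # rows A, I, J

# keep='all': keep every row tied with the n-th smallest value,
# even if that returns more than n rows.
all_ties = df.nsmallest(3, 'Age', keep='all')  # rows A, E, I, J

print(first, last, all_ties, sep='\n\n')
```

None of the keep options removes duplicate values by itself; keep='all' can even return more than n rows.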

To get the n smallest unique values, first build a Series of the unique values, or drop the duplicates:

>>> pd.Series(df['Age'].unique()).nsmallest(2)
0    18
4    23
dtype: int64
>>> df['Age'].drop_duplicates().nsmallest(2)
0    18
4    23
Name: Age, dtype: int64
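If you need the full rows for the n smallest distinct ages (for example, everyone aged 18 or 23), one possible sketch, assuming that is the goal, is to compute those ages as above and filter with Series.isin. A dense Series.rank is shown as an alternative way to build the same mask; neither is part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'Age': [18, 45, 35, 70, 23, 24, 50, 65, 18, 23]})

n = 2

# The n smallest distinct ages (here 18 and 23).
smallest_ages = df['Age'].drop_duplicates().nsmallest(n)

# Keep every row whose Age is one of those values, duplicates included.
rows = df[df['Age'].isin(smallest_ages)]

# Alternative: a dense rank gives equal ages the same rank (18 -> 1, 23 -> 2, ...),
# so "rank <= n" selects the same rows.
rows_via_rank = df[df['Age'].rank(method='dense') <= n]

print(rows)
```

Both masks preserve the original row order, so rows A, E, I and J come back in index order.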

Comments:

@SPy: you can also use df['Age'].nsmallest(2) :)

Answer 2:

The right tool is nsmallest; here is another way: Series.sort_values + Series.head

df['Age'].sort_values().head(2).tolist()
#[18, 18]

Update

If there are duplicates, we can call Series.drop_duplicates first:

df['Age'].drop_duplicates().nsmallest(2).tolist()
#df['Age'].drop_duplicates().sort_values().head(2).tolist()
#[18, 23]
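The drop_duplicates-then-nsmallest idea can be wrapped in a small reusable function. n_smallest_unique below is a hypothetical helper name, not a pandas API:

```python
import pandas as pd

def n_smallest_unique(s: pd.Series, n: int) -> list:
    """Return the n smallest distinct values of s as a plain Python list.

    Hypothetical helper (not a pandas API): duplicates are removed first,
    so nsmallest(n) cannot return the same value twice. If s has fewer
    than n distinct values, all of them are returned.
    """
    return s.drop_duplicates().nsmallest(n).tolist()

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'Age': [18, 45, 35, 70, 23, 24, 50, 65, 18, 23]})

print(n_smallest_unique(df['Age'], 2))  # [18, 23]
print(n_smallest_unique(df['Age'], 3))  # [18, 23, 24]
```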

Or with NumPy, applying np.sort to the result of Series.unique:

import numpy as np

np.sort(df['Age'].unique())[:2].tolist()
#[18, 23]
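Since np.unique already returns its result sorted, a separate np.sort call is not strictly needed. A minimal sketch, assuming NumPy and pandas are installed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'Age': [18, 45, 35, 70, 23, 24, 50, 65, 18, 23]})

# np.unique returns the sorted distinct values: [18, 23, 24, 35, 45, 50, 65, 70].
distinct_sorted = np.unique(df['Age'].to_numpy())

# Slice the first two and convert NumPy ints to plain Python ints.
smallest = distinct_sorted[:2].tolist()
print(smallest)  # [18, 23]
```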

