python - How to reorder rows in a pandas DataFrame by factor level in Python?

Posted 2023-03-27

【Title】: How to reorder rows in pandas dataframe by factor level in python? 【Posted】: 2022-01-07 10:41:23 【Question】:

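I created a small dataset that compares the price of coffee drinks by cup size.

When I pivot the dataset, the output automatically reorders the index (the "Size" column) alphabetically.

Is there a way to assign a numeric level to each size (e.g. Small = 0, Medium = 1, Large = 2) and reorder the rows that way?

I know this can be done in R with the forcats library (e.g. using fct_relevel), but I don't know how to do it in Python. I would prefer a solution that sticks to numpy and pandas.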

import numpy as np
import pandas as pd

data = {'Item': np.repeat(['Latte', 'Americano', 'Cappuccino'], 3),
        'Size': ['Small', 'Medium', 'Large']*3,
        'Price': [2.25, 2.60, 2.85, 1.95, 2.25, 2.45, 2.65, 2.95, 3.25]
       }

df = pd.DataFrame(data, columns = ['Item', 'Size', 'Price'])
df = pd.pivot_table(df, index = ['Size'], columns = 'Item')
df

#         Price
# Item    Americano Cappuccino  Latte
#   Size            
#  Large       2.45       3.25   2.85
# Medium       2.25       2.95   2.60
#  Small       1.95       2.65   2.25

【Comments】:

df = df.reindex(["Small", "Medium", "Large"])?

【Answer 1】:

You can use the Categorical type with ordered=True:

df.index = pd.Categorical(df.index,
                          categories=['Small', 'Medium', 'Large'],
                          ordered=True)
df = df.sort_index()

Output:

           Price                 
Item   Americano Cappuccino Latte
Small       1.95       2.65  2.25
Medium      2.25       2.95  2.60
Large       2.45       3.25  2.85
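For readers who want to run this answer end to end, here is a minimal self-contained sketch. It rebuilds the question's data, pivots it, and then applies the ordered Categorical index. The names long_df, wide and size_order are only illustrative, and values='Price' is passed to keep the columns flat; neither comes from the original answer.

```python
import numpy as np
import pandas as pd

# Same toy data as in the question.
data = {
    'Item': np.repeat(['Latte', 'Americano', 'Cappuccino'], 3),
    'Size': ['Small', 'Medium', 'Large'] * 3,
    'Price': [2.25, 2.60, 2.85, 1.95, 2.25, 2.45, 2.65, 2.95, 3.25],
}
long_df = pd.DataFrame(data)

# Pivoting sorts the string index alphabetically: Large, Medium, Small.
wide = long_df.pivot_table(index='Size', columns='Item', values='Price')

# Replace the index with an ordered categorical, then sort by category order.
size_order = ['Small', 'Medium', 'Large']  # illustrative name
wide.index = pd.Categorical(wide.index, categories=size_order, ordered=True)
wide = wide.sort_index()

print(list(wide.index))  # ['Small', 'Medium', 'Large']
```

Note that assigning a new index this way drops the index name ("Size"), which is why the output above has no Size row.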

You can access the codes with:

>>> df.index.codes
array([0, 1, 2], dtype=int8)

If this were a Series:

>>> series.cat.codes
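As a small illustration of the Series case, the sketch below converts a Series of sizes to an ordered categorical and reads its integer codes. The variable names sizes and size_dtype are hypothetical and chosen only for this example.

```python
import pandas as pd

# A hypothetical Series of cup sizes in arbitrary order.
sizes = pd.Series(['Large', 'Small', 'Medium', 'Small'])

# Declare the level order explicitly; codes follow this order.
size_dtype = pd.CategoricalDtype(['Small', 'Medium', 'Large'], ordered=True)
sizes = sizes.astype(size_dtype)

print(sizes.cat.codes.tolist())      # [2, 0, 1, 0]
print(sizes.sort_values().tolist())  # ['Small', 'Small', 'Medium', 'Large']
```

The codes play the role of the numeric levels asked about in the question (Small = 0, Medium = 1, Large = 2), while the labels stay readable.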

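【Comments】:

Just fixed a typo, feel free to re-edit/revert. +1 :)
Thanks @not_speshal ;)
@mozway Exactly what I was looking for, thank you!
As sammy pointed out, it is advantageous to convert to categorical before pivoting ;)

【Answer 2】:

One option is to create the categorical before pivoting; for this case I use encode_categorical from pyjanitor, mainly for convenience: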

# pip install pyjanitor
import pandas as pd
import janitor
(df
 .encode_categorical(Size = (None, 'appearance'))
 .pivot_table(index='Size', columns='Item')
)

           Price                 
Item   Americano Cappuccino Latte
Size                             
Small       1.95       2.65  2.25
Medium      2.25       2.95  2.60
Large       2.45       3.25  2.85

This way you don't have to worry about sorting, because pivoting does it implicitly. You can skip pyjanitor and use only pandas:

(df
 .astype({'Size': pd.CategoricalDtype(categories = ['Small', 'Medium', 'Large'],
                                      ordered = True)})
 .pivot_table(index='Size', columns='Item')
)

           Price                 
Item   Americano Cappuccino Latte
Size                             
Small       1.95       2.65  2.25
Medium      2.25       2.95  2.60
Large       2.45       3.25  2.85
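The pandas-only variant can be checked with a self-contained sketch like the one below. It assumes the long-format data from the question; the names long_df and wide are illustrative. Passing observed=True is an addition that is not in the original answer: it only makes explicit that sizes absent from the data should not appear as rows.

```python
import numpy as np
import pandas as pd

long_df = pd.DataFrame({
    'Item': np.repeat(['Latte', 'Americano', 'Cappuccino'], 3),
    'Size': ['Small', 'Medium', 'Large'] * 3,
    'Price': [2.25, 2.60, 2.85, 1.95, 2.25, 2.45, 2.65, 2.95, 3.25],
})

# Convert Size to an ordered categorical *before* pivoting.
size_dtype = pd.CategoricalDtype(['Small', 'Medium', 'Large'], ordered=True)
wide = (long_df
        .astype({'Size': size_dtype})
        .pivot_table(index='Size', columns='Item', values='Price',
                     observed=True))  # assumption: show only sizes present

print(list(wide.index))  # ['Small', 'Medium', 'Large']
```

Because the grouping key is already categorical, the rows come out in category order and no separate sort_index call is needed.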

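【Comments】:

【Answer 3】:

First way:

The pivot_table function sorts the rows by the index. So you can pass a lambda function as the index to pivot_table, mapping each row to its numeric level. This way you don't need any further sorting step (which costs extra time) or any third-party library.

# df here is the original long-format frame, before any pivoting
df = pd.pivot_table(df, index = (lambda row: 0 if df.loc[row,'Size']=="Small" else 1 if df.loc[row,'Size']=="Medium" else 2),
                    columns = 'Item',
                    values = ['Price'])  # aggregate only the numeric column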

         Price                 
Item Americano Cappuccino Latte
0         1.95       2.65  2.25
1         2.25       2.95  2.60
2         2.45       3.25  2.85

Second way:

You can also keep your own code, then rename and sort the newly created table:

df = pd.DataFrame(data, columns = ['Item', 'Size', 'Price'])
df = pd.pivot_table(df, index = ['Size'], columns = 'Item')

# rename:
df = df.rename(index= lambda x: 0 if x=="Small" else 1 if x=="Medium" else 2)

#sort:
df = df.sort_index(ascending = True)


         Price                 
Item Americano Cappuccino Latte
0         1.95       2.65  2.25
1         2.25       2.95  2.60
2         2.45       3.25  2.85
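Both ways above replace the size labels with integers. If the labels should stay, a related option (not part of this answer) is the key argument of sort_index, available in pandas 1.1 and later. The sketch below assumes an already pivoted table; size_rank is a hypothetical helper mapping introduced only for this example.

```python
import pandas as pd

# A small stand-in for the pivoted table, with the alphabetical row order.
wide = pd.DataFrame(
    {'Americano': [2.45, 2.25, 1.95], 'Latte': [2.85, 2.60, 2.25]},
    index=pd.Index(['Large', 'Medium', 'Small'], name='Size'),
)

# Hypothetical mapping from label to numeric level.
size_rank = {'Small': 0, 'Medium': 1, 'Large': 2}

# key receives the whole Index and must return values of the same length;
# rows are ordered by those values while the original labels are kept.
wide_sorted = wide.sort_index(key=lambda idx: idx.map(size_rank))

print(list(wide_sorted.index))  # ['Small', 'Medium', 'Large']
```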

【Comments】:
