How to create a Pandas groupby plot with subplots

Posted 2023-02-14

【Title】: How to create Pandas groupby plot with subplots 【Posted】: 2015-07-10 15:12:46 【Question】:

I have a data frame like this:

               value  identifier
2007-01-01  0.781611          55
2007-01-01  0.766152          56
2007-01-01  0.766152          57
2007-02-01  0.705615          55
2007-02-01  0.032134          56
2007-02-01  0.032134          57
2008-01-01  0.026512          55
2008-01-01  0.993124          56
2008-01-01  0.993124          57
2008-02-01  0.226420          55
2008-02-01  0.033860          56
2008-02-01  0.033860          57

So I group by each identifier:

df.groupby('identifier')

Now I want to generate subplots in a grid, one plot per group. I tried both

df.groupby('identifier').plot(subplots=True)

or

df.groupby('identifier').plot(subplots=False)

and

plt.subplots(3,3)
df.groupby('identifier').plot(subplots=True)

to no avail. How can I create the plots?

【Comments】:

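Check out seaborn, it does this really well.

Thanks, but I am trying to avoid seaborn and use only matplotlib. Dependencies, Windows environment, and so on.

Old comment, but seaborn is an API for matplotlib. Seaborn reduces this to 1 line, without any dataframe transformation: sns.relplot(kind='line', data=df.reset_index(), row='identifier', x='index', y='value').

【Answer 1】: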

Here is an automated layout with many groups (of random fake data). Playing around with grouped.get_group(key) will show you how to make more elegant plots.

import pandas as pd
from numpy.random import randint
import matplotlib.pyplot as plt


df = pd.DataFrame(randint(0, 10, (200, 6)), columns=list('abcdef'))
grouped = df.groupby('a')
rowlength = (grouped.ngroups + 1) // 2                # integer ceiling, so an odd number of groups also fits
fig, axs = plt.subplots(figsize=(9, 4),
                        nrows=2, ncols=rowlength,
                        gridspec_kw=dict(hspace=0.4)) # Much control of gridspec

targets = zip(grouped.groups.keys(), axs.flatten())
for key, ax in targets:
    group = grouped.get_group(key)
    ax.plot(group)                                    # one line per column of the group
    ax.set_title('a=%d' % key)
ax.legend(list(group.columns))                        # legend on the last populated subplot only
plt.show()
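
The same idea can be wrapped in a small function that sizes the grid from the number of groups and hides any leftover axes. This is only a minimal sketch: plot_groups_grid, its ncols parameter, and the demo frame are hypothetical names introduced for illustration, and the Agg backend line is an assumption for headless runs.

```python
import math

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_groups_grid(df, by, ncols=3):
    """Hypothetical helper: draw one subplot per group of df.groupby(by)."""
    grouped = df.groupby(by)
    nrows = math.ceil(grouped.ngroups / ncols)
    # squeeze=False keeps axs two-dimensional even for a single row or column
    fig, axs = plt.subplots(nrows, ncols, figsize=(3 * ncols, 2.5 * nrows),
                            squeeze=False)
    flat = axs.flatten()
    for ax, (key, group) in zip(flat, grouped):
        # Plot every column except the grouping column on this group's axes
        group.drop(columns=by).plot(ax=ax, legend=False)
        ax.set_title(f"{by}={key}")
    # Hide the axes left over when the grid has more cells than groups
    for ax in flat[grouped.ngroups:]:
        ax.set_visible(False)
    return fig, axs


# Made-up demo data: 7 groups, so a 3x3 grid leaves two cells unused
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "a": np.arange(200) % 7,
    "b": rng.integers(0, 10, 200),
    "c": rng.integers(0, 10, 200),
})
fig, axs = plot_groups_grid(demo, "a", ncols=3)
```

In an interactive session, plt.show() after the call displays the figure. Passing ax=ax to the pandas plot call is what directs each group onto its own subplot instead of a new figure.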

【Comments】:

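You mentioned fixing it up if the number is odd, so: rowlength = grouped.ngroups//2 + (0 if grouped.ngroups % 2 == 0 else 1)

It helps to understand why this works: you generate a bunch of axes, then pass each axes object in turn to each group being plotted. You are filling each subplot with a subgroup plot. Neat!

【Answer 2】: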

You can use pd.pivot_table to get the identifiers into columns and then call plot()

pd.pivot_table(df.reset_index(),
               index='index', columns='identifier', values='value'
              ).plot(subplots=True)

Also, the output of

pd.pivot_table(df.reset_index(),
               index='index', columns='identifier', values='value'
               )

looks like this:

identifier        55        56        57
index
2007-01-01  0.781611  0.766152  0.766152
2007-02-01  0.705615  0.032134  0.032134
2008-01-01  0.026512  0.993124  0.993124
2008-02-01  0.226420  0.033860  0.033860
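
As a rough end-to-end sketch of this approach, the snippet below rebuilds the question's frame and plots one panel per identifier. The layout, sharey and figsize arguments are optional extras not used in the answer, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import pandas as pd

# Rebuild the data frame from the question: dates in the index, three identifiers per date
dates = pd.to_datetime(["2007-01-01", "2007-02-01", "2008-01-01", "2008-02-01"])
df = pd.DataFrame(
    {
        "value": [0.781611, 0.766152, 0.766152,
                  0.705615, 0.032134, 0.032134,
                  0.026512, 0.993124, 0.993124,
                  0.226420, 0.033860, 0.033860],
        "identifier": [55, 56, 57] * 4,
    },
    index=dates.repeat(3),
)

# reset_index() turns the unnamed index into a column called 'index'
wide = pd.pivot_table(df.reset_index(), index="index",
                      columns="identifier", values="value")

# One subplot per identifier column, arranged in a 1x3 grid with a shared y-axis
axes = wide.plot(subplots=True, layout=(1, 3), sharey=True, figsize=(9, 3))
```

Note that pivot_table averages duplicate (index, identifier) pairs by default; the question's data has none, so the values pass through unchanged.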

【Comments】:

【Answer 3】:

If you have a Series with a MultiIndex, here is another way to get the desired plot.

df.unstack('identifier').plot.line(subplots=True)
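
For a frame shaped like the one in the question, one way to reach such a MultiIndex Series is to append identifier to the index first. This is a sketch with made-up values, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import pandas as pd

# Made-up values in the same shape as the question's data
dates = pd.to_datetime(["2007-01-01", "2007-02-01", "2008-01-01", "2008-02-01"])
df = pd.DataFrame(
    {"value": [float(i) for i in range(12)], "identifier": [55, 56, 57] * 4},
    index=dates.repeat(3),
)

# Move 'identifier' into the index to get a Series indexed by (date, identifier)
series = df.set_index("identifier", append=True)["value"]

# unstack pivots the 'identifier' level into columns, one column per identifier
wide = series.unstack("identifier")
axes = wide.plot.line(subplots=True)  # one subplot per column
```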

【Comments】:

【Answer 4】:

For those who need to plot charts that explore different aggregation levels by grouping on multiple columns, here is a solution.

from itertools import count

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from numpy.random import randint

levels_bool = np.tile(np.arange(0, 2), 100)
levels_groups = np.repeat(np.arange(0, 4), 50)
x_axis = np.tile(np.arange(0, 10), 20)
values = randint(0, 10, 200)

stacked = np.stack((levels_bool, levels_groups, x_axis, values), axis=0)
df = pd.DataFrame(stacked.T, columns=['bool', 'groups', 'x_axis', 'values'])

columns = len(df['bool'].unique())
rows = len(df['groups'].unique())
fig, axs = plt.subplots(rows, columns, figsize=(20, 20))

y_index_counter = count(0)
groupped_df = df.groupby(['groups', 'bool', 'x_axis']).agg(
    {'values': ['min', 'mean', 'median', 'max']}
)
# After agg, 'groups', 'bool' and 'x_axis' are index levels, so group by level
for group_name, grp in groupped_df.groupby(level='groups'):
    y_index = next(y_index_counter)
    x_index_counter = count(0)
    for boolean, grp2 in grp.groupby(level='bool'):
        x_index = next(x_index_counter)
        ax = axs[y_index, x_index]
        # One line per aggregate (min, mean, median, max)
        ax.plot(grp2.index.get_level_values('x_axis'), grp2['values'].to_numpy())
        ax.set_title("Group: {} Bool: {}".format(group_name, boolean))
        ax.legend(list(grp2['values'].columns))

plt.subplots_adjust(hspace=0.5)
plt.show()

【Comments】:
