Plot group averages for each rating on 4 separate plots

Posted: 2021-03-03 13:23:33

Question:

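I have 4 groups (research, sales, manu, hr), each with 2 categories (0 and 1). I am trying to plot the mean score of each group for every feature in the list ratings. The code that gives me the means looks like this (with depts = ['research', 'sales', 'manu', 'hr']):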

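ratings = ['JobSatisfaction', 'PerformanceRating', 'EnvironmentSatisfaction', 'RelationshipSatisfaction']

for i in depts:
    for x in ratings:
        # Select the rating column before averaging so that
        # non-numeric columns are never passed to mean().
        print(group_data.groupby(i)[x].mean())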

This produces the following output:

research
0.0    2.700000
1.0    2.773973
Name: JobSatisfaction, dtype: float64
research
0.0    3.100000
1.0    3.167808
Name: PerformanceRating, dtype: float64
research
0.0    2.500000
1.0    2.726027
Name: EnvironmentSatisfaction, dtype: float64
research
0.0    2.687500
1.0    2.705479
Name: RelationshipSatisfaction, dtype: float64
sales
0.0    2.754601
1.0    2.734940
Name: JobSatisfaction, dtype: float64
sales
0.0    3.125767
1.0    3.144578
Name: PerformanceRating, dtype: float64
sales
0.0    2.671779
1.0    2.734940
Name: EnvironmentSatisfaction, dtype: float64
sales
0.0    2.702454
1.0    2.602410
Name: RelationshipSatisfaction, dtype: float64
manu
0.0    2.682759
1.0    2.723077
Name: JobSatisfaction, dtype: float64
manu
0.0    3.186207
1.0    3.158974
Name: PerformanceRating, dtype: float64
manu
0.0    2.917241
1.0    2.735897
Name: EnvironmentSatisfaction, dtype: float64
manu
0.0    2.724138
1.0    2.689744
Name: RelationshipSatisfaction, dtype: float64
hr
0.0    2.705882
1.0    2.557692
Name: JobSatisfaction, dtype: float64
hr
0.0    3.196078
1.0    3.134615
Name: PerformanceRating, dtype: float64
hr
0.0    2.764706
1.0    2.596154
Name: EnvironmentSatisfaction, dtype: float64
hr
0.0    2.813725
1.0    2.961538
Name: RelationshipSatisfaction, dtype: float64

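My question is: how can I plot these group means (research, sales, manu, hr) for each rating ['JobSatisfaction', 'PerformanceRating', 'EnvironmentSatisfaction', 'RelationshipSatisfaction'] on 4 different bar charts, so that I can visualize and compare the differences between the groups?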

My data comes from the IBM HR dataset: https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset

Answer 1:

You can use seaborn's sns.barplot. Since your y variables are on comparable scales, you can put them on a shared y axis and separate the groups by colour:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

ratings = ['JobSatisfaction', 'PerformanceRating', 'EnvironmentSatisfaction', 'RelationshipSatisfaction']

# Reshape to long format: one row per (Department, rating name, score).
long_df = df[['Department'] + ratings].melt(id_vars='Department')

# By default, barplot draws the mean of 'value' for each bar.
sns.barplot(data=long_df, x='variable', y='value', hue='Department')
plt.xticks(rotation=45)
plt.show()
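
The chart above puts all four ratings on a single axis. If one panel per rating is preferred, as the question asks, a rough sketch with pandas and matplotlib only might look like the following. It assumes a single grouping column (Department by default) and exactly four ratings for the 2x2 grid; plot_rating_means is a hypothetical helper name, not a library function.

```python
import pandas as pd
import matplotlib.pyplot as plt

ratings = ['JobSatisfaction', 'PerformanceRating',
           'EnvironmentSatisfaction', 'RelationshipSatisfaction']


def plot_rating_means(df, group_col='Department', ratings=ratings):
    """Draw one bar chart per rating, each showing the mean per group."""
    ratings = list(ratings)
    # Rows are groups, columns are ratings.
    means = df.groupby(group_col)[ratings].mean()

    # 2x2 grid, one panel per rating; a shared y axis keeps panels comparable.
    fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharey=True)
    for ax, rating in zip(axes.flat, ratings):
        means[rating].plot.bar(ax=ax, rot=45)
        ax.set_title(rating)
        ax.set_xlabel('')
        ax.set_ylabel('mean score')
    fig.tight_layout()
    return means, fig
```

With the CSV loaded into df as above, calling plot_rating_means(df) followed by plt.show() should display the four panels. Passing a different group_col would group by that column instead.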
   
