Python plot groupby.mean

Posted 2023-03-11

【Title】Python plot groupby.mean 【Posted】2017-07-25 12:23:09 【Question】:

I have the following dataframe, called df1_df2:

          IdDeviceType  NameDevice              IdBox  IdDeviceValue  DateDeviceValue      ValueDeviceValue  weekday  hour  value
IdDevice
119       48            Chaudière Maud Ferrand  4      536448         2015-11-27 17:54:00  On                4        17    1
119       48            Chaudière Maud Ferrand  4      536449         2015-11-27 17:54:00  Off               4        17    0
119       48            Chaudière Maud Ferrand  4      536450         2015-11-27 17:54:00  On                4        17    1
119       48            Chaudière Maud Ferrand  4      536451         2015-11-27 17:54:00  Off               4        17    0
119       48            Chaudière Maud Ferrand  4      536453         2015-11-27 18:09:00  On                4        18    1

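I would like to plot the grouped values (from the 'value' column) for each device type (the 'IdDeviceType' column), with the 'hour' column as the axis.

The idea is to see when heaters or other devices are switched on or off, given the hour of the day.

This is what I did: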

df1_df2['value']= df1_df2['ValueDeviceValue']
df1_df2.loc[df1_df2['ValueDeviceValue'].str.lower()=='on','value'] = 1.
df1_df2.loc[df1_df2['ValueDeviceValue'].str.lower()=='off','value']= 0.

def my_plot(df,devids,idboxes):
    df = df[df['IdDeviceType'].isin(devids)]
    print (set(df.value.values))

    vals = [df[df['IdBox']== idb].groupby('hour')['value'].mean() for idb in idboxes]
    for val in vals : 
        plt.plot(val)

When I test it:

my_plot(df1_df2, [48], [4, 5])

I get the following error message. It looks like I cannot do groupby.mean because the value column is not recognized as numeric.

DataError                                 Traceback (most recent call last)
<ipython-input-447-75ef0a27eb5e> in <module>()
----> 1 my_plot(df1_df2,[48],[4,5])

<ipython-input-445-b5ff09b606b7> in my_plot(df, devids, idboxes)
  4     print (set(df.value.values))
  5 
  6     vals = [df[df['IdBox']== idb].groupby('hour')['value'].mean() for idb in idboxes]
  7     for val in vals :
  8         #print (val)

<ipython-input-445-b5ff09b606b7> in <listcomp>(.0)
  4     print (set(df.value.values))
  5 
  6     vals = [df[df['IdBox']== idb].groupby('hour')['value'].mean() for idb in idboxes]
  7     for val in vals :
  8         #print (val)

/Users/chloegiraut/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in mean(self)
962         """
963         try:
964             return self._cython_agg_general('mean')
965         except GroupByError:
966             raise

/Users/chloegiraut/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
763 
764         if len(output) == 0:
765             raise DataError('No numeric types to aggregate')
766 
767         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

【Comments】:

【Answer 1】:

To make the value column numeric, you can do:

# get the On/Off string as 1/0
df1_df2['value'] = (
    df1_df2['ValueDeviceValue'].str.lower() == 'on').astype(np.uint8)
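
As a minimal sketch of why this helps: in the question, `value` starts as a copy of the 'On'/'Off' strings, so it presumably keeps the object dtype even after 1. and 0. are written into it, and the pandas version in the traceback refuses to average an object column. The small frame `toy` and the column names `value_obj` and `value_num` below are made up for illustration; the object-dtype column is built directly as a stand-in for the question's column.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    'hour': [17, 17, 17, 17, 18],
    'ValueDeviceValue': ['On', 'Off', 'On', 'Off', 'On'],
})

# Stand-in for the question's column: numbers stored with object dtype.
toy['value_obj'] = pd.Series([1., 0., 1., 0., 1.], dtype=object)
print(toy['value_obj'].dtype)

# Option 1: convert the existing column to a numeric dtype.
toy['value_num'] = pd.to_numeric(toy['value_obj'])

# Option 2 (the approach above): build the column from a boolean comparison.
toy['value'] = (toy['ValueDeviceValue'].str.lower() == 'on').astype(np.uint8)

print(toy[['value_num', 'value']].dtypes)

# With a numeric dtype the aggregation has something to work with.
means = toy.groupby('hour')['value'].mean()
print(means)
```

Checking `df1_df2['value'].dtype` before calling `groupby(...).mean()` is a quick way to confirm which situation you are in.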

Test code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = [x.strip().split() for x in """
    IdDevice IdDeviceType NameDevice IdBox IdDeviceValue DateDeviceValue ValueDeviceValue weekday hour
    119 48  Chaud 4   536448  2015-11-27T17:54:00     On          4 17
    119 48  Chaud 4   536449  2015-11-27T17:54:00     Off         4 17
    119 48  Chaud 4   536450  2015-11-27T17:54:00     On          4 17
    119 48  Chaud 4   536451  2015-11-27T17:54:00     Off         4 17
    119 48  Chaud 4   536453  2015-11-27T18:09:00     On          4 18
""".split('\n')[1:-1]]
df1_df2 = pd.DataFrame(data=data[1:], columns=data[0])
for column in 'IdDevice IdDeviceType IdBox IdDeviceValue'.split():
    df1_df2[column] = pd.to_numeric(df1_df2[column])

# get the On/Off string as 1/0
df1_df2['value'] = (
    df1_df2['ValueDeviceValue'].str.lower() == 'on').astype(np.uint8)

def my_plot(df, devids, idboxes):
    dev_idx = df['IdDeviceType'].isin(devids)
    df = df[dev_idx]
    print (set(df.value.values))
    vals = [df[df['IdBox'] == idb].groupby('hour')['value'].mean()
            for idb in idboxes]
    for val in vals:
        print()
        print(val)

my_plot(df1_df2, [48], [4, 5])

Result:

set([0, 1])

hour
17    0.5
18    1.0
Name: value, dtype: float64

Series([], Name: value, dtype: uint8)
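
The test code prints the hourly means instead of plotting them. To get back to the plot the question asked for, something along these lines might work. This is only a sketch: the function name `plot_mean_by_hour`, the frame `toy`, the axis labels, and the choice to skip boxes with no rows are all assumptions, not part of the original code.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Toy data, invented for illustration only; 'value' is already numeric.
toy = pd.DataFrame({
    'IdDeviceType': [48, 48, 48, 48, 48, 48],
    'IdBox': [4, 4, 4, 4, 4, 5],
    'hour': [17, 17, 17, 17, 18, 18],
    'value': [1, 0, 1, 0, 1, 0],
})

def plot_mean_by_hour(df, devids, idboxes):
    """Plot the mean of 'value' per hour, one line per IdBox."""
    df = df[df['IdDeviceType'].isin(devids)]
    fig, ax = plt.subplots()
    for idb in idboxes:
        means = df[df['IdBox'] == idb].groupby('hour')['value'].mean()
        if means.empty:
            continue  # no rows for this box, so there is nothing to draw
        ax.plot(means.index, means.values, marker='o',
                label='IdBox {}'.format(idb))
    ax.set_xlabel('hour')
    ax.set_ylabel('mean value (fraction On)')
    ax.legend()
    return ax

ax = plot_mean_by_hour(toy, [48], [4, 5])
```

In a notebook, drop the `matplotlib.use('Agg')` line and call `plt.show()` (or rely on inline display) to see the figure.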

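【Comments】:

Great, Stephen! It works, but I get an error message from your code when I paste it into my notebook. For the two lines data = [x.strip().split() for x in """ """.split('\n')[1:-1]] it says: IndexError: list index out of range. Do I have to put something between the """ and """?

That part of the code just generates a dataframe. So yes, you need something between the """, or use your own data...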
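
A small sketch of what presumably went wrong in the comment above: with nothing between the triple quotes, the parsed list is empty, so taking its first element (the header row) fails. The names `empty_block`, `rows` and `error_message` are made up for illustration.

```python
# Triple-quoted string with no table in it, as in the comment.
empty_block = """ """

# Splitting on newlines gives a single element; the [1:-1] slice drops it.
rows = [x.strip().split() for x in empty_block.split('\n')[1:-1]]
print(rows)

# The answer's code uses the first parsed row as the column names;
# on an empty list that lookup raises IndexError.
error_message = None
try:
    header = rows[0]
except IndexError as exc:
    error_message = str(exc)
    print('IndexError:', error_message)
```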
