Python plot groupby.mean
【Posted】: 2017-07-25 12:23:09
【Question】: I have the dataframe below, called df1_df2:
          IdDeviceType  NameDevice              IdBox  IdDeviceValue  DateDeviceValue      ValueDeviceValue  weekday  hour  value
IdDevice
119       48            Chaudière Maud Ferrand  4      536448         2015-11-27 17:54:00  On                4        17    1
119       48            Chaudière Maud Ferrand  4      536449         2015-11-27 17:54:00  Off               4        17    0
119       48            Chaudière Maud Ferrand  4      536450         2015-11-27 17:54:00  On                4        17    1
119       48            Chaudière Maud Ferrand  4      536451         2015-11-27 17:54:00  Off               4        17    0
119       48            Chaudière Maud Ferrand  4      536453         2015-11-27 18:09:00  On                4        18    1
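I want to plot the mean of the values (the "value" column), grouped by hour, for each device type (the "IdDeviceType" column), with the "hour" column as the x-axis.
The idea is to see when heaters or other devices are switched on or off depending on the hour of the day.
This is what I did: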
df1_df2['value'] = df1_df2['ValueDeviceValue']
df1_df2.loc[df1_df2['ValueDeviceValue'].str.lower() == 'on', 'value'] = 1.
df1_df2.loc[df1_df2['ValueDeviceValue'].str.lower() == 'off', 'value'] = 0.

def my_plot(df, devids, idboxes):
    df = df[df['IdDeviceType'].isin(devids)]
    print(set(df.value.values))
    vals = [df[df['IdBox'] == idb].groupby('hour')['value'].mean() for idb in idboxes]
    for val in vals:
        plt.plot(val)
When I test it:
my_plot(df1_df2, [48], [4, 5])
I get the following error message. It looks like I can't call groupby(...).mean() because the value column is not recognized as numeric.
DataError Traceback (most recent call last)
<ipython-input-447-75ef0a27eb5e> in <module>()
----> 1 my_plot(df1_df2,[48],[4,5])
<ipython-input-445-b5ff09b606b7> in my_plot(df, devids, idboxes)
4 print (set(df.value.values))
5
6 vals = [df[df['IdBox']== idb].groupby('hour')['value'].mean() for idb in idboxes]
7 for val in vals :
8 #print (val)
<ipython-input-445-b5ff09b606b7> in <listcomp>(.0)
4 print (set(df.value.values))
5
6 vals = [df[df['IdBox']== idb].groupby('hour')['value'].mean() for idb in idboxes]
7 for val in vals :
8 #print (val)
/Users/chloegiraut/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in mean(self)
962 """
963 try:
964 return self._cython_agg_general('mean')
965 except GroupByError:
966 raise
/Users/chloegiraut/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
763
764 if len(output) == 0:
765 raise DataError('No numeric types to aggregate')
766
767 return self._wrap_aggregated_output(output, names)
DataError: No numeric types to aggregate
【Answer 1】: To make the value column numeric, you can do:
# get the On/Off string as 1/0
df1_df2['value'] = (
    df1_df2['ValueDeviceValue'].str.lower() == 'on').astype(np.uint8)
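A minimal sketch of what the dtype problem looks like and how to check for it, using a small made-up frame rather than the asker's data. The column name `value_obj` and the explicit `map` conversion are illustrative assumptions, not part of the original answer; `value_obj` mimics a column that started out as strings and was then overwritten with numbers, which leaves it with dtype object.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up sample data, for illustration only
df = pd.DataFrame({
    "hour": [17, 17, 17, 17, 18],
    "ValueDeviceValue": ["On", "Off", "On", "Off", "On"],
})

# Numbers stored in an object-dtype column (hypothetical stand-in for the
# question's 'value' column): pandas does not treat this as numeric
df["value_obj"] = pd.Series([1.0, 0.0, 1.0, 0.0, 1.0], dtype=object)
print(df["value_obj"].dtype, is_numeric_dtype(df["value_obj"]))

# One way to repair such a column after the fact
repaired = pd.to_numeric(df["value_obj"])

# An alternative to the boolean comparison: map the labels explicitly,
# so any unexpected label would become NaN instead of silently counting as 0
df["value"] = df["ValueDeviceValue"].str.lower().map({"on": 1, "off": 0})

hourly_mean = df.groupby("hour")["value"].mean()
print(hourly_mean)
```

Checking `df['value'].dtype` before aggregating is a cheap way to catch this kind of problem early.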
Test code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = [x.strip().split() for x in """
IdDevice IdDeviceType NameDevice IdBox IdDeviceValue DateDeviceValue ValueDeviceValue weekday hour
119 48 Chaud 4 536448 2015-11-27T17:54:00 On 4 17
119 48 Chaud 4 536449 2015-11-27T17:54:00 Off 4 17
119 48 Chaud 4 536450 2015-11-27T17:54:00 On 4 17
119 48 Chaud 4 536451 2015-11-27T17:54:00 Off 4 17
119 48 Chaud 4 536453 2015-11-27T18:09:00 On 4 18
""".split('\n')[1:-1]]
df1_df2 = pd.DataFrame(data=data[1:], columns=data[0])
for column in 'IdDevice IdDeviceType IdBox IdDeviceValue'.split():
    df1_df2[column] = pd.to_numeric(df1_df2[column])

# get the On/Off string as 1/0
df1_df2['value'] = (
    df1_df2['ValueDeviceValue'].str.lower() == 'on').astype(np.uint8)

def my_plot(df, devids, idboxes):
    dev_idx = df['IdDeviceType'].isin(devids)
    df = df[dev_idx]
    print(set(df.value.values))
    vals = [df[df['IdBox'] == idb].groupby('hour')['value'].mean()
            for idb in idboxes]
    for val in vals:
        print()
        print(val)

my_plot(df1_df2, [48], [4, 5])
Result:
set([0, 1])
hour
17 0.5
18 1.0
Name: value, dtype: float64
Series([], Name: value, dtype: uint8)
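The test code above prints the hourly means instead of plotting them. A possible sketch of the plotting step follows, assuming the `value` column is already numeric. The function name `plot_hourly_mean`, the sample frame, the axis labels and the choice to skip boxes with no rows (such as IdBox 5 above) are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data shaped like the question's frame
df = pd.DataFrame({
    "IdDeviceType": [48, 48, 48, 48, 48],
    "IdBox": [4, 4, 4, 4, 4],
    "hour": [17, 17, 17, 17, 18],
    "value": [1, 0, 1, 0, 1],
})

def plot_hourly_mean(df, devids, idboxes):
    """Draw one line per box: mean of 'value' for each hour."""
    df = df[df["IdDeviceType"].isin(devids)]
    fig, ax = plt.subplots()
    for idb in idboxes:
        means = df[df["IdBox"] == idb].groupby("hour")["value"].mean()
        if means.empty:
            continue  # no rows for this box, nothing to draw
        ax.plot(means.index, means.values, marker="o", label=f"IdBox {idb}")
    ax.set_xlabel("hour")
    ax.set_ylabel("mean value (share of 'On')")
    ax.legend()
    return ax

ax = plot_hourly_mean(df, [48], [4, 5])
```

Returning the `Axes` object lets the caller add a title or save the figure with `ax.figure.savefig(...)`.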
【Comments】:
Great, Stephen! It works, but I get an error message from your code when I paste it into my notebook. For the two lines data = [x.strip().split() for x in """ """.split('\n')[1:-1]] it says: IndexError: list index out of range. Do I have to put something between """ and """?
That part of the code just builds a dataframe. So yes, you need something between the """ or use your own data...