Grouping and plotting data by time in Python

Posted 2023-03-11

Original title: grouping and plotting data by time in python
Question posted: 2017-11-02 13:18:51

Question:

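I have a csv file and I'm trying to plot the monthly average of some values. My csv file is structured as shown below, so I think I should group my data by day, and then by month, to compute the average.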

timestamp,heure,lat,lon,impact,type
2007-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-01-02 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-01-03 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2007-01-03 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1

I'm using this code:

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

df= pd.read_csv("ave.txt", sep=',', names =["timestamp","heure","lat","lon","impact","type"])
daily = df.set_index('timestamp').groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()

However, I keep getting this error:

KeyError: 'The grouper name timestamp is not found'

Any ideas?

Answer 1:

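You get this error because you have set the timestamp column as the index, so it is no longer a column that key='timestamp' can find. Remove either key='timestamp' from TimeGrouper() or the set_index call, and it should group as you expect: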

daily = df.set_index('timestamp').groupby(pd.TimeGrouper(freq='D', axis=1), axis=1)['impact'].count()

or

daily = df.groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()
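For reference, here is a minimal sketch of both options from this answer in current pandas, assuming a recent release: pd.TimeGrouper was deprecated in pandas 0.21 and later removed, and pd.Grouper accepts the same key and freq arguments. The axis=1 argument is left out because the timestamps label rows, not columns. The column must also hold real datetimes, so the sketch converts it with pd.to_datetime first. The DataFrame is built inline from the question's sample rows rather than read from ave.txt.

```python
import pandas as pd

# The question's sample rows; timestamp starts out as plain strings.
df = pd.DataFrame({
    "timestamp": ["2007-01-01 00:00:00", "2007-01-02 00:00:00",
                  "2007-01-02 00:00:00", "2007-01-03 00:00:00",
                  "2007-01-03 00:00:00"],
    "impact": [10.3, 17.7, -17.1, 36.8, -48.9],
})

# Time-based grouping needs datetime values, not strings.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Option 1: timestamp stays a column, so name it with key=.
daily_from_column = df.groupby(pd.Grouper(key="timestamp", freq="D"))["impact"].count()

# Option 2: timestamp becomes the index, so key= is omitted.
daily_from_index = (
    df.set_index("timestamp")
    .groupby(pd.Grouper(freq="D"))["impact"]
    .count()
)

print(daily_from_column)
```

Both variants count one event on 2007-01-01 and two on each of the next two days.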

Comments:

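Thanks for your answer. I did what you said, and now I get this error: IndexError: indices are out-of-bounds

Which line does this happen on?

On this line: daily = df.groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()

Did you mean to use axis=1 in groupby()? Can you try without that parameter and see whether it does what you expect?

When I try without that parameter, I get another error: Axis must be a DatetimeIndex, but got an instance of 'Index'

Answer 2: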

I believe you need DataFrame.resample.

You also need to convert timestamp to a DatetimeIndex, using the parse_dates and index_col parameters of read_csv.
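One detail matters with the question's file: its first line is a header. If names= is passed without header=0, pandas reads that header line as a data row, so the timestamp column contains the literal string 'timestamp' and cannot become a DatetimeIndex. This is consistent with the extra "timestamp heure lat lon impact type" row in the printed output the asker shows in the comments. The sketch below reads the question's sample through io.StringIO; CSV_TEXT is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's file, including its header line.
CSV_TEXT = """timestamp,heure,lat,lon,impact,type
2007-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-01-02 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-01-03 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2007-01-03 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
"""
names = ["timestamp", "heure", "lat", "lon", "impact", "type"]

# names= alone: the header line is kept as the first data row.
raw = pd.read_csv(io.StringIO(CSV_TEXT), names=names, index_col="timestamp")
print(raw.index[0])  # the string "timestamp", not a date

# header=0 replaces the file's header line with names=, so dates parse cleanly.
good = pd.read_csv(io.StringIO(CSV_TEXT), names=names, header=0,
                   parse_dates=["timestamp"], index_col="timestamp")
print(type(good.index).__name__)
```

Dropping names= entirely also works here, since the file's own header already uses the same column names.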

import pandas as pd
import matplotlib.pyplot as plt

names = ["timestamp", "heure", "lat", "lon", "impact", "type"]
# header=0: the file already starts with a header line, which names= replaces
data = pd.read_csv('fou.txt', names=names, header=0,
                   parse_dates=['timestamp'], index_col=['timestamp'])
print(data.head())

# your code (pd.TimeGrouper was removed from pandas; pd.Grouper replaces it)
daily = data.groupby(pd.Grouper(freq='D'))['impact'].count()
monthly = daily.groupby(pd.Grouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()

# simpler
daily = data.resample('D')['impact'].count()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()
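To make explicit what "daily count, then monthly mean" produces: the result is the average number of events per day in each month, not the average impact value. resample('D') also creates a row for every day in the range, including days without events; those get a count of 0 and pull the monthly average down. The sketch uses the question's sample dates plus one hypothetical extra event on 2007-01-05, so 2007-01-04 shows up with a count of 0. It groups the daily series by monthly periods instead of calling resample('M'), which sidesteps the pandas 2.2 deprecation of the 'M' offset alias in favour of 'ME'.

```python
import pandas as pd

# Question's sample dates, plus a hypothetical extra event on 2007-01-05.
idx = pd.to_datetime(["2007-01-01", "2007-01-02", "2007-01-02",
                      "2007-01-03", "2007-01-03", "2007-01-05"])
data = pd.DataFrame({"impact": [10.3, 17.7, -17.1, 36.8, -48.9, 5.0]}, index=idx)

# Step 1: events per calendar day; the empty day 2007-01-04 gets 0.
daily = data.resample("D")["impact"].count()

# Step 2: mean of those daily counts within each calendar month.
monthly = daily.groupby(daily.index.to_period("M")).mean()
print(monthly)
```

Here January averages (1 + 2 + 2 + 0 + 1) / 5 = 1.2 events per day.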

Also check whether you really need count rather than size; see What is the difference between size and count in pandas?

daily = data.resample('D')['impact'].size()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()
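The difference between the two shows up only when impact has missing values: count skips NaN, while size counts every row. A small sketch with a hypothetical missing impact value on 2007-01-02 (the timestamps combine the question's dates with its heure times):

```python
import numpy as np
import pandas as pd

# Three events; the second one has a hypothetical missing impact value.
idx = pd.to_datetime(["2007-01-02 00:07:28", "2007-01-02 23:01:03",
                      "2007-01-03 01:14:29"])
data = pd.DataFrame({"impact": [17.7, np.nan, 36.8]}, index=idx)

# count: non-missing impact values per day.
counts = data.resample("D")["impact"].count()

# size: rows per day, whether or not impact is missing.
sizes = data.resample("D")["impact"].size()

print(pd.DataFrame({"count": counts, "size": sizes}))
```

If every row should count as one event regardless of its impact value, size is the safer choice.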

Comments:

Thank you, I did what you said and it gives me this error:

raise TypeError('Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex')
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex

skina@skina:~/Documents/density$ python moyenne.py
                     heure     lat      lon       impact  type
timestamp
timestamp            heure     lat      lon       impact  type
2007-01-01 00:00:00  13:58:43  33.837   -9.205    10.3    1
2007-01-02 00:00:00  00:07:28  34.5293  -10.2384  17.7    1
2007-01-02 00:00:00  23:01:03  35.0617  -1.435    -17.1   2
2007-01-03 00:00:00  01:14:29  36.5685  0.9043    36.8    1

Actually I have a big csv file; what I posted here is only a small sample of my data. How do I convert the whole csv file?

Your code works fine, but it doesn't give me the result I need. I need to plot the monthly mean of impact.
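For the last two comments: parse_dates in read_csv converts every row of the file, so nothing extra is needed for the full csv once the header line is handled. If the goal is the monthly mean of the impact values themselves (rather than the mean of daily event counts), the daily step can be skipped. A minimal sketch, assuming data already has a DatetimeIndex; monthly_mean_impact is a hypothetical helper name, and the February row is a hypothetical addition to the question's sample.

```python
import pandas as pd


def monthly_mean_impact(data: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: mean impact per calendar month.

    Assumes data has a DatetimeIndex (e.g. read with parse_dates and index_col).
    """
    # Average every impact value whose timestamp falls in the same month.
    return data["impact"].groupby(data.index.to_period("M")).mean()


# Question's sample rows plus a hypothetical February row.
idx = pd.to_datetime(["2007-01-01", "2007-01-02", "2007-01-02",
                      "2007-01-03", "2007-01-03", "2007-02-10"])
data = pd.DataFrame({"impact": [10.3, 17.7, -17.1, 36.8, -48.9, 4.0]}, index=idx)

monthly = monthly_mean_impact(data)
print(monthly)
```

To plot it, call monthly.plot(kind='bar') followed by plt.show() as in the answer. Grouping by period only yields months that contain data; data['impact'].resample('M').mean() would also list empty months, as NaN.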

以上是关于在python中按时间分组和绘制数据的主要内容，如果未能解决你的问题，请参考以下文章