pandas datetime slicing: junkdf.ix['2015-08-03':'2015-08-06'] not working
Posted: 2016-12-30 17:24:36

Question:

junkdf:
rev
dtime
2015-08-03 20.45
2015-08-04 -2.57
2015-08-05 12.53
2015-08-06 -8.16
2015-08-07 -4.41
junkdf.reset_index().to_dict('rec')
[{'dtime': datetime.date(2015, 8, 3), 'rev': 20.45},
 {'dtime': datetime.date(2015, 8, 4), 'rev': -2.5699999999999994},
 {'dtime': datetime.date(2015, 8, 5), 'rev': 12.53},
 {'dtime': datetime.date(2015, 8, 6), 'rev': -8.16},
 {'dtime': datetime.date(2015, 8, 7), 'rev': -4.41}]
junkdf.set_index('dtime',inplace=True)
Why can't I do datetime slicing the way it is described in these posts?
python-pandas-dataframe-slicing-by-date-conditions
time series datetime slicing
junkdf['2015-08-03':]
C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)
1112 def searchsorted(self, key, side='left', sorter=None):
1113 # needs coercion on the key (DatetimeIndex does already)
-> 1114 return self.values.searchsorted(key, side=side, sorter=sorter)
1115
1116 _shared_docs['drop_duplicates'] = (
TypeError: unorderable types: datetime.date() > str()
junkdf.ix['2015-08-03':'2015-08-06']
C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)
1112 def searchsorted(self, key, side='left', sorter=None):
1113 # needs coercion on the key (DatetimeIndex does already)
-> 1114 return self.values.searchsorted(key, side=side, sorter=sorter)
1115
1116 _shared_docs['drop_duplicates'] = (
TypeError: unorderable types: datetime.date() > str()
start = junkdf.index.searchsorted(dt.datetime(2015, 8, 4))
C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)
1112 def searchsorted(self, key, side='left', sorter=None):
1113 # needs coercion on the key (DatetimeIndex does already)
-> 1114 return self.values.searchsorted(key, side=side, sorter=sorter)
1115
1116 _shared_docs['drop_duplicates'] = (
TypeError: can't compare datetime.datetime to datetime.date
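The last traceback comes from mixing the two types: the index holds plain datetime.date objects, while the key passed to searchsorted is a datetime.datetime. A minimal sketch of how Python itself treats that mix:

```python
import datetime as dt

d = dt.date(2015, 8, 4)       # what .dt.date put in the index
t = dt.datetime(2015, 8, 4)   # what was passed to searchsorted

print(d == t)  # False: a date never compares equal to a datetime

try:
    d < t      # ordering across the two types is deliberately undefined
except TypeError as e:
    print("TypeError:", e)
```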
However, if I use dt.date(), the following works:
start = junkdf.index.searchsorted(dt.date(2015, 8, 4))
end = junkdf.index.searchsorted(dt.date(2015, 8, 6))
junkdf.ix[start:end]
rev
dtime
2015-08-04 -2.57
2015-08-05 12.53
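For reference, a minimal reconstruction of junkdf (assuming the values shown above) suggests the cleaner fix is converting the object-dtype index of datetime.date values into a real DatetimeIndex, after which string slicing works directly; .loc is used here since .ix is deprecated:

```python
import datetime as dt
import pandas as pd

# Rebuild junkdf as shown above: the index holds datetime.date objects,
# so its dtype is object, not datetime64[ns].
junkdf = pd.DataFrame(
    {"rev": [20.45, -2.57, 12.53, -8.16, -4.41]},
    index=[dt.date(2015, 8, day) for day in range(3, 8)],
)
junkdf.index.name = "dtime"
print(junkdf.index.dtype)  # object

# After converting to datetime64[ns], string slicing works:
junkdf.index = pd.to_datetime(junkdf.index)
print(junkdf.loc["2015-08-03":"2015-08-06"])
```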
Update:
junkdf = df[['dtime','rev']].groupby((df.dtime).dt.date).sum().copy()
Here is what df[['dtime','rev']] looks like:
dtime rev
0 2015-08-03 07:59:59 -0.18
1 2015-08-03 08:59:59 -0.11
2 2015-08-03 09:59:59 -0.29
3 2015-08-03 10:59:59 -0.08
4 2015-08-03 11:59:59 0.69
Update 2:
I tried:
df[['dtime','rev']].head()
dtime rev
0 2015-08-03 07:59:59 -0.18
1 2015-08-03 08:59:59 -0.11
2 2015-08-03 09:59:59 -0.29
3 2015-08-03 10:59:59 -0.08
4 2015-08-03 11:59:59 0.69
df[['dtime','rev']].groupby(pd.TimeGrouper('D', key=df.dtime)).sum()
C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\generic.py in __hash__(self)
804 def __hash__(self):
805 raise TypeError('{0!r} objects are mutable, thus they cannot be'
--> 806 ' hashed'.format(self.__class__.__name__))
807
808 def __iter__(self):
TypeError: 'Series' objects are mutable, thus they cannot be hashed
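That last error is raised because key was given the column itself (df.dtime) rather than its name, so pandas tries to hash a Series. A small sketch with a few made-up sample rows; pd.Grouper is the current spelling of the old pd.TimeGrouper:

```python
import pandas as pd

df = pd.DataFrame({
    "dtime": pd.to_datetime(["2015-08-03 07:59:59",
                             "2015-08-03 08:59:59",
                             "2015-08-04 09:59:59"]),
    "rev": [-0.18, -0.11, -0.29],
})

# key must be the column *name* (a string), not the Series itself:
daily = df.groupby(pd.Grouper(key="dtime", freq="D"))["rev"].sum()
print(daily)
```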
Comments:
It works for me. What is the output of print(junkdf.index.dtype)?
print(junkdf.index.dtype) = object
Your index is string dtype. You have to convert it to datetime first.
So I added more info. I arrived at junkdf by doing a groupby on a datetime column. Shouldn't it automatically be datetime dtype?
.dt.date - converts datetime dtype to string dtype
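To see the commenter's point, compare the dtypes before and after .dt.date (a minimal sketch):

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2015-08-03 07:59:59", "2015-08-04 08:59:59"]))
print(s.dtype)          # datetime64[ns]
print(s.dt.date.dtype)  # object -- plain datetime.date values, not datetime64
```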
Answer 1:
Assume you have the following source DF (I took it from your previous question and changed it so we have data spanning multiple days):
In [85]: df
Out[85]:
datetime hour rev
0 2016-05-01 01:00:00 1 -0.02
1 2016-05-01 02:00:00 2 -0.01
2 2016-05-01 03:00:00 3 -0.02
3 2016-05-01 04:00:00 4 -0.02
4 2016-05-01 05:00:00 5 -0.01
5 2016-05-02 06:00:00 6 -0.03
6 2016-05-02 07:00:00 7 -0.10
7 2016-05-02 08:00:00 8 -0.09
8 2016-05-03 09:00:00 9 -0.08
9 2016-05-03 10:00:00 10 -0.10
10 2016-05-03 11:00:00 11 -0.12
11 2016-05-04 12:00:00 12 -0.14
12 2016-05-04 13:00:00 13 -0.17
13 2016-05-04 14:00:00 14 -0.16
14 2016-05-05 15:00:00 15 -0.15
15 2016-05-05 16:00:00 16 -0.15
16 2016-05-05 17:00:00 17 -0.17
17 2016-05-06 18:00:00 18 -0.16
18 2016-05-06 19:00:00 19 -0.18
19 2016-05-06 20:00:00 20 -0.17
20 2016-05-07 21:00:00 21 -0.14
21 2016-05-07 22:00:00 22 -0.16
22 2016-05-08 23:00:00 23 -0.08
23 2016-05-08 00:00:00 24 -0.06
Let's group by day and compute the sum:
In [89]: rslt = (df.assign(t=df.datetime - pd.Timedelta(hours=1))
....: .groupby(pd.TimeGrouper('D', key='t'))['rev']
....: .sum())
In [90]: rslt
Out[90]:
t
2016-05-01 -0.08
2016-05-02 -0.22
2016-05-03 -0.30
2016-05-04 -0.47
2016-05-05 -0.47
2016-05-06 -0.51
2016-05-07 -0.36
2016-05-08 -0.08
Freq: D, Name: rev, dtype: float64
In [92]: rslt.index.dtype
Out[92]: dtype('<M8[ns]')
Now slicing should work fine (because the index has datetime dtype):
In [91]: rslt.ix['2016-05-03':'2016-05-06']
Out[91]:
t
2016-05-03 -0.30
2016-05-04 -0.47
2016-05-05 -0.47
2016-05-06 -0.51
Freq: D, Name: rev, dtype: float64
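For readers on current pandas, the same recipe can be sketched with today's names (pd.Grouper replaces the removed pd.TimeGrouper, .loc replaces .ix), condensed here to a few of the rows above:

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime(["2016-05-01 01:00:00", "2016-05-02 06:00:00",
                                "2016-05-03 09:00:00", "2016-05-06 18:00:00"]),
    "rev": [-0.02, -0.03, -0.08, -0.16],
})

rslt = (df.assign(t=df.datetime - pd.Timedelta(hours=1))
          .groupby(pd.Grouper(key="t", freq="D"))["rev"]
          .sum())

# The index is datetime64[ns], so string slicing works:
print(rslt.loc["2016-05-03":"2016-05-06"])
```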
Comments:
Will this solve the problem I describe in: ***.com/questions/39065034/…
IMO there is no such thing as hour == 24; you can have hours from 0 to 23.
It's just the standard in the electric power industry. Demand metering is done on hours 1 to 24, and so is billing.
I could subtract 1 second from the index and then group via pd.TimeGrouper as you suggest. Let me try…
I get: TypeError: 'Series' objects are mutable, thus they cannot be hashed