Pandas 0.18.1 groupby and resample with multilevel aggregation error
Posted: 2016-12-16 02:52:32
Question: I just updated pandas from 0.17.1 to 0.18.1 and, while changing some pre-existing code, I think I found a problem with the new resample API, outlined below. According to the documentation linked below, df3_resample and df4_resample in my example should return the same DataFrame, but df4_resample raises an exception. This tripped me up for a while, so I thought I would share.
Exception: Column(s) A already selected
http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-breaking-resample
http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#groupby-syntax-with-window-and-resample-operations
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 4),
                  columns=list('ABCD'),
                  index=pd.date_range('2010-01-01 09:00:00', periods=10, freq='s'))
df['item'] = 'item_a'  # add column for groupby
# THIS WORKS: flat dict, one function per column
df1_resample = df.groupby('item').resample('2s').agg({'A': np.mean, 'B': np.max}).reset_index()
print(df1_resample)
# THIS WORKS: nested dict on a plain resample
df2_resample = df.resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}}).reset_index()
print(df2_resample)
# THIS WORKS: nested dict, resample applied per group through a lambda
df3_resample = df.groupby('item').apply(lambda x: x.resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}})).reset_index()
print(df3_resample)
# THIS DOESN'T WORK: nested dict on groupby().resample()
df4_resample = df.groupby('item').resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}})
print(df4_resample)
Output:
     item             level_1         A         B
0  item_a 2010-01-01 09:00:00  0.611660  0.739640
1  item_a 2010-01-01 09:00:02  0.615876  0.880113
2  item_a 2010-01-01 09:00:04  0.218292  0.441504
3  item_a 2010-01-01 09:00:06  0.753698  0.637787
4  item_a 2010-01-01 09:00:08  0.471272  0.474738

                index         A
                         A_mean     A_max
0 2010-01-01 09:00:00  0.611660  0.813038
1 2010-01-01 09:00:02  0.615876  0.994657
2 2010-01-01 09:00:04  0.218292  0.233478
3 2010-01-01 09:00:06  0.753698  0.848107
4 2010-01-01 09:00:08  0.471272  0.610592

     item             level_1         A
                                 A_mean     A_max
0  item_a 2010-01-01 09:00:00  0.611660  0.813038
1  item_a 2010-01-01 09:00:02  0.615876  0.994657
2  item_a 2010-01-01 09:00:04  0.218292  0.233478
3  item_a 2010-01-01 09:00:06  0.753698  0.848107
4  item_a 2010-01-01 09:00:08  0.471272  0.610592

  File "<some_file.py>", line 29, in <module>
    df4_resample = df.groupby('item').resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}})
  File "C:\Anaconda2\lib\site-packages\pandas\tseries\resample.py", line 293, in aggregate
    result, how = self._aggregate(arg, *args, **kwargs)
  File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 505, in _aggregate
    result = list(_agg(arg, _agg_1dim).values())
  File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 496, in _agg
    result[fname] = func(fname, agg_how)
  File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 479, in _agg_1dim
    return colg.aggregate(how, _level=(_level or 0) + 1)
  File "C:\Anaconda2\lib\site-packages\pandas\tseries\resample.py", line 293, in aggregate
    result, how = self._aggregate(arg, *args, **kwargs)
  File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 528, in _aggregate
    result = _agg(arg, lambda fname,
  File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 496, in _agg
    result[fname] = func(fname, agg_how)
  File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 529, in <lambda>
    agg_how: _agg_1dim(self._selection, agg_how))
  File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 475, in _agg_1dim
    colg = self._gotitem(name, ndim=1, subset=subset)
  File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 680, in _gotitem
    groupby=self._groupby[key],
  File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 326, in __getitem__
    raise Exception('Column(s) %s already selected' % self._selection)
Exception: Column(s) A already selected
Answer 1: I'm not sure why resample doesn't work here, but there is a convenient workaround that doesn't need a lambda. Try this:
df.groupby([
    'item', pd.Grouper(freq='2s')
]).agg({
    'A': ['mean', 'max']
}).rename(columns={
    'mean': 'A_mean', 'max': 'A_max'
}, level=1).reset_index()
Instead of calling .resample('2s'), you can add pd.Grouper(freq='2s') to your groupby(). In your case it does the same thing. Here are the docs --> http://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.Grouper.html
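To see that the two routes agree on data like the question's, here is a minimal sketch. It selects only column A and takes a plain mean so it stays clear of the nested-dict issue. The variable names (via_resample, via_grouper, same_values) are made up for illustration. One caveat, as far as I know: the two are not interchangeable on every dataset, since resample fills in empty bins for gaps in the time index while the Grouper route only returns bins that actually contain rows.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data: 10 rows at 1-second spacing, one group.
df = pd.DataFrame(np.random.rand(10, 4),
                  columns=list('ABCD'),
                  index=pd.date_range('2010-01-01 09:00:00', periods=10, freq='s'))
df['item'] = 'item_a'

# Route 1: group by item, then resample each group into 2-second bins.
via_resample = df.groupby('item')['A'].resample('2s').mean()

# Route 2: put the 2-second bins into the groupby itself with pd.Grouper.
via_grouper = df.groupby(['item', pd.Grouper(freq='2s')])['A'].mean()

# Compare the bin means produced by the two routes.
same_values = np.allclose(via_resample.values, via_grouper.values)
print(same_values)
```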
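On a side note, you should avoid using nested dictionaries to rename columns (that usage is deprecated) and use the actual .rename() function instead.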
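If you are on a newer pandas than the 0.18.1 in the question, named aggregation is another way to get the renamed columns without a nested dict and without a separate .rename() step. This is only a sketch under that assumption: the keyword form of agg() was added in pandas 0.25, so it will not run on 0.18.1. The name out is hypothetical.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data: 10 rows at 1-second spacing, one group.
df = pd.DataFrame(np.random.rand(10, 4),
                  columns=list('ABCD'),
                  index=pd.date_range('2010-01-01 09:00:00', periods=10, freq='s'))
df['item'] = 'item_a'

# Named aggregation (pandas >= 0.25): output_name=(input_column, function).
# The result has flat column names, so no MultiIndex rename is needed.
out = (df.groupby(['item', pd.Grouper(freq='2s')])
         .agg(A_mean=('A', 'mean'), A_max=('A', 'max'))
         .reset_index())
print(out)
```

The output columns come out flat (A_mean, A_max) rather than nested under A, which is usually easier to work with downstream.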