Python pandas groupby method not working properly

Posted 2023-03-11


【Title】Python pandas groupby method not working properly 【Posted】: 2014-04-26 02:34:48 【Question】:

I have a text file with one record per line, and each line carries a timestamp. I read the data into a DataFrame like this:

table = pd.read_table(file, sep='|', skiprows=[1], usecols = columns, parse_dates = dateColumns, date_parser = parsedate, converters=columnsFormat)

So far, so good.

The result is a DataFrame like the following example:

Name Local  Code Date        Value
A1   Here   01   01-01-1990  1.2
A1   Here   01   01-02-1990  0.8
A1   Here   01   01-03-1990  1.6
...
A2   There  02   01-01-1990  1.1
A2   There  02   01-02-1990  0.7
A2   There  02   01-03-1990  1.3
...
An   Where  n    12-31-2013  2.1

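The dates are in chronological order, but I have several groups, and they contain different numbers of rows.

What I want to do is group the DataFrame by Name, Local and Code, so that those values become the index and Date and Value are the columns of each group.

Something like the following example: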

(Index)            Date        Value
(A1   Here   01)   01-01-1990  1.2
                   01-02-1990  0.8
                   01-03-1990  1.6
...
(A2   There  02)   01-01-1990  1.1
                   01-02-1990  0.7
                   01-03-1990  1.3
...
(An   Where  n)    12-31-2013  2.1

But when I run the following, I do not get groups like the ones above:

table = table.groupby(['Name', 'Local', 'Code'])

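Instead I end up with groups like the ones below. The first group contains all the data for day 1, the second group contains all the data for day 2, and so on.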

Name Local  Code Date        Value
A1   Here   01   01-01-1990  1.2
A2   There  02   01-01-1990  1.1
...
A1   Here   01   01-02-1990  0.8
A2   There  02   01-02-1990  0.7
...
A1   Here   01   01-03-1990  1.6
A2   There  02   01-03-1990  1.3
...
An   Where  n    12-31-2013  2.1

Any idea how I can group the data the way I described?

If I use table = table.groupby(['Name', 'Local', 'Code', 'Date']), I get groups like:

Name Local  Code Date        Value
A1   Here   01   01-01-1990  1.2
                 01-02-1990  0.8
                 01-03-1990  1.6
...
A2   There  02   01-01-1990  1.1
                 01-02-1990  0.7
                 01-03-1990  1.3
...
An   Where  n    12-31-2013  2.1

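This is almost what I want, but I still need to split it into separate groups by Name, Local and Code. Is that possible?

When the table is read, do parse_dates and converters change anything in the index?

I hope I have made myself clear now. Thank you.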

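【Comments】:

Possible duplicate of ***.com/questions/17027470/…

You would have one DataFrame per "Name Local Code", with two columns: Date and Value. What exactly are you trying to do (and why)?

I have a sequential .txt file containing several time series. Each "Name Local Code" group is a different time series. I want to separate each time series into its own group so that I can process them.

When you say "process", that is the part we are interested in...

【Answer 1】: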

As a workaround, you can set the index and then group by the index:

In [11]: df1 = df.set_index(['Name', 'Local', 'Code'])

In [12]: g = df1.groupby(df1.index)

In [13]: for i in df1.groupby(df1.index): print(i)
(('A1', 'Here', 1),
                       Date  Value
Name Local Code                   
A1   Here  1     01-01-1990    1.2
           1     01-02-1990    0.8
           1     01-03-1990    1.6)
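
A related sketch, not part of the original answer: after `set_index`, the groups can also be formed with `groupby(level=...)` rather than by passing the index object. The sample rows below are made up to mirror the question's table (interleaved by date, as the question describes), and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's table,
# with rows interleaved by date rather than sorted by group.
df = pd.DataFrame({
    "Name":  ["A1", "A2", "A1", "A2", "A1", "A2"],
    "Local": ["Here", "There", "Here", "There", "Here", "There"],
    "Code":  ["01", "02", "01", "02", "01", "02"],
    "Date":  ["01-01-1990", "01-01-1990", "01-02-1990",
              "01-02-1990", "01-03-1990", "01-03-1990"],
    "Value": [1.2, 1.1, 0.8, 0.7, 1.6, 1.3],
})

keys = ["Name", "Local", "Code"]
indexed = df.set_index(keys)

# Group on the index levels. Each sub-frame keeps the
# (Name, Local, Code) MultiIndex and has only Date and Value as columns.
groups = {key: grp for key, grp in indexed.groupby(level=keys)}

a1 = groups[("A1", "Here", "01")]
print(a1)
```

Under these assumptions, row order in the source does not matter: every row with the same (Name, Local, Code) ends up in the same sub-frame, in its original relative order.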

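【Comments】:

Sorry, I thought this solved the problem, but it does not.

@Lucas Clearly you will have to elaborate. Either way, as I usually comment, it is better to use the groupby methods, such as apply.

【Answer 2】: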

To answer your last question:

If you iterate over

groups = df.groupby(['name','local','code'])

you should get a separate DataFrame for each group, i.e.:

for g, grp in groups:
    print(grp)
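
Since the question describes each (Name, Local, Code) combination as its own time series, the iteration above could be extended roughly as follows. This is a minimal sketch under assumptions: the sample rows are invented, the date format `%m-%d-%Y` is inferred from the question's example table, and `series_by_group` is an illustrative name.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's table.
df = pd.DataFrame({
    "Name":  ["A1", "A2", "A1", "A2", "A1", "A2"],
    "Local": ["Here", "There", "Here", "There", "Here", "There"],
    "Code":  ["01", "02", "01", "02", "01", "02"],
    "Date":  ["01-01-1990", "01-01-1990", "01-02-1990",
              "01-02-1990", "01-03-1990", "01-03-1990"],
    "Value": [1.2, 1.1, 0.8, 0.7, 1.6, 1.3],
})
# Assumed format: month-day-year, as in the question's example rows.
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

# One Series per (Name, Local, Code), indexed by Date.
series_by_group = {
    key: grp.set_index("Date")["Value"]
    for key, grp in df.groupby(["Name", "Local", "Code"])
}

a2 = series_by_group[("A2", "There", "02")]
print(a2)
```

Each dictionary value can then be processed independently, which is what the question's comments describe as the goal.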

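【Comments】:

I think the OP already has this, but the problem is that these groups are not indexed by [name, local, code] (i.e. at this stage it does not respect as_index), which is what the OP is asking about.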
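
To illustrate the point about `as_index`, here is a small sketch with invented data. It assumes the usual pandas behaviour that `as_index` shapes the result of an aggregation, while the sub-frames produced by iterating over the groups keep the original row index.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's table.
df = pd.DataFrame({
    "Name":  ["A1", "A2", "A1", "A2", "A1", "A2"],
    "Local": ["Here", "There", "Here", "There", "Here", "There"],
    "Code":  ["01", "02", "01", "02", "01", "02"],
    "Value": [1.2, 1.1, 0.8, 0.7, 1.6, 1.3],
})
keys = ["Name", "Local", "Code"]

# Default as_index=True: the aggregated result is indexed by the group keys.
summary = df.groupby(keys)["Value"].mean()

# as_index=False: the keys stay as ordinary columns in the result.
flat = df.groupby(keys, as_index=False)["Value"].mean()

# Iterating yields sub-frames that still carry the original row labels.
first_key, first_grp = next(iter(df.groupby(keys)))
print(summary)
print(flat)
print(first_grp.index.tolist())
```

If the keys are wanted as the index of each sub-frame, setting the index before grouping, as in the first answer, is one way to get there.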
