Elapsed times in Pandas dataframe

Posted: 2021-10-25 07:07:01

Question:

I need to compute the time elapsed between events. My task is similar to this one, but when I try to reproduce it I get an error:

print (df1.sort_values(['ip','timestamp']).head(20))
df1['diff'] = df1.sort_values(['ip','timestamp']).groupby('ip')['timestamp'].diff()

                 ip           timestamp
26422    1.0.150.87 2021-08-21 03:17:00
26192    1.0.150.87 2021-08-21 03:17:00
77885   1.0.155.191 2021-08-22 05:54:00
77387   1.0.155.191 2021-08-22 05:54:00
27240    1.0.227.92 2021-08-21 03:47:00
27009    1.0.227.92 2021-08-21 03:47:00
47641  1.10.130.122 2021-08-21 13:44:00
47279  1.10.130.122 2021-08-21 13:44:00
11912   1.10.202.23 2021-08-20 16:59:00
11825   1.10.202.23 2021-08-20 16:59:00
92     1.10.213.176 2021-08-20 12:02:00
96     1.10.213.176 2021-08-20 12:02:00
2580   1.10.213.176 2021-08-20 13:09:00
2572   1.10.213.176 2021-08-20 13:09:00
4518   1.10.213.176 2021-08-20 13:57:00
4491   1.10.213.176 2021-08-20 13:57:00
8057   1.10.214.251 2021-08-20 15:23:00
8017   1.10.214.251 2021-08-20 15:23:00
35302   1.10.219.41 2021-08-21 08:09:00
35030   1.10.219.41 2021-08-21 08:09:00
Traceback (most recent call last):
  File "./analyser.py", line 59, in <module>
    df1['diff'] = df1.sort_values(['ip','timestamp']).groupby('ip')['timestamp'].diff()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 3607, in __setitem__
    self._set_item(key, value)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 3779, in _set_item
    value = self._sanitize_column(value)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 4501, in _sanitize_column
    return _reindex_for_setitem(value, self.index)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 10777, in _reindex_for_setitem
    raise err
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 10772, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/series.py", line 4579, in reindex
    return super().reindex(index=index, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 4809, in reindex
    return self._reindex_axes(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 4830, in _reindex_axes
    obj = obj._reindex_with_indexers(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 4874, in _reindex_with_indexers
    new_data = new_data.reindex_indexer(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 666, in reindex_indexer
    self.axes[axis]._validate_can_reindex(indexer)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3785, in _validate_can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

Why doesn't this work? I would also like to know whether there is a better way to solve this, for example using "native" Python features. Thanks for your help!

Comments:

Check this question: ***.com/questions/27236275/…
Can you share the dataframe?

Answer 1:

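The error indicates that the index of df1 contains duplicate labels, so pandas cannot align the result of diff() with the frame when assigning the new column. Call DataFrame.sort_values with ignore_index=True and assign the result back to df1 first, so that the sorted frame gets a fresh 0..n-1 index, then compute the per-group difference: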

df1 = df1.sort_values(['ip','timestamp'], ignore_index=True)
df1['diff'] = df1.groupby('ip')['timestamp'].diff()
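
A minimal, self-contained sketch of the fix above. The sample rows and the repeated index labels are made up for illustration (the original dataframe was not shared), and the diff_seconds column is an optional extra that is not part of the answer:

```python
import pandas as pd

# Hypothetical sample data. The repeated index labels (0, 0, 1, 1) mimic the
# kind of duplicate index that leads to the reindexing error in the question.
df1 = pd.DataFrame(
    {
        "ip": ["1.10.213.176", "1.0.150.87", "1.10.213.176", "1.0.150.87"],
        "timestamp": pd.to_datetime(
            [
                "2021-08-20 13:09:00",
                "2021-08-21 03:17:00",
                "2021-08-20 12:02:00",
                "2021-08-21 03:45:00",
            ]
        ),
    },
    index=[0, 0, 1, 1],
)

# Sorting with ignore_index=True replaces the index with 0..n-1, so the
# Series returned by diff() aligns unambiguously on assignment.
df1 = df1.sort_values(["ip", "timestamp"], ignore_index=True)
df1["diff"] = df1.groupby("ip")["timestamp"].diff()

# Optional: the same gap as a float number of seconds
# (NaN for the first event of each ip, which has no predecessor).
df1["diff_seconds"] = df1["diff"].dt.total_seconds()

print(df1)
```

The first row of each ip group has no previous event, so its diff is NaT. If the original index must be kept, df1.reset_index(drop=True) before sorting is not an option either; in that case the duplicate labels would need to be resolved some other way.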
