Elapsed times in Pandas dataframe
【Title】: Elapsed times in Pandas dataframe
【Posted】: 2021-10-25 07:07:01
【Question】: I need to calculate the time elapsed between events. My task is similar to this one, but when I try to reproduce it I get an error:
print (df1.sort_values(['ip','timestamp']).head(20))
df1['diff'] = df1.sort_values(['ip','timestamp']).groupby('ip')['timestamp'].diff()
ip timestamp
26422 1.0.150.87 2021-08-21 03:17:00
26192 1.0.150.87 2021-08-21 03:17:00
77885 1.0.155.191 2021-08-22 05:54:00
77387 1.0.155.191 2021-08-22 05:54:00
27240 1.0.227.92 2021-08-21 03:47:00
27009 1.0.227.92 2021-08-21 03:47:00
47641 1.10.130.122 2021-08-21 13:44:00
47279 1.10.130.122 2021-08-21 13:44:00
11912 1.10.202.23 2021-08-20 16:59:00
11825 1.10.202.23 2021-08-20 16:59:00
92 1.10.213.176 2021-08-20 12:02:00
96 1.10.213.176 2021-08-20 12:02:00
2580 1.10.213.176 2021-08-20 13:09:00
2572 1.10.213.176 2021-08-20 13:09:00
4518 1.10.213.176 2021-08-20 13:57:00
4491 1.10.213.176 2021-08-20 13:57:00
8057 1.10.214.251 2021-08-20 15:23:00
8017 1.10.214.251 2021-08-20 15:23:00
35302 1.10.219.41 2021-08-21 08:09:00
35030 1.10.219.41 2021-08-21 08:09:00
Traceback (most recent call last):
File "./analyser.py", line 59, in <module>
df1['diff'] = df1.sort_values(['ip','timestamp']).groupby('ip')['timestamp'].diff()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 3607, in __setitem__
self._set_item(key, value)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 3779, in _set_item
value = self._sanitize_column(value)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 4501, in _sanitize_column
return _reindex_for_setitem(value, self.index)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 10777, in _reindex_for_setitem
raise err
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 10772, in _reindex_for_setitem
reindexed_value = value.reindex(index)._values
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/series.py", line 4579, in reindex
return super().reindex(index=index, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 4809, in reindex
return self._reindex_axes(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 4830, in _reindex_axes
obj = obj._reindex_with_indexers(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 4874, in _reindex_with_indexers
new_data = new_data.reindex_indexer(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 666, in reindex_indexer
self.axes[axis]._validate_can_reindex(indexer)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3785, in _validate_can_reindex
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
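I don't understand why it doesn't work. I would also like to know whether there is a better way to solve this, for example using "native" Python features. Thanks for your help!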
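【Comments】:
Check this question: ***.com/questions/27236275/… Can you share the dataframe?
【Answer 1】: Call DataFrame.sort_values with ignore_index=True first, so that the sorted frame gets a fresh, unique 0..n-1 index, and only then assign the grouped diff: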
df1 = df1.sort_values(['ip','timestamp'], ignore_index=True)
df1['diff'] = df1.groupby('ip')['timestamp'].diff()
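To see why this helps, here is a minimal, self-contained sketch. The sample rows and the duplicated index labels are assumptions made purely to reproduce the error; the asker's real frame was not shared. The extra diff_seconds column is also only an illustration of how to turn the resulting timedeltas into numbers.

```python
import pandas as pd

# Hypothetical sample data. The duplicated index labels (0, 0, 1, 1) are an
# assumption chosen to trigger the same kind of error as in the question.
df1 = pd.DataFrame(
    {
        "ip": ["1.10.213.176", "1.0.150.87", "1.10.213.176", "1.0.150.87"],
        "timestamp": pd.to_datetime(
            [
                "2021-08-20 13:09:00",
                "2021-08-21 03:17:00",
                "2021-08-20 12:02:00",
                "2021-08-21 03:20:00",
            ]
        ),
    },
    index=[0, 0, 1, 1],
)

# Original approach: the diff is computed on a sorted copy and then has to be
# aligned back to df1 by index label, which is ambiguous with duplicate labels.
try:
    df1["diff"] = (
        df1.sort_values(["ip", "timestamp"]).groupby("ip")["timestamp"].diff()
    )
except ValueError as exc:
    print("assignment failed:", exc)

# Approach from the answer: sort once, drop the old index, then assign.
fixed = df1.sort_values(["ip", "timestamp"], ignore_index=True)
fixed["diff"] = fixed.groupby("ip")["timestamp"].diff()

# Optional: express the elapsed time as a number of seconds.
fixed["diff_seconds"] = fixed["diff"].dt.total_seconds()
print(fixed)
```

The first event of each ip has no predecessor, so its diff is NaT. Note that the sketch writes the sorted result to a new name (fixed), whereas the answer overwrites df1; either works, as long as the column is assigned to the same frame the diff was computed on.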
【Discussion】:
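On the question's side note about "native" Python: the same per-ip elapsed times can be computed with the standard library alone. The following is only a rough sketch under stated assumptions: the records list and the elapsed_per_ip helper are hypothetical names, and the timestamps are assumed to be strings in the format shown in the question's output.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (ip, timestamp) records, formatted like the question's output.
records = [
    ("1.10.213.176", "2021-08-20 13:09:00"),
    ("1.0.150.87", "2021-08-21 03:17:00"),
    ("1.10.213.176", "2021-08-20 12:02:00"),
    ("1.10.213.176", "2021-08-20 13:57:00"),
]


def elapsed_per_ip(rows):
    """Return {ip: [(timestamp, elapsed_since_previous_or_None), ...]}."""
    grouped = defaultdict(list)
    for ip, stamp in rows:
        grouped[ip].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))

    result = {}
    for ip, stamps in grouped.items():
        stamps.sort()
        # The first event of each ip has no predecessor, hence None.
        deltas = [None] + [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        result[ip] = list(zip(stamps, deltas))
    return result


elapsed = elapsed_per_ip(records)
for ip, events in elapsed.items():
    for stamp, delta in events:
        print(ip, stamp, delta)
```

For data that already lives in a DataFrame, the groupby-based approach in the answer is shorter; a plain-Python version like this mainly makes sense when pandas is not otherwise needed.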