ValueError: index must be monotonic
【Posted】: 2019-06-05 22:54:06
【Question】:
dfdata.Speed.rolling('60T', closed='right').sum()
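I am trying to apply a rolling sum to this column. I have sorted the whole dataset, but I still get the same error. Can anyone help me solve it? In the data below, the first DateTime column is the index and the second is an ordinary column, which is why it looks duplicated.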
DateTime DateTime Speed distance IDs totalHours
2011-01-01 00:19:00 2011-01-01 00:19:00 0.041916 0.000710 19 0.016944
2011-01-01 00:20:00 2011-01-01 00:20:00 0.033719 0.000562 19 0.016667
2011-01-01 00:20:59 2011-01-01 00:20:59 0.153553 0.002517 19 0.016389
2011-01-01 00:21:59 2011-01-01 00:21:59 0.142272 0.002371 19 0.016667
2011-01-01 00:23:00 2011-01-01 00:23:00 0.033166 0.000562 19 0.016944
2011-01-01 00:24:00 2011-01-01 00:24:00 0.037843 0.000631 19 0.016667
2011-01-01 00:26:00 2011-01-01 00:26:00 0.050262 0.001675 19 0.033333
2011-01-01 00:27:00 2011-01-01 00:27:00 0.032249 0.000537 19 0.016667
2011-01-01 00:27:59 2011-01-01 00:27:59 0.180206 0.002953 19 0.016389
2011-01-01 00:29:00 2011-01-01 00:29:00 0.133477 0.002262 19 0.016944
2011-01-01 00:30:00 2011-01-01 00:30:00 0.128053 0.002134 19 0.016667
2011-01-01 00:30:59 2011-01-01 00:30:59 0.041964 0.000688 19 0.016389
2011-01-01 00:32:00 2011-01-01 00:32:00 0.072529 0.001229 19 0.016944
2011-01-01 00:33:00 2011-01-01 00:33:00 0.052437 0.000874 19 0.016667
2011-01-01 00:33:59 2011-01-01 00:33:59 0.033903 0.000556 19 0.016389
2011-01-01 00:35:00 2011-01-01 00:35:00 0.060076 0.001018 19 0.016944
2011-01-01 00:36:00 2011-01-01 00:36:00 0.121709 0.002028 19 0.016667
2011-01-01 00:36:59 2011-01-01 00:36:59 0.090517 0.001483 19 0.016389
2011-01-01 00:37:59 2011-01-01 00:37:59 0.088304 0.001472 19 0.016667
2011-01-01 00:39:00 2011-01-01 00:39:00 0.100654 0.001706 19 0.016944
2011-01-01 00:40:00 2011-01-01 00:40:00 0.034839 0.000581 19 0.016667
2011-01-01 00:40:59 2011-01-01 00:40:59 0.164753 0.002700 19 0.016389
2011-01-01 00:42:00 2011-01-01 00:42:00 0.214163 0.003629 19 0.016944
2011-01-01 00:43:00 2011-01-01 00:43:00 0.283706 0.004728 19 0.016667
2011-01-01 00:45:00 2011-01-01 00:45:00 0.055676 0.001856 19 0.033333
2011-01-01 00:46:00 2011-01-01 00:46:00 0.138059 0.002301 19 0.016667
2011-01-01 00:46:59 2011-01-01 00:46:59 0.339829 0.005569 19 0.016389
2011-01-01 00:48:00 2011-01-01 00:48:00 0.169921 0.002879 19 0.016944
2011-01-01 00:49:00 2011-01-01 00:49:00 0.072382 0.001206 19 0.016667
2011-01-01 00:49:59 2011-01-01 00:49:59 0.029009 0.000475 19 0.016389
This is the sample data.
This is the error I get.
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-59-3224ac27b0b8> in <module>()
1 # dfdata.Speed.rolling('60T', closed='right').sum()
----> 2 dfdata.Speed.rolling('60T', closed='right').sum()
~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in rolling(self, window, min_periods, freq, center, win_type, on, axis, closed)
6193 min_periods=min_periods, freq=freq,
6194 center=center, win_type=win_type,
-> 6195 on=on, axis=axis, closed=closed)
6196
6197 cls.rolling = rolling
~/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in rolling(obj, win_type, **kwds)
2050 return Window(obj, win_type=win_type, **kwds)
2051
-> 2052 return Rolling(obj, **kwds)
2053
2054
~/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in __init__(self, obj, window, min_periods, freq, center, win_type, axis, on, closed, **kwargs)
84 self.win_freq = None
85 self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 86 self.validate()
87
88 @property
~/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in validate(self)
1085 timedelta))):
1086
-> 1087 self._validate_monotonic()
1088 freq = self._validate_freq()
1089
~/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in _validate_monotonic(self)
1117 formatted = self.on or 'index'
1118 raise ValueError("{0} must be "
-> 1119 "monotonic".format(formatted))
1120
1121 def _validate_freq(self):
ValueError: index must be monotonic
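The traceback shows that a time-based window such as '60T' is rejected when the index is not in order. A minimal sketch for checking this directly follows. The four-row frame is made up for illustration (the timestamps, the second ID and the Speed values are assumptions, not the asker's data).

```python
import pandas as pd

# Hypothetical miniature of the frame in the question: one timestamp
# steps backwards, as happens when rows are ordered by ID first.
idx = pd.to_datetime([
    "2011-01-01 00:19:00",
    "2011-01-01 00:20:00",
    "2011-01-01 00:19:30",  # earlier than the previous row
    "2011-01-01 00:21:00",
])
dfdata = pd.DataFrame(
    {"Speed": [0.04, 0.03, 0.15, 0.14], "IDs": [19, 19, 20, 20]},
    index=idx,
)

# False means a time-based rolling window will be refused.
print(dfdata.index.is_monotonic_increasing)

# Find the rows whose timestamp is earlier than the row before them.
steps = dfdata.index.to_series().diff()
out_of_order = dfdata[(steps < pd.Timedelta(0)).to_numpy()]
print(out_of_order)
```

On a frame with millions of rows, `out_of_order` points at the places where the ordering breaks, so only those rows need inspecting.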
【Comments】:
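Please include enough information when you ask a question. There is no input dataset and no error message here, so how would anyone be able to help? Please add all of this.
The data has millions of records and I don't know where the problem is, so which part of it should I give you? I'm not sure, but I will edit the question and try to explain further.
A few lines are enough to show answerers the basic structure of your data. Also include the error you see in the terminal.
I found the cause: I was sorting the data on id and then dateTime. That sorts the rows by id first and by datetime second, which is what caused the problem. When I sorted by datetime only, I did not get the error. Thanks for your help.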
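The cause described in the last comment can be reproduced on a small scale. The sketch below uses invented values (two IDs with interleaved timestamps) and the alias '60min', which means the same as '60T' but avoids the deprecation warning that recent pandas releases emit for 'T'.

```python
import pandas as pd

# Hypothetical data: two IDs whose timestamps interleave.
raw = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2011-01-01 00:19:00",
        "2011-01-01 00:20:00",
        "2011-01-01 00:21:00",
        "2011-01-01 00:22:00",
    ]),
    "IDs": [19, 20, 19, 20],
    "Speed": [1.0, 3.0, 2.0, 4.0],
})

# Sorting by ID first leaves the timestamps out of order: 00:19, 00:21, 00:20, 00:22.
by_id = raw.sort_values(["IDs", "DateTime"]).set_index("DateTime")
try:
    by_id["Speed"].rolling("60min", closed="right").sum()
    raised = False
except ValueError as exc:
    raised = True
    print("rolling failed:", exc)

# Sorting by timestamp only gives a monotonic index, so the window is accepted.
by_time = raw.sort_values("DateTime").set_index("DateTime")
result = by_time["Speed"].rolling("60min", closed="right").sum()
print(result)
```

If the frame is already indexed by the timestamp, `sort_index()` achieves the same ordering.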
【Answer 1】:
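I copied your example into a new CSV file (data.csv) to try it out. I then created a new DataFrame (df) and ran your statement, and it worked. Please take a look and leave a comment if you run into any problems.
Note: please check your pandas version. Mine is 0.23.4 (with Python 3.6.5). The answer also included a link to the documentation.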
data.csv
Datetime,Speed,distance,IDs,totalHours
2011-01-01 00:19:00,0.041916,0.000710,19,0.016944
2011-01-01 00:20:00,0.033719,0.000562,19,0.016667
2011-01-01 00:20:59,0.153553,0.002517,19,0.016389
2011-01-01 00:21:59,0.142272,0.002371,19,0.016667
2011-01-01 00:23:00,0.033166,0.000562,19,0.016944
2011-01-01 00:24:00,0.037843,0.000631,19,0.016667
2011-01-01 00:26:00,0.050262,0.001675,19,0.033333
2011-01-01 00:27:00,0.032249,0.000537,19,0.016667
2011-01-01 00:27:59,0.180206,0.002953,19,0.016389
2011-01-01 00:29:00,0.133477,0.002262,19,0.016944
2011-01-01 00:30:00,0.128053,0.002134,19,0.016667
2011-01-01 00:30:59,0.041964,0.000688,19,0.016389
2011-01-01 00:32:00,0.072529,0.001229,19,0.016944
2011-01-01 00:33:00,0.052437,0.000874,19,0.016667
2011-01-01 00:33:59,0.033903,0.000556,19,0.016389
2011-01-01 00:35:00,0.060076,0.001018,19,0.016944
2011-01-01 00:36:00,0.121709,0.002028,19,0.016667
2011-01-01 00:36:59,0.090517,0.001483,19,0.016389
2011-01-01 00:37:59,0.088304,0.001472,19,0.016667
2011-01-01 00:39:00,0.100654,0.001706,19,0.016944
2011-01-01 00:40:00,0.034839,0.000581,19,0.016667
2011-01-01 00:40:59,0.164753,0.002700,19,0.016389
2011-01-01 00:42:00,0.214163,0.003629,19,0.016944
2011-01-01 00:43:00,0.283706,0.004728,19,0.016667
2011-01-01 00:45:00,0.055676,0.001856,19,0.033333
2011-01-01 00:46:00,0.138059,0.002301,19,0.016667
2011-01-01 00:46:59,0.339829,0.005569,19,0.016389
2011-01-01 00:48:00,0.169921,0.002879,19,0.016944
2011-01-01 00:49:00,0.072382,0.001206,19,0.016667
2011-01-01 00:49:59,0.029009,0.000475,19,0.016389
Statements executed in Python's interactive terminal:
>>> import pandas as pd
>>>
>>> df = pd.read_csv("data.csv")
>>> df
Datetime Speed distance IDs totalHours
0 2011-01-01 00:19:00 0.041916 0.000710 19 0.016944
1 2011-01-01 00:20:00 0.033719 0.000562 19 0.016667
2 2011-01-01 00:20:59 0.153553 0.002517 19 0.016389
3 2011-01-01 00:21:59 0.142272 0.002371 19 0.016667
4 2011-01-01 00:23:00 0.033166 0.000562 19 0.016944
5 2011-01-01 00:24:00 0.037843 0.000631 19 0.016667
6 2011-01-01 00:26:00 0.050262 0.001675 19 0.033333
7 2011-01-01 00:27:00 0.032249 0.000537 19 0.016667
8 2011-01-01 00:27:59 0.180206 0.002953 19 0.016389
9 2011-01-01 00:29:00 0.133477 0.002262 19 0.016944
10 2011-01-01 00:30:00 0.128053 0.002134 19 0.016667
11 2011-01-01 00:30:59 0.041964 0.000688 19 0.016389
12 2011-01-01 00:32:00 0.072529 0.001229 19 0.016944
13 2011-01-01 00:33:00 0.052437 0.000874 19 0.016667
14 2011-01-01 00:33:59 0.033903 0.000556 19 0.016389
15 2011-01-01 00:35:00 0.060076 0.001018 19 0.016944
16 2011-01-01 00:36:00 0.121709 0.002028 19 0.016667
17 2011-01-01 00:36:59 0.090517 0.001483 19 0.016389
18 2011-01-01 00:37:59 0.088304 0.001472 19 0.016667
19 2011-01-01 00:39:00 0.100654 0.001706 19 0.016944
20 2011-01-01 00:40:00 0.034839 0.000581 19 0.016667
21 2011-01-01 00:40:59 0.164753 0.002700 19 0.016389
22 2011-01-01 00:42:00 0.214163 0.003629 19 0.016944
23 2011-01-01 00:43:00 0.283706 0.004728 19 0.016667
24 2011-01-01 00:45:00 0.055676 0.001856 19 0.033333
25 2011-01-01 00:46:00 0.138059 0.002301 19 0.016667
26 2011-01-01 00:46:59 0.339829 0.005569 19 0.016389
27 2011-01-01 00:48:00 0.169921 0.002879 19 0.016944
28 2011-01-01 00:49:00 0.072382 0.001206 19 0.016667
29 2011-01-01 00:49:59 0.029009 0.000475 19 0.016389
>>>
>>> df.index = pd.to_datetime(df.Datetime)
>>> df
Datetime Speed distance IDs totalHours
Datetime
2011-01-01 00:19:00 2011-01-01 00:19:00 0.041916 0.000710 19 0.016944
2011-01-01 00:20:00 2011-01-01 00:20:00 0.033719 0.000562 19 0.016667
2011-01-01 00:20:59 2011-01-01 00:20:59 0.153553 0.002517 19 0.016389
2011-01-01 00:21:59 2011-01-01 00:21:59 0.142272 0.002371 19 0.016667
2011-01-01 00:23:00 2011-01-01 00:23:00 0.033166 0.000562 19 0.016944
2011-01-01 00:24:00 2011-01-01 00:24:00 0.037843 0.000631 19 0.016667
2011-01-01 00:26:00 2011-01-01 00:26:00 0.050262 0.001675 19 0.033333
2011-01-01 00:27:00 2011-01-01 00:27:00 0.032249 0.000537 19 0.016667
2011-01-01 00:27:59 2011-01-01 00:27:59 0.180206 0.002953 19 0.016389
2011-01-01 00:29:00 2011-01-01 00:29:00 0.133477 0.002262 19 0.016944
2011-01-01 00:30:00 2011-01-01 00:30:00 0.128053 0.002134 19 0.016667
2011-01-01 00:30:59 2011-01-01 00:30:59 0.041964 0.000688 19 0.016389
2011-01-01 00:32:00 2011-01-01 00:32:00 0.072529 0.001229 19 0.016944
2011-01-01 00:33:00 2011-01-01 00:33:00 0.052437 0.000874 19 0.016667
2011-01-01 00:33:59 2011-01-01 00:33:59 0.033903 0.000556 19 0.016389
2011-01-01 00:35:00 2011-01-01 00:35:00 0.060076 0.001018 19 0.016944
2011-01-01 00:36:00 2011-01-01 00:36:00 0.121709 0.002028 19 0.016667
2011-01-01 00:36:59 2011-01-01 00:36:59 0.090517 0.001483 19 0.016389
2011-01-01 00:37:59 2011-01-01 00:37:59 0.088304 0.001472 19 0.016667
2011-01-01 00:39:00 2011-01-01 00:39:00 0.100654 0.001706 19 0.016944
2011-01-01 00:40:00 2011-01-01 00:40:00 0.034839 0.000581 19 0.016667
2011-01-01 00:40:59 2011-01-01 00:40:59 0.164753 0.002700 19 0.016389
2011-01-01 00:42:00 2011-01-01 00:42:00 0.214163 0.003629 19 0.016944
2011-01-01 00:43:00 2011-01-01 00:43:00 0.283706 0.004728 19 0.016667
2011-01-01 00:45:00 2011-01-01 00:45:00 0.055676 0.001856 19 0.033333
2011-01-01 00:46:00 2011-01-01 00:46:00 0.138059 0.002301 19 0.016667
2011-01-01 00:46:59 2011-01-01 00:46:59 0.339829 0.005569 19 0.016389
2011-01-01 00:48:00 2011-01-01 00:48:00 0.169921 0.002879 19 0.016944
2011-01-01 00:49:00 2011-01-01 00:49:00 0.072382 0.001206 19 0.016667
2011-01-01 00:49:59 2011-01-01 00:49:59 0.029009 0.000475 19 0.016389
>>>
>>>
>>> df.Speed.rolling('60T', closed='right').sum()
Datetime
2011-01-01 00:19:00 0.041916
2011-01-01 00:20:00 0.075635
2011-01-01 00:20:59 0.229188
2011-01-01 00:21:59 0.371460
2011-01-01 00:23:00 0.404626
2011-01-01 00:24:00 0.442469
2011-01-01 00:26:00 0.492731
2011-01-01 00:27:00 0.524980
2011-01-01 00:27:59 0.705186
2011-01-01 00:29:00 0.838663
2011-01-01 00:30:00 0.966716
2011-01-01 00:30:59 1.008680
2011-01-01 00:32:00 1.081209
2011-01-01 00:33:00 1.133646
2011-01-01 00:33:59 1.167549
2011-01-01 00:35:00 1.227625
2011-01-01 00:36:00 1.349334
2011-01-01 00:36:59 1.439851
2011-01-01 00:37:59 1.528155
2011-01-01 00:39:00 1.628809
2011-01-01 00:40:00 1.663648
2011-01-01 00:40:59 1.828401
2011-01-01 00:42:00 2.042564
2011-01-01 00:43:00 2.326270
2011-01-01 00:45:00 2.381946
2011-01-01 00:46:00 2.520005
2011-01-01 00:46:59 2.859834
2011-01-01 00:48:00 3.029755
2011-01-01 00:49:00 3.102137
2011-01-01 00:49:59 3.131146
Name: Speed, dtype: float64
>>>
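The session above works because the sample contains a single ID in time order. If the goal of sorting by ID was a separate 60-minute sum for each ID (an assumption, since the question does not say so), one possible sketch is to keep the index sorted by time and group by IDs before rolling. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical data: two IDs whose timestamps interleave.
raw = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2011-01-01 00:19:00",
        "2011-01-01 00:20:00",
        "2011-01-01 00:21:00",
        "2011-01-01 00:22:00",
    ]),
    "IDs": [19, 20, 19, 20],
    "Speed": [1.0, 3.0, 2.0, 4.0],
})

# Keep the DatetimeIndex in time order, then roll within each ID.
indexed = raw.set_index("DateTime").sort_index()
per_id = indexed.groupby("IDs")["Speed"].rolling("60min", closed="right").sum()

# The result is indexed by (IDs, DateTime), with one running sum per ID.
print(per_id)
```

Each ID then gets its own window, so rows from one ID never contribute to another ID's sum.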