ValueError: index must be monotonic

【Title】ValueError: index must be monotonic 【Posted】2019-06-05 22:54:06 【Question】:
dfdata.Speed.rolling('60T', closed='right').sum()

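I am trying to apply a rolling sum on this column. I have sorted the whole dataset, but I still get the same error. Can anyone help me solve it? In the data below, the first DateTime column is the index and the second is a regular column, which is why it looks duplicated.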

DateTime            DateTime            Speed       distance    IDs totalHours          
2011-01-01 00:19:00 2011-01-01 00:19:00 0.041916    0.000710    19  0.016944
2011-01-01 00:20:00 2011-01-01 00:20:00 0.033719    0.000562    19  0.016667
2011-01-01 00:20:59 2011-01-01 00:20:59 0.153553    0.002517    19  0.016389
2011-01-01 00:21:59 2011-01-01 00:21:59 0.142272    0.002371    19  0.016667
2011-01-01 00:23:00 2011-01-01 00:23:00 0.033166    0.000562    19  0.016944
2011-01-01 00:24:00 2011-01-01 00:24:00 0.037843    0.000631    19  0.016667
2011-01-01 00:26:00 2011-01-01 00:26:00 0.050262    0.001675    19  0.033333
2011-01-01 00:27:00 2011-01-01 00:27:00 0.032249    0.000537    19  0.016667
2011-01-01 00:27:59 2011-01-01 00:27:59 0.180206    0.002953    19  0.016389
2011-01-01 00:29:00 2011-01-01 00:29:00 0.133477    0.002262    19  0.016944
2011-01-01 00:30:00 2011-01-01 00:30:00 0.128053    0.002134    19  0.016667
2011-01-01 00:30:59 2011-01-01 00:30:59 0.041964    0.000688    19  0.016389
2011-01-01 00:32:00 2011-01-01 00:32:00 0.072529    0.001229    19  0.016944
2011-01-01 00:33:00 2011-01-01 00:33:00 0.052437    0.000874    19  0.016667
2011-01-01 00:33:59 2011-01-01 00:33:59 0.033903    0.000556    19  0.016389
2011-01-01 00:35:00 2011-01-01 00:35:00 0.060076    0.001018    19  0.016944
2011-01-01 00:36:00 2011-01-01 00:36:00 0.121709    0.002028    19  0.016667
2011-01-01 00:36:59 2011-01-01 00:36:59 0.090517    0.001483    19  0.016389
2011-01-01 00:37:59 2011-01-01 00:37:59 0.088304    0.001472    19  0.016667
2011-01-01 00:39:00 2011-01-01 00:39:00 0.100654    0.001706    19  0.016944
2011-01-01 00:40:00 2011-01-01 00:40:00 0.034839    0.000581    19  0.016667
2011-01-01 00:40:59 2011-01-01 00:40:59 0.164753    0.002700    19  0.016389
2011-01-01 00:42:00 2011-01-01 00:42:00 0.214163    0.003629    19  0.016944
2011-01-01 00:43:00 2011-01-01 00:43:00 0.283706    0.004728    19  0.016667
2011-01-01 00:45:00 2011-01-01 00:45:00 0.055676    0.001856    19  0.033333
2011-01-01 00:46:00 2011-01-01 00:46:00 0.138059    0.002301    19  0.016667
2011-01-01 00:46:59 2011-01-01 00:46:59 0.339829    0.005569    19  0.016389
2011-01-01 00:48:00 2011-01-01 00:48:00 0.169921    0.002879    19  0.016944
2011-01-01 00:49:00 2011-01-01 00:49:00 0.072382    0.001206    19  0.016667
2011-01-01 00:49:59 2011-01-01 00:49:59 0.029009    0.000475    19  0.016389

This is the sample data.

This is the error I get.

--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-59-3224ac27b0b8> in <module>()
      1 # dfdata.Speed.rolling('60T', closed='right').sum()
----> 2 dfdata.Speed.rolling('60T', closed='right').sum()

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in rolling(self, window, min_periods, freq, center, win_type, on, axis, closed)
   6193                                    min_periods=min_periods, freq=freq,
   6194                                    center=center, win_type=win_type,
-> 6195                                    on=on, axis=axis, closed=closed)
   6196 
   6197         cls.rolling = rolling

~/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in rolling(obj, win_type, **kwds)
   2050         return Window(obj, win_type=win_type, **kwds)
   2051 
-> 2052     return Rolling(obj, **kwds)
   2053 
   2054 

~/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in __init__(self, obj, window, min_periods, freq, center, win_type, axis, on, closed, **kwargs)
     84         self.win_freq = None
     85         self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 86         self.validate()
     87 
     88     @property

~/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in validate(self)
   1085                                          timedelta))):
   1086 
-> 1087             self._validate_monotonic()
   1088             freq = self._validate_freq()
   1089 

~/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in _validate_monotonic(self)
   1117             formatted = self.on or 'index'
   1118             raise ValueError("{0} must be "
-> 1119                              "monotonic".format(formatted))
   1120 
   1121     def _validate_freq(self):

ValueError: index must be monotonic
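
The traceback shows that a time-based `rolling` window validates the index before computing anything. A minimal sketch of how one might check this directly and find the offending rows is given below. The four-row frame is hypothetical (it is not the asker's data), and `60min` is used as the equivalent of the `60T` window in the question, since the `T` alias is deprecated in newer pandas releases.

```python
import pandas as pd

# Hypothetical miniature of the question's data: the last two timestamps are out of order.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2011-01-01 00:19:00", "2011-01-01 00:20:00",
        "2011-01-01 00:21:59", "2011-01-01 00:20:59",
    ]),
    "Speed": [0.041916, 0.033719, 0.142272, 0.153553],
}).set_index("DateTime")

# A time-based rolling window needs a monotonic index; check it up front.
print(df.index.is_monotonic_increasing)  # False

# Flag every row whose timestamp is earlier than the one before it.
backwards = (df.index.to_series().diff() < pd.Timedelta(0)).to_numpy()
print(df[backwards])

# The rolling call fails on this frame for exactly that reason.
try:
    df.Speed.rolling("60min", closed="right").sum()
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

On a frame with millions of rows, the `backwards` mask narrows the search to the rows where the order breaks.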

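【Comments】:

Please make sure to include enough information when asking a question. There is no input dataset and no error message here. How would others understand the problem and want to solve it? Please add all of that.

The data has millions of records and I don't know where the problem is, so which part of the data should I give you? I'm not sure, but I will edit the question and try to explain further.

Just a few rows, so that answerers can see the basic structure of your data. Also include the error you see in the terminal.

I found it: I was getting this error because I was sorting the data on id and then dateTime. That sorts the data by id first and by datetime second, which is what caused the problem. When I sorted by datetime only, I no longer got the error. Thanks for your help.

【Answer 1】: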

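I just copied your sample and created a new CSV file (data.csv) to try it. Then I created a new DataFrame (df) and executed your statement, and it worked. Please take a look and leave a comment if you run into any problems.

Note: please check your pandas version. Mine is 0.23.4 (with Python 3.6.5); this is the documentation link.

data.csv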

Datetime,Speed,distance,IDs,totalHours
2011-01-01 00:19:00,0.041916,0.000710,19,0.016944
2011-01-01 00:20:00,0.033719,0.000562,19,0.016667
2011-01-01 00:20:59,0.153553,0.002517,19,0.016389
2011-01-01 00:21:59,0.142272,0.002371,19,0.016667
2011-01-01 00:23:00,0.033166,0.000562,19,0.016944
2011-01-01 00:24:00,0.037843,0.000631,19,0.016667
2011-01-01 00:26:00,0.050262,0.001675,19,0.033333
2011-01-01 00:27:00,0.032249,0.000537,19,0.016667
2011-01-01 00:27:59,0.180206,0.002953,19,0.016389
2011-01-01 00:29:00,0.133477,0.002262,19,0.016944
2011-01-01 00:30:00,0.128053,0.002134,19,0.016667
2011-01-01 00:30:59,0.041964,0.000688,19,0.016389
2011-01-01 00:32:00,0.072529,0.001229,19,0.016944
2011-01-01 00:33:00,0.052437,0.000874,19,0.016667
2011-01-01 00:33:59,0.033903,0.000556,19,0.016389
2011-01-01 00:35:00,0.060076,0.001018,19,0.016944
2011-01-01 00:36:00,0.121709,0.002028,19,0.016667
2011-01-01 00:36:59,0.090517,0.001483,19,0.016389
2011-01-01 00:37:59,0.088304,0.001472,19,0.016667
2011-01-01 00:39:00,0.100654,0.001706,19,0.016944
2011-01-01 00:40:00,0.034839,0.000581,19,0.016667
2011-01-01 00:40:59,0.164753,0.002700,19,0.016389
2011-01-01 00:42:00,0.214163,0.003629,19,0.016944
2011-01-01 00:43:00,0.283706,0.004728,19,0.016667
2011-01-01 00:45:00,0.055676,0.001856,19,0.033333
2011-01-01 00:46:00,0.138059,0.002301,19,0.016667
2011-01-01 00:46:59,0.339829,0.005569,19,0.016389
2011-01-01 00:48:00,0.169921,0.002879,19,0.016944
2011-01-01 00:49:00,0.072382,0.001206,19,0.016667
2011-01-01 00:49:59,0.029009,0.000475,19,0.016389

Statements executed in Python's interactive terminal

>>> import pandas as pd
>>>
>>> df = pd.read_csv("data.csv")
>>> df
               Datetime     Speed  distance  IDs  totalHours
0   2011-01-01 00:19:00  0.041916  0.000710   19    0.016944
1   2011-01-01 00:20:00  0.033719  0.000562   19    0.016667
2   2011-01-01 00:20:59  0.153553  0.002517   19    0.016389
3   2011-01-01 00:21:59  0.142272  0.002371   19    0.016667
4   2011-01-01 00:23:00  0.033166  0.000562   19    0.016944
5   2011-01-01 00:24:00  0.037843  0.000631   19    0.016667
6   2011-01-01 00:26:00  0.050262  0.001675   19    0.033333
7   2011-01-01 00:27:00  0.032249  0.000537   19    0.016667
8   2011-01-01 00:27:59  0.180206  0.002953   19    0.016389
9   2011-01-01 00:29:00  0.133477  0.002262   19    0.016944
10  2011-01-01 00:30:00  0.128053  0.002134   19    0.016667
11  2011-01-01 00:30:59  0.041964  0.000688   19    0.016389
12  2011-01-01 00:32:00  0.072529  0.001229   19    0.016944
13  2011-01-01 00:33:00  0.052437  0.000874   19    0.016667
14  2011-01-01 00:33:59  0.033903  0.000556   19    0.016389
15  2011-01-01 00:35:00  0.060076  0.001018   19    0.016944
16  2011-01-01 00:36:00  0.121709  0.002028   19    0.016667
17  2011-01-01 00:36:59  0.090517  0.001483   19    0.016389
18  2011-01-01 00:37:59  0.088304  0.001472   19    0.016667
19  2011-01-01 00:39:00  0.100654  0.001706   19    0.016944
20  2011-01-01 00:40:00  0.034839  0.000581   19    0.016667
21  2011-01-01 00:40:59  0.164753  0.002700   19    0.016389
22  2011-01-01 00:42:00  0.214163  0.003629   19    0.016944
23  2011-01-01 00:43:00  0.283706  0.004728   19    0.016667
24  2011-01-01 00:45:00  0.055676  0.001856   19    0.033333
25  2011-01-01 00:46:00  0.138059  0.002301   19    0.016667
26  2011-01-01 00:46:59  0.339829  0.005569   19    0.016389
27  2011-01-01 00:48:00  0.169921  0.002879   19    0.016944
28  2011-01-01 00:49:00  0.072382  0.001206   19    0.016667
29  2011-01-01 00:49:59  0.029009  0.000475   19    0.016389
>>>
>>> df.index = pd.to_datetime(df.Datetime)
>>> df
                                Datetime     Speed  distance  IDs  totalHours
Datetime
2011-01-01 00:19:00  2011-01-01 00:19:00  0.041916  0.000710   19    0.016944
2011-01-01 00:20:00  2011-01-01 00:20:00  0.033719  0.000562   19    0.016667
2011-01-01 00:20:59  2011-01-01 00:20:59  0.153553  0.002517   19    0.016389
2011-01-01 00:21:59  2011-01-01 00:21:59  0.142272  0.002371   19    0.016667
2011-01-01 00:23:00  2011-01-01 00:23:00  0.033166  0.000562   19    0.016944
2011-01-01 00:24:00  2011-01-01 00:24:00  0.037843  0.000631   19    0.016667
2011-01-01 00:26:00  2011-01-01 00:26:00  0.050262  0.001675   19    0.033333
2011-01-01 00:27:00  2011-01-01 00:27:00  0.032249  0.000537   19    0.016667
2011-01-01 00:27:59  2011-01-01 00:27:59  0.180206  0.002953   19    0.016389
2011-01-01 00:29:00  2011-01-01 00:29:00  0.133477  0.002262   19    0.016944
2011-01-01 00:30:00  2011-01-01 00:30:00  0.128053  0.002134   19    0.016667
2011-01-01 00:30:59  2011-01-01 00:30:59  0.041964  0.000688   19    0.016389
2011-01-01 00:32:00  2011-01-01 00:32:00  0.072529  0.001229   19    0.016944
2011-01-01 00:33:00  2011-01-01 00:33:00  0.052437  0.000874   19    0.016667
2011-01-01 00:33:59  2011-01-01 00:33:59  0.033903  0.000556   19    0.016389
2011-01-01 00:35:00  2011-01-01 00:35:00  0.060076  0.001018   19    0.016944
2011-01-01 00:36:00  2011-01-01 00:36:00  0.121709  0.002028   19    0.016667
2011-01-01 00:36:59  2011-01-01 00:36:59  0.090517  0.001483   19    0.016389
2011-01-01 00:37:59  2011-01-01 00:37:59  0.088304  0.001472   19    0.016667
2011-01-01 00:39:00  2011-01-01 00:39:00  0.100654  0.001706   19    0.016944
2011-01-01 00:40:00  2011-01-01 00:40:00  0.034839  0.000581   19    0.016667
2011-01-01 00:40:59  2011-01-01 00:40:59  0.164753  0.002700   19    0.016389
2011-01-01 00:42:00  2011-01-01 00:42:00  0.214163  0.003629   19    0.016944
2011-01-01 00:43:00  2011-01-01 00:43:00  0.283706  0.004728   19    0.016667
2011-01-01 00:45:00  2011-01-01 00:45:00  0.055676  0.001856   19    0.033333
2011-01-01 00:46:00  2011-01-01 00:46:00  0.138059  0.002301   19    0.016667
2011-01-01 00:46:59  2011-01-01 00:46:59  0.339829  0.005569   19    0.016389
2011-01-01 00:48:00  2011-01-01 00:48:00  0.169921  0.002879   19    0.016944
2011-01-01 00:49:00  2011-01-01 00:49:00  0.072382  0.001206   19    0.016667
2011-01-01 00:49:59  2011-01-01 00:49:59  0.029009  0.000475   19    0.016389
>>>
>>>
>>> df.Speed.rolling('60T', closed='right').sum()
Datetime
2011-01-01 00:19:00    0.041916
2011-01-01 00:20:00    0.075635
2011-01-01 00:20:59    0.229188
2011-01-01 00:21:59    0.371460
2011-01-01 00:23:00    0.404626
2011-01-01 00:24:00    0.442469
2011-01-01 00:26:00    0.492731
2011-01-01 00:27:00    0.524980
2011-01-01 00:27:59    0.705186
2011-01-01 00:29:00    0.838663
2011-01-01 00:30:00    0.966716
2011-01-01 00:30:59    1.008680
2011-01-01 00:32:00    1.081209
2011-01-01 00:33:00    1.133646
2011-01-01 00:33:59    1.167549
2011-01-01 00:35:00    1.227625
2011-01-01 00:36:00    1.349334
2011-01-01 00:36:59    1.439851
2011-01-01 00:37:59    1.528155
2011-01-01 00:39:00    1.628809
2011-01-01 00:40:00    1.663648
2011-01-01 00:40:59    1.828401
2011-01-01 00:42:00    2.042564
2011-01-01 00:43:00    2.326270
2011-01-01 00:45:00    2.381946
2011-01-01 00:46:00    2.520005
2011-01-01 00:46:59    2.859834
2011-01-01 00:48:00    3.029755
2011-01-01 00:49:00    3.102137
2011-01-01 00:49:59    3.131146
Name: Speed, dtype: float64
>>>
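
The session above runs cleanly because the sample covers a single ID in time order. The cause the asker reported in the comments (sorting by id first, then by datetime) can be reproduced with a sketch like the one below. The two-ID data is made up for illustration, `60min` stands in for `60T`, and the per-ID variant at the end is an assumption about what might be wanted, not something stated in the question.

```python
import pandas as pd

# Hypothetical data: two IDs (19 and 20) whose time ranges overlap.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2011-01-01 00:19:00", "2011-01-01 00:20:00",
        "2011-01-01 00:19:30", "2011-01-01 00:20:30",
    ]),
    "IDs": [19, 19, 20, 20],
    "Speed": [0.1, 0.2, 0.3, 0.4],
})

# Sorting by ID first makes the time index jump backwards where the ID changes.
by_id = df.sort_values(["IDs", "DateTime"]).set_index("DateTime")
try:
    by_id.Speed.rolling("60min", closed="right").sum()
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# Fix reported in the comments: sort by datetime only.
by_time = df.sort_values("DateTime").set_index("DateTime")
total = by_time.Speed.rolling("60min", closed="right").sum()

# Assumed alternative: if the sum should stay separate per ID,
# apply the rolling window inside each group instead.
per_id = (
    by_time.groupby("IDs")["Speed"]
    .rolling("60min", closed="right")
    .sum()
)
print(total)
print(per_id)
```

Sorting by datetime alone mixes all IDs into one running sum, while the grouped version returns a Series indexed by (IDs, DateTime). Which one is right depends on what the sum is meant to represent.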
