Memory error while plotting dataframe (matplotlib)
【Posted】: 2016-08-16 21:21:40 【Question】: I am using Pandas with Jupyter Notebook and trying to plot a small dataframe.
When I run the following line:
df9.plot(x='Time', y='Pressure mean')
I get the following error:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-8-c789b8162a1a> in <module>()
----> 1 df9.plot(x='Time', y='Pressure mean')
C:\Anaconda3\lib\site-packages\pandas\tools\plotting.py in __call__(self, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
3735 fontsize=fontsize, colormap=colormap, table=table,
3736 yerr=yerr, xerr=xerr, secondary_y=secondary_y,
-> 3737 sort_columns=sort_columns, **kwds)
3738 __call__.__doc__ = plot_frame.__doc__
3739
C:\Anaconda3\lib\site-packages\pandas\tools\plotting.py in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
2609 yerr=yerr, xerr=xerr,
2610 secondary_y=secondary_y, sort_columns=sort_columns,
-> 2611 **kwds)
2612
2613
C:\Anaconda3\lib\site-packages\pandas\tools\plotting.py in _plot(data, x, y, subplots, ax, kind, **kwds)
2436 plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
2437
-> 2438 plot_obj.generate()
2439 plot_obj.draw()
2440 return plot_obj.result
C:\Anaconda3\lib\site-packages\pandas\tools\plotting.py in generate(self)
1029
1030 for ax in self.axes:
-> 1031 self._post_plot_logic_common(ax, self.data)
1032 self._post_plot_logic(ax, self.data)
1033
C:\Anaconda3\lib\site-packages\pandas\tools\plotting.py in _post_plot_logic_common(self, ax, data)
1157 ax.set_xticklabels(xticklabels)
1158 self._apply_axis_properties(ax.xaxis, rot=self.rot,
-> 1159 fontsize=self.fontsize)
1160 self._apply_axis_properties(ax.yaxis, fontsize=self.fontsize)
1161 elif self.orientation == 'horizontal':
C:\Anaconda3\lib\site-packages\pandas\tools\plotting.py in _apply_axis_properties(self, axis, rot, fontsize)
1205
1206 def _apply_axis_properties(self, axis, rot=None, fontsize=None):
-> 1207 labels = axis.get_majorticklabels() + axis.get_minorticklabels()
1208 for label in labels:
1209 if rot is not None:
C:\Anaconda3\lib\site-packages\matplotlib\axis.py in get_majorticklabels(self)
1159 def get_majorticklabels(self):
1160 'Return a list of Text instances for the major ticklabels'
-> 1161 ticks = self.get_major_ticks()
1162 labels1 = [tick.label1 for tick in ticks if tick.label1On]
1163 labels2 = [tick.label2 for tick in ticks if tick.label2On]
C:\Anaconda3\lib\site-packages\matplotlib\axis.py in get_major_ticks(self, numticks)
1288 'get the tick instances; grow as necessary'
1289 if numticks is None:
-> 1290 numticks = len(self.get_major_locator()())
1291 if len(self.majorTicks) < numticks:
1292 # update the new tick label properties from the old
C:\Anaconda3\lib\site-packages\pandas\tseries\converter.py in __call__(self)
876 vmin, vmax = vmax, vmin
877 if self.isdynamic:
--> 878 locs = self._get_default_locs(vmin, vmax)
879 else: # pragma: no cover
880 base = self.base
C:\Anaconda3\lib\site-packages\pandas\tseries\converter.py in _get_default_locs(self, vmin, vmax)
857
858 if self.plot_obj.date_axis_info is None:
--> 859 self.plot_obj.date_axis_info = self.finder(vmin, vmax, self.freq)
860
861 locator = self.plot_obj.date_axis_info
C:\Anaconda3\lib\site-packages\pandas\tseries\converter.py in _daily_finder(vmin, vmax, freq)
481 Period(ordinal=int(vmax), freq=freq))
482 span = vmax.ordinal - vmin.ordinal + 1
--> 483 dates_ = PeriodIndex(start=vmin, end=vmax, freq=freq)
484 # Initialize the output
485 info = np.zeros(span,
C:\Anaconda3\lib\site-packages\pandas\tseries\period.py in __new__(cls, data, ordinal, freq, start, end, periods, copy, name, tz, **kwargs)
186 else:
187 data, freq = cls._generate_range(start, end, periods,
--> 188 freq, kwargs)
189 else:
190 ordinal, freq = cls._from_arraylike(data, freq, tz)
C:\Anaconda3\lib\site-packages\pandas\tseries\period.py in _generate_range(cls, start, end, periods, freq, fields)
200 raise ValueError('Can either instantiate from fields '
201 'or endpoints, but not both')
--> 202 subarr, freq = _get_ordinal_range(start, end, periods, freq)
203 elif field_count > 0:
204 subarr, freq = _range_from_fields(freq=freq, **fields)
C:\Anaconda3\lib\site-packages\pandas\tseries\period.py in _get_ordinal_range(start, end, periods, freq, mult)
1026 dtype=np.int64)
1027 else:
-> 1028 data = np.arange(start.ordinal, end.ordinal + 1, mult, dtype=np.int64)
1029
1030 return data, freq
MemoryError:
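What is going wrong? I can't figure it out. Thanks!
【Comments】:
How many rows are in your dataframe?
24, so very small.
【Answer 1】: The problem stems from using a TimedeltaIndex (or timedelta values) for your time column. It has been reported here: https://github.com/pydata/pandas/issues/8711 We have not found a fix yet.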
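To see how a 24-row frame could exhaust memory, here is a rough back-of-the-envelope sketch. It assumes (the thread does not confirm this) that the axis limits reach `_get_ordinal_range` as the raw nanosecond counts of the timedeltas, so the `np.arange` call in the last traceback frame would need one int64 per nanosecond. The 24-hour span is a hypothetical value.

```python
import numpy as np

# Hypothetical span of the 'Time' column: 24 hours.
span = np.timedelta64(24, "h")

# Assumption: each nanosecond in the span becomes one ordinal in
# np.arange(start.ordinal, end.ordinal + 1, ...).
n_ordinals = int(span / np.timedelta64(1, "ns")) + 1

# Each ordinal is stored as an int64, i.e. 8 bytes.
bytes_needed = n_ordinals * np.dtype(np.int64).itemsize

print(f"{n_ordinals} ordinals, about {bytes_needed / 1e12:.1f} TB")
```

Under that assumption the requested array is far larger than any workstation's RAM, regardless of how few rows the dataframe has.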
As a workaround, I suggest converting your data to datetime values or a DatetimeIndex. Assume YourDate contains the start date of your observations.
# pd.datetime has been removed from recent pandas; pd.to_datetime parses the start date instead
df9.index = pd.DatetimeIndex(pd.to_datetime(YourDate, format='%d.%m.%Y %H:%M:%S')
                             + df9['Time'])
df9.plot(y='Pressure mean')
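Note that if your time span is less than 24 hours, only the hours will be plotted.
Edit (2016-11-07): I can now use timedelta as the index and plot correctly. This is how I proceed (assuming I have floats indicating the number of hours):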
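The conversion above can be tried on a tiny stand-in frame. This is a minimal sketch: the data values and the start date are made up for illustration, and only the column names `Time` and `Pressure mean` come from the question.

```python
import pandas as pd

# Hypothetical stand-in for df9: elapsed time as timedeltas plus one measurement.
df9 = pd.DataFrame({
    "Time": pd.to_timedelta([0, 1, 2, 3], unit="h"),
    "Pressure mean": [1.01, 1.03, 1.02, 1.05],
})

# Assumed start of the observations (plays the role of YourDate).
start = pd.Timestamp("2016-08-16 00:00:00")

# Timestamp + timedelta Series gives absolute datetimes, usable as an index.
df9.index = pd.DatetimeIndex(start + df9["Time"])

print(df9.index)
```

With the datetime index in place, `df9.plot(y='Pressure mean')` no longer goes through the timedelta code path that the traceback shows.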
# converts float hours to timedelta
converter = {'Time[h]': lambda x: pd.to_timedelta(float(x), unit='h')}
df = pd.read_csv(fpath, sep='\t',
                 skiprows=len(comments),  # header lines to skip
                 names=list(dt.keys()),   # dt is your own {column name: dtype} dict
                 dtype=dt,                # you need of course your own dtypes
                 encoding='latin-1',      # European data...
                 skipinitialspace=True,
                 converters=converter)
df = df.set_index('Time[h]')  # time column to index
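A self-contained variation of the same idea, as a sketch rather than the answerer's exact method: it reads the hours as plain floats and converts the whole column afterwards with `pd.to_timedelta`, instead of using a per-value converter. The in-memory file and its values are invented for the example.

```python
import io

import pandas as pd

# Hypothetical tab-separated data: elapsed hours and one measurement.
raw = io.StringIO(
    "Time[h]\tPressure mean\n"
    "0.0\t1.01\n"
    "0.5\t1.03\n"
    "1.0\t1.02\n"
)

df = pd.read_csv(raw, sep="\t")

# Convert float hours to timedeltas in one vectorized call.
df["Time[h]"] = pd.to_timedelta(df["Time[h]"], unit="h")

# Use the time column as the index; the result is a TimedeltaIndex.
df = df.set_index("Time[h]")

print(df.index)
```

Converting the column in one call avoids invoking a Python lambda for every row, which matters little for 24 rows but adds up on larger files.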
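【Comments】:
I ran into this problem too, with a very small dataframe (as small as 3*12). Similar to the above, resetting the index (to integers) allows the data to be plotted, and setting xticklabels to the Timedelta column of the reindexed dataframe lets the timedelta information be shown. I guess this is only really relevant/useful for small dataframes.
With the latest pandas update I can use timedelta as the index, and I have updated my own package accordingly. I am revising my answer on this matter.
【Answer 2】: As Wli said, this is a bug that still has to be fixed. As a workaround, this worked for me: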
plt.plot(s.index,s.values)
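A runnable sketch of this workaround, with one change that is not part of the answer: the timedelta index is converted to float hours first, so the x axis shows readable numbers. The series `s` and its values are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; omit inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical series indexed by elapsed time.
s = pd.Series(
    [1.01, 1.03, 1.02],
    index=pd.to_timedelta([0, 1, 2], unit="h"),
    name="Pressure mean",
)

# Plain floats sidestep the pandas time-axis code seen in the traceback.
hours = s.index.total_seconds() / 3600.0

fig, ax = plt.subplots()
(line,) = ax.plot(hours, s.values)
ax.set_xlabel("Time [h]")
ax.set_ylabel(s.name)
```

Calling matplotlib directly bypasses `DataFrame.plot` entirely, so the tick locator that raised the MemoryError is never used.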
【Comments】: