Why does pandas ask for freq or x when doing seasonal decomposition?

Posted: 2022-01-16 06:43:41

Question:

I ran

decompose_result = seasonal_decompose(df["TMAX"],model="additive")
decompose_result.plot();

but got this error:

"You must specify a freq or x must be a pandas object with a time-series index with a freq not set to None"

The data contains only the date and TMAX (maximum temperature).
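For context, a minimal sketch of such data (the column name DATE and all values are hypothetical, made up for illustration) shows where the error comes from: even after the dates are parsed into the index, pandas stores no frequency on it (`index.freq` is None), which is exactly the case the error message describes. `DataFrame.asfreq('D')` attaches a daily frequency.

```python
import io

import pandas as pd

# Hypothetical CSV excerpt: only a DATE column and TMAX (values are made up)
csv_text = """DATE,TMAX
2020-01-01,5.1
2020-01-02,6.3
2020-01-03,4.8
2020-01-04,7.0
"""

# Parse DATE into a DatetimeIndex
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE"], index_col="DATE")
print(df.index.freq)  # None: no frequency is stored on the index

# pandas can infer the spacing, but does not attach it automatically
print(pd.infer_freq(df.index))  # D

# asfreq("D") puts the data on a daily grid and sets index.freq
daily = df.asfreq("D")
print(daily.index.freqstr)  # D
```

Once the index carries a frequency, seasonal_decompose can derive a period from it; a real series still needs at least two full cycles of data.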

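Comments:

Most likely, df["TMAX"] is not a "pandas object with a time-series index with a freq not set to None". Please provide a subset of the dates in the question.

Subset of dates?? Can you explain?

Sorry, I meant a subset of the data. Could you provide an excerpt of the dataset? What do df and df['TMAX'] look like?

Answer 1: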

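I assume you forgot to define the period and pass it to the freq argument of seasonal_decompose() (renamed period in statsmodels 0.11 and later) in this line: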

decompose_result = seasonal_decompose(df["TMAX"],model="additive")

It throws the following ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-9b030cf1055e> in <module>()
----> 1 decompose_result = seasonal_decompose(df["TMAX"],model="additive")
      2 decompose_result.plot()

/usr/local/lib/python3.7/dist-packages/statsmodels/tsa/seasonal.py in seasonal_decompose(x, model, filt, freq, two_sided, extrapolate_trend)
    125             freq = pfreq
    126         else:
--> 127             raise ValueError("You must specify a freq or x must be a "
    128                              "pandas object with a timeseries index with "
    129                              "a freq not set to None")

ValueError: You must specify a freq or x must be a pandas object with a time-series index with a freq not set to None
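The traceback shows the decision seasonal_decompose makes: use the explicitly passed value if there is one, otherwise fall back to the frequency of the pandas index, otherwise raise. A hypothetical helper that mirrors that check (the name `resolve_frequency` is illustrative, not a statsmodels function, and it returns the frequency string rather than the integer period statsmodels derives from it) can be used to test a series before decomposing it:

```python
import numpy as np
import pandas as pd


def resolve_frequency(x, period=None):
    """Hypothetical helper mirroring the check in the traceback above.

    Returns the explicit period if one is given, otherwise the index
    frequency string; raises ValueError when neither is available.
    """
    if period is not None:
        return period
    index = getattr(x, "index", None)  # plain NumPy arrays have no index
    freq = getattr(index, "freq", None)
    if freq is None:
        raise ValueError(
            "You must specify a freq or x must be a pandas object with a "
            "time-series index with a freq not set to None"
        )
    return index.freqstr


values = np.arange(30, dtype=float)
dates = pd.date_range("2020-01-01", periods=30, freq="D")

# Index rebuilt from raw datetime64 values: no frequency is stored
no_freq = pd.Series(values, index=pd.DatetimeIndex(dates.to_numpy()))
# Same data after asfreq("D"): index.freq is set
with_freq = no_freq.asfreq("D")

print(resolve_frequency(no_freq, period=15))  # 15
print(resolve_frequency(with_freq))  # D
```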

So follow the script below instead:

# import libraries
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose
 
# Generate time-series data
total_duration = 100
step = 0.01
time = np.arange(0, total_duration, step)
 
# Period of the sinusoidal signal in seconds
T= 15
 
# Periodic (seasonal) component
series_periodic = np.sin((2*np.pi/T)*time)
 
# Add a trend component
k0 = 2
k1 = 2
k2 = 0.05
k3 = 0.001
 
series_periodic = k0*series_periodic
series_trend    = k1*np.ones(len(time))+k2*time+k3*time**2
series          = series_periodic+series_trend 

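# Set the period in seasonal_decompose(): samples per cycle = T / step
# (round() guards against floating-point error in the division).
# statsmodels >= 0.11 calls this argument `period`; older releases,
# such as the one in the traceback above, call it `freq`.
period = int(round(T / step))
results = seasonal_decompose(series, model='additive', period=period)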

trend_estimate    = results.trend
periodic_estimate = results.seasonal
residual          = results.resid
 
# Plot the time-series components
plt.figure(figsize=(14,10))
plt.subplot(221)
plt.plot(series,label='Original time series', color='blue')
plt.legend(loc='best',fontsize=20 , bbox_to_anchor=(0.90, -0.05))
plt.subplot(222)
plt.plot(trend_estimate,label='Trend of time series',color='blue')
plt.legend(loc='best',fontsize=20, bbox_to_anchor=(0.90, -0.05))
plt.subplot(223)
plt.plot(periodic_estimate,label='Seasonality of time series',color='blue')
plt.legend(loc='best',fontsize=20, bbox_to_anchor=(0.90, -0.05))
plt.subplot(224)
plt.plot(residual,label='Decomposition residuals of time series',color='blue')
plt.legend(loc='best',fontsize=20, bbox_to_anchor=(1.09, -0.05))
plt.tight_layout()
plt.savefig('decomposition.png')

[Figure: the time-series components: original time series, trend, seasonality, and decomposition residuals]
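To see what model='additive' means in the script above, the classical moving-average decomposition can be sketched from scratch in NumPy. This is a minimal illustration, assuming an integer period and no missing values; the function name `classical_additive_decompose` is hypothetical, and while it follows the same centred-moving-average idea as statsmodels' default, it is not statsmodels' code.

```python
import numpy as np


def classical_additive_decompose(x, period):
    """Sketch of classical additive decomposition: x = trend + seasonal + resid.

    Assumes an integer period and no missing values in x.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)

    # Trend: centred moving average spanning one full period.
    # For an even period, use the 2 x m average (half weights at both ends)
    # so that the window stays centred on each point.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")

    # Seasonal: mean detrended value at each position of the cycle,
    # ignoring the NaN ends, then shifted so one cycle sums to zero.
    detrended = x - trend
    cycle_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle_means -= cycle_means.mean()
    seasonal = np.tile(cycle_means, n // period + 1)[:n]

    # Residual: whatever trend and seasonal do not explain (NaN at the ends).
    resid = x - trend - seasonal
    return trend, seasonal, resid


# Linear trend plus a sinusoid with a 10-sample period
t = np.arange(200)
x = 0.5 * t + 3 * np.sin(2 * np.pi * t / 10)
trend, seasonal, resid = classical_additive_decompose(x, period=10)
```

The same additive identity holds for the statsmodels output in the script above: wherever results.trend is not NaN, results.trend + results.seasonal + results.resid reproduces series.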

If you are using a pandas DataFrame:

# import libraries
import pandas as pd
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

# Generate some data
np.random.seed(0)
n = 1500

dates = np.array('2020-01-01', dtype=np.datetime64) + np.arange(n)
data = 12*np.sin(2*np.pi*np.arange(n)/365) + np.random.normal(12, 2, 1500)

#=================> Approach#1 <==================
# Build the DataFrame, then pass the period explicitly
df = pd.DataFrame({'TMAX': data}, index=dates)

# Reproduce the OP's example with an explicit period
# (statsmodels >= 0.11 uses `period`; older versions use `freq`)
seasonal_decompose(df['TMAX'], model='additive', period=15).plot()

#=================> Approach#2 <==================
# Set the dates as the index and attach a daily frequency with asfreq(),
# so seasonal_decompose() can infer the period from the index.
# interpolate() fills any gaps asfreq() creates (NaNs are not accepted).
df = pd.DataFrame({'TMAX': data}, index=dates).asfreq('D').interpolate()

# Reproduce the example for the OP
seasonal_decompose(df['TMAX'], model='additive').plot()
