Loop python auto_arima through several columns in a wide data format

Posted 2023-03-12


【Title】Loop python auto_arima through several columns in a wide data format 【Posted】2019-12-14 17:36:43 【Question】:

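Let me start by saying that I am by no means a Python expert, but my current project has to be programmed in Python, so any help is greatly appreciated. What I have is a transformed time series with monthly data (30 months) and more than 1,000 items.

I want to run an ARIMA for each of these columns. They do not depend on each other. Essentially, it is like running 1,000 independent ARIMA analyses.

I programmed this in R by creating a list of data frames, one per item, and looping over the list with R's auto.arima function. It was slow and clunky, but it got the job done.

In Python I have not found a way to build this structure and make it work. Instead, I found some code and tried to build a loop from it. auto_arima now runs inside it, but it overwrites the results, and I really don't know how to make it work.

I need to run auto_arima because each item has its own optimal p, d, q parameters.

X is the data, structured as: index, item1, item2, item3 ... itemn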

import pmdarima as pm
from sklearn.metrics import mean_squared_error

dict_org = {}   # item name -> observed test values
dict_pred = {}  # item name -> one-step-ahead predictions

# Split once; the same 70/30 split applies to every column
size = int(len(X) * 0.70)
train, testdata = X[0:size], X[size:len(X)]

for col in X:
    history = [x for x in train[col]]
    predictions = list()

    # Walk forward through the test observations of this item
    for obs in testdata[col]:
        model = pm.auto_arima(history, start_p=1, start_q=1,
                      test='adf',       # use the ADF test to find the optimal 'd'
                      max_p=3, max_q=3, # maximum p and q
                      m=1,              # frequency of series
                      d=None,           # let the model determine 'd'
                      seasonal=False,   # no seasonality
                      start_P=0,
                      D=0,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True)

        output = model.predict(n_periods=1)  # forecast one step ahead

        yhat = output[0]
        predictions.append(yhat)
        history.append(obs)  # add the true value before the next refit
        print("Predicted:%f, expected:%f" % (yhat, obs))

    error = mean_squared_error(testdata[col], predictions)

    # Store the results under the column name so nothing is overwritten
    dict_org[col] = list(testdata[col])
    dict_pred[col] = predictions

    print("Item: ", col, "Test MSE:%f" % error)

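What I would like to end up with is a dictionary containing all items and their predictions, similar to what I get by passing R's auto.arima over a list of data frames. Right now I just keep updating yhat with a single observation, and I am at a loss.

Thank you very much for your help.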


【Answer 1】:

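You have probably found a solution by now, but I will leave an answer in case someone else stumbles upon this.

auto_arima is not a model in itself. It is a helper function that searches for the best model order. In the case above, what you want to do is assign its result to a variable and access the order and seasonal order, as well as the AIC, of the best model. You can create a small function to do this part and then use its output for the actual model.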

import pmdarima as pm

def find_orders(ts):
    # Search for the best (p, d, q) on the series passed in as `ts`
    stepwise_model = pm.auto_arima(ts, start_p=1, start_q=1,
                      test='adf',       # use the ADF test to find the optimal 'd'
                      max_p=3, max_q=3, # maximum p and q
                      m=1,              # frequency of series
                      d=None,           # let the model determine 'd'
                      seasonal=False,   # no seasonality
                      start_P=0,
                      D=0,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True)

    return stepwise_model.order, stepwise_model.seasonal_order

Then you can create another function for the modelling part (say you call it fit_arimax) and pass it the order and seasonal order for each time series in the loop.

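for column in X:
    ts = X[column]  # one item's time series
    order, seasonal_order = find_orders(ts)
    # fit_arimax is the user-defined modelling function mentioned above
    fit_arimax(ts, order=order, seasonal_order=seasonal_order)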
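
To end up with one dictionary holding every item and its predictions, as the question asks, the loop can store each result under the column name. The following is only a minimal sketch under stated assumptions: `forecast_all_items`, `auto_arima_forecaster`, `last_value_forecaster` and the toy data are hypothetical names and values made up for illustration, and the wide data is passed as a plain dict of lists so the sketch runs without pandas. The demo at the bottom uses a naive stand-in forecaster so it also runs without pmdarima.

```python
from typing import Callable, Dict, List, Sequence

# A forecaster takes the training values and the number of steps to predict.
Forecaster = Callable[[Sequence[float], int], List[float]]


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Plain-Python MSE so the sketch has no third-party dependencies."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def forecast_all_items(wide_data: Dict[str, Sequence[float]],
                       forecaster: Forecaster,
                       train_frac: float = 0.70) -> Dict[str, dict]:
    """Fit and forecast each column independently, keyed by item name."""
    results = {}
    for item, values in wide_data.items():
        size = int(len(values) * train_frac)
        train, test = list(values[:size]), list(values[size:])
        predictions = list(forecaster(train, len(test)))
        # Keyed by item name, so one item never overwrites another.
        results[item] = {
            "actual": test,
            "predicted": predictions,
            "mse": mean_squared_error(test, predictions),
        }
    return results


def auto_arima_forecaster(train: Sequence[float], n_periods: int) -> List[float]:
    """Hypothetical wrapper around pmdarima; imported lazily on first use."""
    import pmdarima as pm
    model = pm.auto_arima(train, seasonal=False, error_action="ignore",
                          suppress_warnings=True, stepwise=True)
    return [float(v) for v in model.predict(n_periods=n_periods)]


def last_value_forecaster(train: Sequence[float], n_periods: int) -> List[float]:
    """Naive stand-in (repeats the last training value), used for the demo."""
    return [train[-1]] * n_periods


# Toy wide-format data: two items, ten observations each (made up).
toy = {"item1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "item2": [5.0] * 10}
results = forecast_all_items(toy, last_value_forecaster)
for item, res in results.items():
    print(item, res["predicted"], "MSE: %.3f" % res["mse"])
```

With a pandas DataFrame `X` like the one in the question, the same function could be called as `forecast_all_items({c: X[c].tolist() for c in X}, auto_arima_forecaster)`. Unlike the walk-forward loop in the question, this sketch fits once per item and forecasts the whole test window in one call, which is much cheaper for 1,000+ columns but is a different evaluation scheme.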

Hope that helps!

