Linear regression prediction model in Python using DataFrame

Posted 2023-03-29

Originally asked: 2021-07-22 11:09:41

Question:

Here is my sample data:

import pandas as pd

# The column data must be passed as a dict, so the braces are required
avg_consumption = pd.DataFrame({
    'Car.Year.Model': [2009, 2010, 2011, 2012],
    'City.mpg': [17.9, 17, 16.9, 18.3],
    'Highway.mpg': [24.3, 23.6, 23.6, 25.7]
})

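I want to use linear regression to predict the average fuel consumption for each car model year, for each type of fuel range (city and highway).

My desired output is the same DataFrame, extended with predicted average consumption for car models up to 2025, based on my existing data. I am not entirely sure how to go about it.

What I have tried:

I tried following the answer to this question, since the problem seemed similar: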

from sklearn.linear_model import LinearRegression

years = pd.DataFrame()
years['Car.Year.Model'] = range(2009, 2025)
# I include 2009-2012 to test the prediction values are still the same as the original

X = avg_consumption.filter(['Car.Year.Model'])
y = avg_consumption.drop('Car.Year.Model', axis=1)

model = LinearRegression()
model.fit(X, y)

X_predict = years
y_predict = model.predict(X_predict)

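My results look like this:

If I assume that the first row holds the predicted values for 2009, this is incorrect, because the values for the 2009 model in my original DataFrame are different.

I want to make sure it correctly predicts the average consumption for every year up to 2025. I would also like the results in a DataFrame similar to my sample data.

Can someone point me in the right direction?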

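Comments:

"This is incorrect because the values for the 2009 model in my original DataFrame are different.": That is because your input values are actual data, whereas this DataFrame holds predictions from the best-fit model. The output is not your data: it is essentially a straight line drawn through some scattered points.

Answer 1: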

You can use numpy.polyfit and numpy.poly1d for linear extrapolation, then append the projected years like this:

import pandas as pd
import numpy as np

# The column data must be passed as a dict, so the braces are required
avg_consumption = pd.DataFrame({
    'Car.Year.Model': [2009, 2010, 2011, 2012],
    'City.mpg': [17.9, 17, 16.9, 18.3],
    'Highway.mpg': [24.3, 23.6, 23.6, 25.7]
})

f_city = np.poly1d(np.polyfit(avg_consumption["Car.Year.Model"], avg_consumption["City.mpg"], 1))
f_highway = np.poly1d(np.polyfit(avg_consumption["Car.Year.Model"], avg_consumption["Highway.mpg"], 1))
new_data = pd.DataFrame([[i, f_city(i), f_highway(i)] for i in range(2013, 2026)], columns=avg_consumption.columns)
avg_consumption = pd.concat([avg_consumption, new_data], axis=0)

Output:

    Car.Year.Model  City.mpg  Highway.mpg
0             2009     17.90        24.30
1             2010     17.00        23.60
2             2011     16.90        23.60
3             2012     18.30        25.70
0             2013     17.80        25.35
1             2014     17.91        25.77
2             2015     18.02        26.19
3             2016     18.13        26.61
4             2017     18.24        27.03
5             2018     18.35        27.45
6             2019     18.46        27.87
7             2020     18.57        28.29
8             2021     18.68        28.71
9             2022     18.79        29.13
10            2023     18.90        29.55
11            2024     19.01        29.97
12            2025     19.12        30.39

ax = avg_consumption.set_index("Car.Year.Model").iloc[:4].plot()
avg_consumption.set_index("Car.Year.Model").iloc[3:].plot(ls="-.", ax=ax)
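For comparison, the scikit-learn approach from the question can be made to produce the same kind of table. The following is only a sketch, assuming scikit-learn is installed; the names `future`, `predicted`, `forecast`, and `result` are illustrative and not from the original thread. It fits one straight line per fuel column, keeps the measured 2009-2012 rows unchanged, and predicts only the years 2013 through 2025.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

avg_consumption = pd.DataFrame({
    "Car.Year.Model": [2009, 2010, 2011, 2012],
    "City.mpg": [17.9, 17, 16.9, 18.3],
    "Highway.mpg": [24.3, 23.6, 23.6, 25.7],
})

# Feature matrix (2-D) and the two target columns
X = avg_consumption[["Car.Year.Model"]]
y = avg_consumption[["City.mpg", "Highway.mpg"]]

# LinearRegression accepts a multi-column target and fits one line per column
model = LinearRegression().fit(X, y)

# Years to project; range() excludes its stop value, so 2026 is needed to reach 2025
future = pd.DataFrame({"Car.Year.Model": range(2013, 2026)})
predicted = pd.DataFrame(model.predict(future), columns=y.columns)
forecast = pd.concat([future, predicted], axis=1)

# Measured rows first, projected rows after, with a fresh 0..16 index
result = pd.concat([avg_consumption, forecast], ignore_index=True)
print(result.round(2))
```

Because both approaches fit an ordinary least-squares line, the projected values should agree with the numpy.polyfit output above up to rounding. Calling `model.predict(X)` on the original years returns points on the fitted line rather than the measured values, which is the difference the question ran into.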
