Python Data Science - Regression Analysis ☞ Linear Regression
Posted PeersLee
Simple linear regression, polynomial (higher-degree) regression, and multiple linear regression:
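import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Polynomial (higher-degree) regression.
# X is assumed to be a DataFrame with a single 'year' column and Y the target Series.
# degree=2 expands X into the columns [1, x, x^2] so a linear model can fit a curve.
poly_reg = PolynomialFeatures(degree=2)
X_ = poly_reg.fit_transform(X)
regr = LinearRegression()
regr.fit(X_, Y)
# Sort by year so the fitted curve is drawn left to right instead of zigzagging.
X2 = X.sort_values(['year'])
# The transformer is already fitted, so transform() is enough here.
X2_ = poly_reg.transform(X2)
plt.scatter(X, Y, color='black')
# plt.plot(X, regr.predict(X_), color='blue', linewidth=3)
plt.plot(X2, regr.predict(X2_), color='blue', linewidth=3)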
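The snippet above relies on X and Y already being loaded. As a minimal, self-contained sketch of the same technique, the example below fits a degree-2 model to made-up data that follows an exact quadratic. The `demo_*` names and the data values are assumptions for illustration only and do not come from the original dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: an exact quadratic y = 2*x^2 - 3*x + 5 (no noise).
demo_X = pd.DataFrame({'year': [4.0, 1.0, 3.0, 0.0, 2.0, 5.0]})
demo_Y = 2 * demo_X['year'] ** 2 - 3 * demo_X['year'] + 5

# degree=2 expands each x into the columns [1, x, x^2].
demo_poly = PolynomialFeatures(degree=2)
demo_features = demo_poly.fit_transform(demo_X)

# An ordinary linear model on the expanded columns fits the curve.
demo_model = LinearRegression()
demo_model.fit(demo_features, demo_Y)

# Reuse the already-fitted transformer (transform, not fit_transform) for new inputs.
demo_new = pd.DataFrame({'year': [6.0]})
demo_pred = demo_model.predict(demo_poly.transform(demo_new))
print(demo_model.coef_, demo_model.intercept_, demo_pred)
```

Because the data is noise-free, the coefficients on x and x^2 should come out close to -3 and 2, with the constant term absorbed by the intercept.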
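import pandas as pd

df = pd.read_csv('Data/house-prices.csv')
# Dummy variables: one-hot encode the categorical columns 'Brick' and 'Neighborhood'.
house = pd.concat([df, pd.get_dummies(df['Brick']), pd.get_dummies(df['Neighborhood'])], axis=1)
# Avoid multicollinearity (the dummy variable trap): drop one level from each
# dummy set ('No' for Brick, 'West' for Neighborhood).
del house['No']
del house['West']
# Drop the original categorical columns, which are now encoded.
del house['Brick']
del house['Neighborhood']
# 'Home' is dropped as well (presumably a row identifier rather than a predictor).
del house['Home']
house.head()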
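A shorter way to get a similar result is to let `pd.get_dummies` encode the columns and drop one level per column in a single call. The sketch below uses a small made-up frame (the `demo_*` names and all values are hypothetical). Note that `drop_first=True` drops the first level in sorted order, so it removes 'East' rather than 'West' as the code above does; either choice avoids the dummy variable trap.

```python
import pandas as pd

# Hypothetical rows mimicking the 'Brick' and 'Neighborhood' columns.
demo_df = pd.DataFrame({
    'Price': [100, 120, 150, 90],
    'Brick': ['No', 'Yes', 'No', 'No'],
    'Neighborhood': ['East', 'North', 'West', 'East'],
})

# Encode both categorical columns, replace the originals, and drop the
# first level (in sorted order) of each to avoid perfectly collinear dummies.
demo_house = pd.get_dummies(demo_df, columns=['Brick', 'Neighborhood'], drop_first=True)
print(demo_house.columns.tolist())
```

The new columns are prefixed with the source column name (for example `Brick_Yes`), which keeps levels from different columns from colliding.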
Example (房天下 (Fang.com), Shanghai real-estate data):
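import pandas as pd
df = pd.read_excel('Data/house_price_regression.xlsx')
df.head()
# 'age' holds text prefixed with '建筑年代:' (year built); convert it to the building's age as of 2017.
df['age'] = df['age'].map(lambda e: 2017 - int(e.strip().strip('建筑年代:')))
# 'layout': capture the number of rooms (室) and living rooms (厅).
df[['room', 'l_room']] = df['layout'].str.extract(r'(\d+)室(\d+)厅')
# 'floor_info': capture the total number of floors ('共N层') and the floor level (the character before the first '层').
df['total_floor'] = df['floor_info'].str.extract(r'共(\d+)层', expand=False)
df['floor'] = df['floor_info'].str.extract(r'^(.)层', expand=False)
# str.extract returns strings; convert the counts to numbers before modelling.
df[['room', 'l_room', 'total_floor']] = df[['room', 'l_room', 'total_floor']].apply(pd.to_numeric)
df['direction'] = df['direction'].map(lambda e: e.strip())
# One-hot encode the orientation and the floor level.
df = pd.concat([df, pd.get_dummies(df['direction']), pd.get_dummies(df['floor'])], axis=1)
df.head()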
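To see what the regular expressions above capture, here is a small sketch on two made-up rows. The exact text format of `layout` and `floor_info` in the real spreadsheet is not shown in the source, so the sample strings and the `demo` frame are assumptions.

```python
import pandas as pd

# Hypothetical sample strings; the real spreadsheet's format may differ.
demo = pd.DataFrame({
    'layout': ['2室1厅', '3室2厅'],
    'floor_info': ['中层(共6层)', '低层(共18层)'],
})

# Two capture groups give a two-column result: rooms and living rooms.
demo[['room', 'l_room']] = demo['layout'].str.extract(r'(\d+)室(\d+)厅').astype(int)
# expand=False returns a Series when the pattern has a single capture group.
demo['total_floor'] = demo['floor_info'].str.extract(r'共(\d+)层', expand=False).astype(int)
# The floor level is the single character before the first '层'.
demo['floor'] = demo['floor_info'].str.extract(r'^(.)层', expand=False)
print(demo)
```

Rows that do not match a pattern come back as NaN, so on real data it may be safer to inspect the missing values before casting to int.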
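# IPython/Jupyter magic: render plots inline and load the pylab names (including plt).
%pylab inline
df[['price', 'area']].plot(kind='scatter', x='area', y='price', figsize=[15, 5])
# Simple linear regression: price as a function of area.
X = df[['area']]
Y = df['price']
from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(X, Y)
print('Coefficient:', regr.coef_)
print('Intercept:', regr.intercept_)
# predict() expects a 2-D input of shape (n_samples, n_features), so wrap the single value.
regr.predict([[65]])
# Plot the data together with the fitted line.
plt.scatter(X, Y, color='blue')
plt.plot(X, regr.predict(X), linewidth=3, color='red')
plt.xlabel('area')
plt.ylabel('price')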
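For a single feature, the coefficient and intercept reported by `LinearRegression` match the closed-form least-squares solution: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept = ȳ - slope · x̄. The sketch below checks this on made-up numbers; the `demo_*` arrays are hypothetical and unrelated to the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical area and price values, for illustration only.
demo_area = np.array([50.0, 65.0, 80.0, 100.0, 120.0])
demo_price = np.array([200.0, 255.0, 330.0, 390.0, 480.0])

# Closed-form ordinary least squares for one feature.
x_mean, y_mean = demo_area.mean(), demo_price.mean()
slope = ((demo_area - x_mean) * (demo_price - y_mean)).sum() / ((demo_area - x_mean) ** 2).sum()
intercept = y_mean - slope * x_mean

# scikit-learn needs a 2-D feature matrix, hence reshape(-1, 1).
demo_regr = LinearRegression().fit(demo_area.reshape(-1, 1), demo_price)
print(slope, demo_regr.coef_[0])
print(intercept, demo_regr.intercept_)
```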
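# Multiple linear regression: age, area, room counts, total floors, plus the orientation and floor-level dummies.
# Dummy columns: 东南向 = southeast, 东向 = east, 南北向 = north-south, 南向 = south, 西向 = west; 中 = middle floor, 低 = low floor.
X = df[['age', 'area', 'room', 'l_room', 'total_floor', '东南向', '东向','南北向', '南向', '西向', '中', '低']]
Y = df['price']
from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(X, Y)
# One sample, with values in the same column order as X; predict() needs a 2-D input.
regr.predict([[19,65,2,1,6,0,0,1,0,0,1,0]])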
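With twelve positional values it is easy to put a number in the wrong slot. One option is to pass the sample as a one-row DataFrame with the same column names used for fitting. The sketch below shows the idea with only two features and made-up data; the `demo_*` names and values are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data using two of the feature names from above.
demo_train = pd.DataFrame({
    'age': [5, 10, 20, 30, 15],
    'area': [60, 80, 100, 120, 90],
})
# Made-up target that follows price = 5*area - 2*age + 10 exactly.
demo_target = 5 * demo_train['area'] - 2 * demo_train['age'] + 10

demo_model = LinearRegression().fit(demo_train, demo_target)

# A one-row DataFrame keeps each value attached to its column name;
# columns= fixes the order to match the training frame.
demo_row = pd.DataFrame([{'area': 65, 'age': 19}], columns=demo_train.columns)
demo_pred = demo_model.predict(demo_row)
print(demo_pred)
```

Since the made-up target is an exact linear function, the prediction for this row should equal 5*65 - 2*19 + 10.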