Scatter Plots of Used-Car Age and Mileage Against Value Retention Rate (Used Price / New Price)

Posted 2021-11-07 ZSYL



Scatter plot basics
Used-car data scatter plots
- Assignment requirements
- Observations

Scatter plot basics

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

athletes = pd.read_csv('new_athlete.csv').dropna()
athletes.head()

	Unnamed: 0	Name	Sex	Age	Height	Weight
0	0	A Dijiang	M	24.0	180.0	80.0
1	1	A Lamusi	M	23.0	170.0	60.0
4	4	Christine Jacoba Aaftink	F	21.0	185.0	82.0
5	10	Per Knut Aaland	M	31.0	188.0	75.0
6	18	John Aalberg	M	31.0	183.0	72.0

plt.figure(figsize=(15,5))
# Male athletes: heights, weights, and their means
male_athletes = athletes[athletes['Sex'] == 'M']
male_heights = male_athletes['Height']
male_weights = male_athletes['Weight']
male_height_mean = male_heights.mean()
male_weight_mean = male_weights.mean()

# Female athletes: heights, weights, and their means
female_athletes = athletes[athletes['Sex'] == 'F']
female_heights = female_athletes['Height']
female_weights = female_athletes['Weight']
female_height_mean = female_heights.mean()
female_weight_mean = female_weights.mean()

# Height vs. weight; marker size (s) encodes each athlete's age
plt.scatter(male_heights,male_weights,s=male_athletes['Age'],marker='^')
plt.scatter(female_heights,female_weights,s=female_athletes['Age'],alpha=0.5)
# Reference lines at the male mean height (vertical) and mean weight (horizontal)
plt.axvline(male_height_mean,linewidth=1,c='r')
plt.axhline(male_weight_mean,linewidth=1,c='y')

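from sklearn.linear_model import LinearRegression
# Fit weight as a linear function of height for the male athletes.
# scikit-learn expects X to be 2-D, so the heights become a single column.
model = LinearRegression()
model.fit(male_heights.values[:,np.newaxis],male_weights)
predict_male_weight = model.predict(male_heights.values[:,np.newaxis])
# Draw the fitted regression line over the scatter plot
plt.plot(male_heights,predict_male_weight)
# Equivalent way to build the column: male_heights.values.reshape(male_heights.shape[0],1)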
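
scikit-learn estimators expect the feature matrix to have shape (n_samples, n_features), which is why the heights are turned into a column before fitting. A minimal sketch, using made-up heights rather than values from new_athlete.csv, suggesting that `[:, np.newaxis]` and the commented-out `reshape` call produce the same array:

```python
import numpy as np

# Hypothetical heights in cm, standing in for male_heights.values
heights = np.array([180.0, 170.0, 188.0, 183.0])

col_newaxis = heights[:, np.newaxis]                 # add a trailing axis
col_reshape = heights.reshape(heights.shape[0], 1)   # explicit reshape
col_auto = heights.reshape(-1, 1)                    # let NumPy infer the row count

print(heights.shape)      # (4,)
print(col_newaxis.shape)  # (4, 1)
print(np.array_equal(col_newaxis, col_reshape))  # True
```

Any of the three forms should work as the `X` argument to `fit` and `predict`; `reshape(-1, 1)` is the one most often seen in scikit-learn examples.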

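Used-car data scatter plots

Assignment requirements

Combine the used-car data from guazi_bj (Beijing), guazi_gz (Guangzhou), guazi_sh (Shanghai), and guazi_sz (Shenzhen) into a single DataFrame.
Add two fields: years of use (use_year) and value retention rate (hedge_rate). Years of use is the current time minus the purchase time, converted to years; the retention rate is the used-car price divided by the new-car price.
Draw a scatter plot of years of use against retention rate (used price / new price) and observe the distribution.
Draw a scatter plot of mileage against retention rate (used price / new price) and observe the distribution.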
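
The first requirement amounts to stacking the per-city tables row-wise. The sketch below uses small in-memory frames with invented prices in place of the CSV files, and it adds a `city` column, which is an assumption of this example and not something the assignment asks for. It also renumbers the rows, since a plain `pd.concat` keeps each file's own index and so repeats row labels.

```python
import pandas as pd

# Hypothetical two-row frames standing in for guazi_bj.csv and guazi_gz.csv
guazi_bj = pd.DataFrame({"es_price": [8.0, 6.1], "new_price": [13.0, 10.2]})
guazi_gz = pd.DataFrame({"es_price": [3.5, 9.7], "new_price": [8.2, 10.1]})

# keys= labels each block with its city; the label becomes an index level
combined = pd.concat([guazi_bj, guazi_gz], keys=["bj", "gz"], names=["city", "row"])
# Move the city label into a regular column and renumber the rows 0..n-1
combined = combined.reset_index(level="city").reset_index(drop=True)

print(combined)
```

Keeping the city as a column would make it possible to color or facet the later scatter plots by city, if that comparison is of interest.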

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from datetime import datetime

guazi_bj = pd.read_csv("guazi_bj.csv")
guazi_gz = pd.read_csv("guazi_gz.csv")
guazi_sh = pd.read_csv("guazi_sh.csv")
guazi_sz = pd.read_csv("guazi_sz.csv")

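# Stack the four city tables row-wise into one DataFrame
guazi = pd.concat([guazi_bj,guazi_gz,guazi_sh,guazi_sz],axis=0)
def get_use_year(value):
    # buy_time is a "YYYY-MM" string; anything else is treated as missing
    if isinstance(value,str):
        datetime_value = datetime.strptime(value,"%Y-%m")
        now = datetime.now()
        # Seconds -> years, approximating a year as 12 months of 30 days (360 days)
        yeardelay = (now - datetime_value).total_seconds()/60/60/24/30/12
        return yeardelay
    return np.nan  # np.NAN was removed in NumPy 2.0
guazi['use_year'] = guazi['buy_time'].apply(get_use_year)
# Retention rate = used-car price / new-car price
guazi['hedge_rate'] = guazi['es_price'] / guazi['new_price']
guazi[['use_year','km','hedge_rate']].head()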

	use_year	km	hedge_rate
0	4.890745	3.82	0.615385
1	3.537967	2.35	0.600000
2	7.174078	6.67	0.426829
3	6.493523	11.83	0.356295
4	5.649078	8.95	0.469314
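
The same field can also be computed without a row-by-row `apply`. The following is a sketch of a vectorized alternative, not the method used above: it parses the whole column with `pd.to_datetime` and divides the elapsed days by 365.25 instead of 360, so its numbers will differ slightly from the table. The sample strings and the fixed reference date are assumptions made so the example is reproducible.

```python
import pandas as pd

# Hypothetical sample standing in for guazi['buy_time']; None marks a missing value
buy_time = pd.Series(["2016-12", "2018-04", None])

# Fixed reference date for reproducibility (the original code uses datetime.now())
now = pd.Timestamp("2021-11-07")

# Unparseable or missing entries become NaT instead of raising an error
parsed = pd.to_datetime(buy_time, format="%Y-%m", errors="coerce")
# Elapsed days divided by the average length of a calendar year
use_year = (now - parsed).dt.days / 365.25

print(use_year.round(2))
```

Missing purchase dates propagate as NaN, which matches the behaviour of returning `np.nan` from `get_use_year`.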

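# Mileage vs. retention rate; marker size (s) also encodes mileage
plt.figure(figsize=(15,5))
plt.scatter(guazi['km'],guazi['hedge_rate'],s=guazi['km'])

# Years of use vs. retention rate; marker size (s) encodes mileage
plt.figure(figsize=(15,5))
plt.scatter(guazi['use_year'],guazi['hedge_rate'],s=guazi['km'])
plt.xlabel("use year")
plt.ylabel("hedge rate")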

Text(0, 0.5, 'hedge rate')

guazi[(guazi['hedge_rate'] > 0.9) & (guazi['use_year'] > 3)][['new_price','es_price','use_year','km']].head()

	new_price	es_price	use_year	km
21	10.19	10.03	5.393523	7.21
24	5.30	5.19	5.315745	3.42
30	3.30	3.15	7.510190	11.26
34	10.10	9.70	4.046301	7.21
39	10.24	10.00	6.160190	6.86

guazi[(guazi['hedge_rate'] > 0.9) & (guazi['use_year'] > 6)][['new_price','es_price','use_year']].head()

	new_price	es_price	use_year
30	3.30	3.15	7.510190
39	10.24	10.00	6.160190
80	9.11	8.92	7.679634
90	7.57	7.32	6.749078
124	7.98	7.61	7.935190

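Observations

The analysis above shows that a car's value retention rate falls roughly linearly as its years of use and mileage increase.
One group of records stands out: cars with a retention rate above 0.9 despite many years of use and high mileage. These records are most likely anomalies, so they can be filtered out before further analysis.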
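
One possible way to act on this observation is sketched below. It reuses the filter condition from the query above (retention rate over 0.9 with more than 3 years of use) to drop the suspicious rows, then checks the direction of the relationship with a Pearson correlation. The rows are invented for illustration and are not taken from the guazi data; the 0.9 and 3-year thresholds are simply the ones used earlier, not tuned values.

```python
import pandas as pd

# Hypothetical rows mimicking the guazi columns; the values are illustrative only
guazi = pd.DataFrame({
    "use_year":   [1.0, 2.5, 4.0, 5.4, 7.5, 6.0],
    "km":         [1.2, 3.0, 5.5, 7.2, 11.3, 8.0],
    "hedge_rate": [0.85, 0.70, 0.55, 0.98, 0.95, 0.40],
})

# High retention despite more than 3 years of use is treated as anomalous
suspicious = (guazi["hedge_rate"] > 0.9) & (guazi["use_year"] > 3)
cleaned = guazi[~suspicious]

print(len(guazi), len(cleaned))
# Pearson correlation of each predictor with the retention rate
print(cleaned[["use_year", "km"]].corrwith(cleaned["hedge_rate"]))
```

A strongly negative correlation on the cleaned data would be consistent with the downward trend seen in the scatter plots, though correlation alone does not confirm that the relationship is linear.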

