Random Forest Regression on the UCI Dataset Condition Based Maintenance of Naval Propulsion Plants
Posted by 钢珠子
Dataset reference: [1] A. Coraddu, L. Oneto, A. Ghio, S. Savio, D. Anguita, M. Figari, Machine Learning Approaches for Improving Condition-Based Maintenance of Naval Propulsion Plants, Journal of Engineering for the Maritime Environment, 2014, DOI: 10.1177/1475090214540874.
Dataset page: http://archive.ics.uci.edu/ml/datasets/condition+based+maintenance+of+naval+propulsion+plants
Dataset information:
The experiments have been carried out by means of a numerical simulator of a naval vessel (Frigate) characterized by a Gas Turbine (GT) propulsion plant. The different blocks forming the complete simulator (Propeller, Hull, GT, Gear Box and Controller) have been developed and fine-tuned over the years on several similar real propulsion plants. In view of these observations, the available data are in agreement with a possible real vessel.
In this release of the simulator it is also possible to take into account the performance decay over time of the GT components such as GT compressor and turbines.
The propulsion system behaviour has been described with these parameters:
- Ship speed (linear function of the lever position lp).
- Compressor degradation coefficient kMc.
- Turbine degradation coefficient kMt.
so that each possible degradation state can be described by a combination of this triple (lp,kMt,kMc).
The range of decay of compressor and turbine has been sampled with a uniform grid of precision 0.001 so as to have a good granularity of representation.
In particular, for the compressor decay state discretization the kMc coefficient has been investigated in the domain [0.95, 1], and the turbine coefficient kMt in the domain [0.975, 1].
Ship speed has been investigated by sampling the range of feasible speeds from 3 knots to 27 knots with a granularity of representation equal to three knots.
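The sampling described above can be enumerated directly. The following is a minimal sketch, assuming both endpoints of each range are included in the grid; the variable names are illustrative and not part of the dataset. Under that assumption there are 51 compressor values, 26 turbine values and 9 speeds, giving 51 × 26 × 9 = 11934 combinations, a count that can be compared with the number of rows reported by df.shape.

```python
from itertools import product

# Build the grids from integer thousandths so the 0.001 step does not
# accumulate floating-point error.
kmc_values = [k / 1000 for k in range(950, 1001)]  # compressor: 0.950 .. 1.000
kmt_values = [k / 1000 for k in range(975, 1001)]  # turbine:    0.975 .. 1.000
speed_values = list(range(3, 28, 3))               # ship speed: 3, 6, ..., 27 knots

# Every (speed, kMt, kMc) combination is one simulated degradation state.
grid = list(product(speed_values, kmt_values, kmc_values))

print(len(kmc_values), len(kmt_values), len(speed_values), len(grid))
```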
A series of measures (16 features) that indirectly represent the state of the system subject to performance decay has been acquired and stored in the dataset over the parameter space.
Attribute Information:
- A 16-feature vector containing the GT measures at steady state of the physical asset:
Lever position (lp) [ ]
Ship speed (v) [knots]
Gas Turbine (GT) shaft torque (GTT) [kN m]
GT rate of revolutions (GTn) [rpm]
Gas Generator rate of revolutions (GGn) [rpm]
Starboard Propeller Torque (Ts) [kN]
Port Propeller Torque (Tp) [kN]
High Pressure (HP) Turbine exit temperature (T48) [C]
GT Compressor inlet air temperature (T1) [C]
GT Compressor outlet air temperature (T2) [C]
HP Turbine exit pressure (P48) [bar]
GT Compressor inlet air pressure (P1) [bar]
GT Compressor outlet air pressure (P2) [bar]
GT exhaust gas pressure (Pexh) [bar]
Turbine Injection Control (TIC) [%]
Fuel flow (mf) [kg/s]
- GT Compressor decay state coefficient
- GT Turbine decay state coefficient
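Because the data file has no header row, it can help to label the 18 columns before modelling. The sketch below builds a position-to-name mapping from the symbols in the attribute list; the names kMc and kMt for the two targets are taken from the parameter description, and the assumption that the file's column order follows the list above should be checked against the data.

```python
# Short symbols for the 16 features, in the order of the attribute list.
FEATURES = ["lp", "v", "GTT", "GTn", "GGn", "Ts", "Tp", "T48",
            "T1", "T2", "P48", "P1", "P2", "Pexh", "TIC", "mf"]

# The two regression targets: compressor and turbine decay coefficients.
TARGETS = ["kMc", "kMt"]

# Map integer column positions (0..17) to readable names.
COLUMN_NAMES = dict(enumerate(FEATURES + TARGETS))

print(COLUMN_NAMES[0], COLUMN_NAMES[16], COLUMN_NAMES[17])
```

With a DataFrame read using header=None, the integer column labels can then be replaced via df.rename(columns=COLUMN_NAMES).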
Applying random forest regression to the dataset in Python:
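import os
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
# sklearn.cross_validation and sklearn.grid_search were removed in scikit-learn 0.20;
# train_test_split and GridSearchCV now live in sklearn.model_selection.
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error

# The file has no header row; sep=r'\s+' splits on any run of whitespace.
df = pd.read_csv('data.txt', header=None, sep=r'\s+')
df.shape
df.head(5)

# Columns 0-15 are the features; columns 16 and 17 are the compressor and turbine decay coefficients.
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:, :16], df.iloc[:, 16:18], test_size=0.33, random_state=42)

# Model 1: compressor decay coefficient (target column 0).
rgr1 = RandomForestRegressor()
rgr1.fit(X_train, y_train.iloc[:, 0])
rgr1_pre = rgr1.predict(X_test)
# Score against the same target column the model was trained on.
mean_squared_error(y_test.iloc[:, 0], rgr1_pre)

Result reported in the original run: 0.00042135854935931014. That run scored the predictions against y_test.iloc[:,1] (the turbine coefficient) although the model was trained on the compressor coefficient, so the corrected call above will give a different value.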
# Model 2: turbine decay coefficient (target column 1).
rgr2 = RandomForestRegressor()
rgr2.fit(X_train, y_train.iloc[:, 1])
rgr2_pre = rgr2.predict(X_test)
mean_squared_error(y_test.iloc[:, 1], rgr2_pre)

Result: 8.9887552213572615e-07
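One way to put these MSE values in context is to compare them with the variance of the target, which is the MSE of a model that always predicts the mean. The sketch below is an approximation, assuming each coefficient is spread evenly over the grid described in the dataset information, so the variance of the actual test split will differ slightly. The variable names are illustrative.

```python
from statistics import pvariance

# Decay coefficient grids from the dataset description (step 0.001).
kmc_grid = [k / 1000 for k in range(950, 1001)]  # compressor
kmt_grid = [k / 1000 for k in range(975, 1001)]  # turbine

# Population variance = MSE of always predicting the grid mean.
var_kmc = pvariance(kmc_grid)
var_kmt = pvariance(kmt_grid)

# MSE values as reported in the text for the first and second model.
reported_mse_first = 0.00042135854935931014
reported_mse_second = 8.9887552213572615e-07

# Ratio near 0: errors are small relative to the target's spread.
# Ratio at or above 1: no better than predicting the mean.
ratio_first = reported_mse_first / var_kmc
ratio_second = reported_mse_second / var_kmt

print(f"kMc variance {var_kmc:.3e}, ratio {ratio_first:.3f}")
print(f"kMt variance {var_kmt:.3e}, ratio {ratio_second:.3f}")
```

Under these assumptions the second ratio is roughly 0.016, while the first is well above 1. A ratio above 1 is a signal to check that the predictions and the targets passed to mean_squared_error refer to the same column.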