Random Forest Regression on the UCI Dataset Condition Based Maintenance of Naval Propulsion Plants

Posted by 钢珠子


Dataset reference: [1] A. Coraddu, L. Oneto, A. Ghio, S. Savio, D. Anguita, M. Figari, Machine Learning Approaches for Improving Condition-Based Maintenance of Naval Propulsion Plants, Journal of Engineering for the Maritime Environment, 2014, DOI: 10.1177/1475090214540874.

Dataset page: http://archive.ics.uci.edu/ml/datasets/condition+based+maintenance+of+naval+propulsion+plants

Dataset information:

The experiments have been carried out by means of a numerical simulator of a naval vessel (Frigate) characterized by a Gas Turbine (GT) propulsion plant. The different blocks forming the complete simulator (Propeller, Hull, GT, Gear Box and Controller) have been developed and fine-tuned over the years on several similar real propulsion plants. In view of these observations, the available data are in agreement with a possible real vessel.

In this release of the simulator it is also possible to take into account the performance decay over time of the GT components such as GT compressor and turbines. 
The propulsion system behaviour has been described with these parameters:
- Ship speed (linear function of the lever position lp).
- Compressor degradation coefficient kMc.
- Turbine degradation coefficient kMt.
so that each possible degradation state can be described by a combination of the triple (lp, kMt, kMc).
The range of decay of the compressor and turbine has been sampled with a uniform grid of precision 0.001 so as to have a good granularity of representation.
In particular, for the compressor decay state discretization the kMc coefficient has been investigated in the domain [1; 0.95], and the turbine coefficient kMt in the domain [1; 0.975].
Ship speed has been investigated by sampling the range of feasible speeds from 3 knots to 27 knots with a granularity of representation equal to three knots.
A series of measures (16 features) which indirectly represent the state of the system subject to performance decay has been acquired and stored in the dataset over the parameter space.
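The sampling scheme described above can be checked with a little arithmetic. The following is a minimal sketch, assuming every combination of speed, kMt and kMc on the stated grids is simulated exactly once; the variable names are illustrative and not taken from the dataset files.

```python
from itertools import product

# Decay coefficients on a 0.001 grid (built from integers to avoid float-step drift).
kmc_values = [i / 1000 for i in range(950, 1001)]  # compressor: 0.950 .. 1.000
kmt_values = [i / 1000 for i in range(975, 1001)]  # turbine:    0.975 .. 1.000
speeds = list(range(3, 28, 3))                     # 3, 6, ..., 27 knots

# Every (speed, kMt, kMc) combination is one degradation state.
grid = list(product(speeds, kmt_values, kmc_values))

print(len(kmc_values), len(kmt_values), len(speeds), len(grid))
# prints: 51 26 9 11934
```

Under that assumption the full grid has 51 x 26 x 9 = 11934 states, which is the number of rows to expect in the data file if each state appears once.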

Attribute Information:

- A 16-feature vector containing the GT measures at steady state of the physical asset: 
Lever position (lp) [ ] 
Ship speed (v) [knots] 
Gas Turbine (GT) shaft torque (GTT) [kN m] 
GT rate of revolutions (GTn) [rpm] 
Gas Generator rate of revolutions (GGn) [rpm] 
Starboard Propeller Torque (Ts) [kN] 
Port Propeller Torque (Tp) [kN] 
High Pressure (HP) Turbine exit temperature (T48) [C]
GT Compressor inlet air temperature (T1) [C] 
GT Compressor outlet air temperature (T2) [C] 
HP Turbine exit pressure (P48) [bar] 
GT Compressor inlet air pressure (P1) [bar] 
GT Compressor outlet air pressure (P2) [bar] 
GT exhaust gas pressure (Pexh) [bar] 
Turbine Injection Control (TIC) [%]
Fuel flow (mf) [kg/s] 

- GT Compressor decay state coefficient 
- GT Turbine decay state coefficient
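Since the data file has no header row, it can help to keep the attribute list as named columns. The sketch below assumes the file's column order follows the list above (16 features, then compressor decay, then turbine decay); the short names reuse the abbreviations from the description, and `kMc`/`kMt` as target labels are an illustrative choice.

```python
# Column names in the order given by the attribute list (assumed to match the file).
FEATURES = [
    "lp", "v", "GTT", "GTn", "GGn", "Ts", "Tp", "T48",
    "T1", "T2", "P48", "P1", "P2", "Pexh", "TIC", "mf",
]
TARGETS = ["kMc", "kMt"]  # compressor and turbine decay state coefficients
COLUMNS = FEATURES + TARGETS

# Positional indices of the two targets, useful with DataFrame.iloc.
print(COLUMNS.index("kMc"), COLUMNS.index("kMt"))
# prints: 16 17
```

With a DataFrame loaded without a header, `df.columns = COLUMNS` would attach these names, so the targets can be selected as `df["kMc"]` and `df["kMt"]` rather than by position.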

 

Applying random forest regression to the dataset with Python:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
# train_test_split lives in sklearn.model_selection; sklearn.cross_validation
# and sklearn.grid_search were removed from scikit-learn.
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

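# Columns are assumed to be whitespace-separated; the quotes and the separator literal did not survive.
df = pd.read_csv("data.txt", header=None, sep=r"\s+")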
df.shape
df.head(5)

X_train,X_test,y_train,y_test = train_test_split(df.iloc[:,:16],df.iloc[:,16:18],test_size=0.33, random_state=42)

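rgr1 = RandomForestRegressor()
rgr1.fit(X_train, y_train.iloc[:, 0])  # target column 0: compressor decay coefficient (per the attribute list)
rgr1_pre = rgr1.predict(X_test)
mean_squared_error(y_test.iloc[:, 0], rgr1_pre)  # score against the same target column

Reported result: 0.00042135854935931014. That figure was computed with y_test.iloc[:,1] (the turbine column) as the reference, so it compares compressor predictions with turbine values and does not measure this model's error; the call above scores against column 0 instead.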
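Because the two targets sit in adjacent columns, it is easy to score predictions against the wrong one. One way to keep targets and scores aligned is to fit a single multi-output forest and request one MSE per column. The following is a minimal sketch on synthetic stand-in data (the feature values and target formulas are invented for illustration, not taken from the dataset); `n_estimators=50` and the random seeds are likewise arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 16 features and 2 targets (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))
Y = np.column_stack([
    0.975 + 0.01 * np.tanh(X[:, 0]),    # stand-in for the compressor coefficient
    0.9875 + 0.005 * np.tanh(X[:, 1]),  # stand-in for the turbine coefficient
])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42
)

# RandomForestRegressor accepts a 2-D target and fits both outputs at once.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# One MSE per target column, in the same order as the columns of Y.
per_target_mse = mean_squared_error(Y_test, Y_pred, multioutput="raw_values")
print(per_target_mse)
```

With this layout the i-th entry of `per_target_mse` always belongs to the i-th target column, so there is no separate index to keep in sync between fitting and scoring.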

rgr2 = RandomForestRegressor()
rgr2.fit(X_train.iloc[:,:16],y_train.iloc[:,1])
rgr2_pre = rgr2.predict(X_test.iloc[:,:16])
mean_squared_error(y_test.iloc[:,1],rgr2_pre)

Result: 8.9887552213572615e-07
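A raw MSE this small is hard to judge without a reference point. A rough baseline is the error of always predicting the mean, which equals the variance of the target. The sketch below assumes the turbine coefficient in the test split is spread roughly uniformly over the grid from the dataset description (0.975 to 1.000 in steps of 0.001); it is an approximation, not a score computed on the actual split.

```python
from statistics import pvariance

# Turbine decay coefficient grid from the dataset description: 0.975 .. 1.000, step 0.001.
kmt_grid = [i / 1000 for i in range(975, 1001)]

# The MSE of a model that always predicts the grid mean equals the grid variance.
baseline_mse = pvariance(kmt_grid)

reported_mse = 8.9887552213572615e-07  # value quoted above for the turbine model

# Fraction of the baseline error that remains; 1 - ratio is an R^2-style score.
ratio = reported_mse / baseline_mse
print(f"baseline MSE ~ {baseline_mse:.3e}, remaining fraction ~ {ratio:.3f}")
```

Under that assumption the baseline is about 5.6e-05, so the reported error is roughly 1.6% of what a constant prediction would give.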

 
