R Language: Data Mining with Rattle
Posted by Medical Statistical Analysis Study Notes (医学统计分析学习笔记)
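Rattle is a graphical user interface (GUI) for data mining built on the R language. It lets you work without writing code, offers a rich set of features, and records the R code behind each action in its Log, which is useful for learning. This series of 10 articles walks through how Rattle is used in data mining.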
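Load and launch Rattle. The first row of the window is the menu bar, the second is the toolbar, and the third holds the tabs: Data (data input), Explore (data exploration), Test (statistical tests), Transform (data transformation), Cluster (cluster analysis), Associate (association rules), Model (modelling), and Evaluate (model evaluation).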
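Loading and launching can be sketched in a few lines of R. This assumes the rattle package is already installed from CRAN; the guard around the call and the variable name `can_launch` are illustrative additions, so the snippet does nothing harmful in a non-interactive session.

```r
# Assumption: rattle has been installed once, e.g. with
# install.packages("rattle")

# Only try to open the GUI when the package is available and a user is present.
can_launch <- requireNamespace("rattle", quietly = TRUE) && interactive()

if (can_launch) {
  library(rattle)  # attach the package
  rattle()         # open the Rattle window with its menu bar, toolbar, and tabs
} else {
  message("Rattle is not installed or the session is not interactive; GUI not started.")
}
```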
Choose a data source and load the data. Supported formats include TXT, CSV, Excel, and SQL, as well as datasets bundled with R packages.
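For comparison, the same kinds of sources can be read with base R. The sketch below is not what Rattle runs internally; it uses the built-in `airquality` dataset as a stand-in and writes temporary files so that it is self-contained. The file paths and variable names are hypothetical.

```r
# Stand-in data: airquality ships with base R (package 'datasets').
builtin <- datasets::airquality

# CSV: write a temporary file, then read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(builtin, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, na.strings = c("", "NA"))

# TXT: the same data as a tab-separated text file.
txt_path <- tempfile(fileext = ".txt")
write.table(builtin, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
from_txt <- read.delim(txt_path)

# Inspect the structure of what was loaded.
str(from_csv)
```

Excel and SQL sources need additional packages and are left out of this sketch.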
Ordered: sort variables by correlation strength
Explore missing: correlation analysis of missing values
Hierarchical: hierarchical correlation
Method: Pearson, Spearman, or Kendall
Correlation summary using the 'Pearson' covariance.
Note that only correlations between numeric variables are reported.
Pressure3pm Pressure9am Humidity9am WindSpeed3pm Humidity3pm Sunshine Cloud3pm WindSpeed9am
Pressure3pm 1.00000000 0.96897642 0.1521277 -0.335226602 -0.01580015 -0.06813014 -0.07554600 -0.29297114
Pressure9am 0.96897642 1.00000000 0.1524648 -0.371372978 -0.09012507 -0.01836370 -0.07673340 -0.40059684
Humidity9am 0.15212768 0.15246484 1.0000000 -0.344325106 0.49552358 -0.46823913 0.27819872 -0.34034116
WindSpeed3pm -0.33522660 -0.37137298 -0.3443251 1.000000000 -0.05553257 0.12503533 -0.07077685 0.49130778
Humidity3pm -0.01580015 -0.09012507 0.4955236 -0.055532570 1.00000000 -0.75751258 0.53217454 0.12861487
Sunshine -0.06813014 -0.01836370 -0.4682391 0.125035331 -0.75751258 1.00000000 -0.67080423 -0.05329541
Cloud3pm -0.07554600 -0.07673340 0.2781987 -0.070776854 0.53217454 -0.67080423 1.00000000 -0.04738166
WindSpeed9am -0.29297114 -0.40059684 -0.3403412 0.491307778 0.12861487 -0.05329541 -0.04738166 1.00000000
Rainfall -0.23702286 -0.31098103 0.1930967 0.045300192 0.31217242 -0.15507662 0.09317110 0.17305783
WindGustSpeed -0.51159855 -0.53119422 -0.4027873 0.657349969 -0.08627584 0.12253949 -0.02771644 0.54032660
Cloud9am -0.12196717 -0.15234691 0.3898705 -0.060765345 0.53367655 -0.66867659 0.51027937 0.10673849
Evaporation -0.42083515 -0.40044671 -0.4513420 0.057763791 -0.37115619 0.27051093 -0.10184521 0.10623752
Temp3pm -0.34293423 -0.24726344 -0.3230771 -0.165244193 -0.56336278 0.45661908 -0.17393142 -0.17167422
MaxTemp -0.37424892 -0.28151629 -0.3272456 -0.152145582 -0.52004926 0.43511216 -0.14324702 -0.15023725
Temp9am -0.50426161 -0.46699389 -0.4187603 0.005536074 -0.23955741 0.20529853 0.01937453 0.13911180
MinTemp -0.52804603 -0.52592375 -0.1978675 -0.047049560 -0.04346591 0.03885863 0.09272276 0.19917684
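A correlation summary of this kind can be approximated with `cor()`. The rows of the output above run from the variable most negatively correlated with MinTemp up to MinTemp itself, which is consistent with sorting on one variable's correlations; the sketch below imitates that ordering. It uses `airquality` as a stand-in dataset, and the variable names are illustrative, not taken from Rattle.

```r
# Keep numeric columns only, since correlations are reported for numeric variables.
num <- airquality[sapply(airquality, is.numeric)]

# Pearson correlation, using all complete pairs for each pair of variables.
corr <- cor(num, use = "pairwise.complete.obs", method = "pearson")

# "Ordered": sort rows and columns by correlation with the first variable.
ord <- order(corr[1, ])
corr_ordered <- corr[ord, ord]
print(round(corr_ordered, 3))

# The other two methods differ only in the 'method' argument.
corr_spearman <- cor(num, use = "pairwise.complete.obs", method = "spearman")
corr_kendall  <- cor(num, use = "pairwise.complete.obs", method = "kendall")
```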
Missing values correlation summary using the 'Pearson' covariance.
Note that only correlations between numeric variables are reported.
WindDir9am WindSpeed9am WindGustDir WindGustSpeed Sunshine WindDir3pm
WindDir9am 1.00000000 0.400999442 -0.027879443 -0.027879443 -0.027879443 -0.019675051
WindSpeed9am 0.40099944 1.000000000 -0.011179641 -0.011179641 -0.011179641 -0.007889684
WindGustDir -0.02787944 -0.011179641 1.000000000 1.000000000 -0.007874016 -0.005556842
WindGustSpeed -0.02787944 -0.011179641 1.000000000 1.000000000 -0.007874016 -0.005556842
Sunshine -0.02787944 -0.011179641 -0.007874016 -0.007874016 1.000000000 -0.005556842
WindDir3pm -0.01967505 -0.007889684 -0.005556842 -0.005556842 -0.005556842 1.000000000
Count of missing values:
WindDir9am WindSpeed9am WindGustDir WindGustSpeed Sunshine WindDir3pm
23 4 2 2 2 1
Percent missing values:
WindDir9am WindSpeed9am WindGustDir WindGustSpeed Sunshine WindDir3pm
8.984375 1.562500 0.781250 0.781250 0.781250 0.390625
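The missing-value summary correlates indicators of missingness rather than the values themselves. A correlation of 1 between WindGustDir and WindGustSpeed above means the two are missing in exactly the same rows, and the percentages are counts divided by the number of rows (23 / 256 = 8.984375%). A minimal sketch of the same idea, assuming `airquality` as stand-in data and illustrative variable names:

```r
# Count missing values per variable and keep only variables that have any.
na_counts <- colSums(is.na(airquality))
na_vars   <- names(na_counts)[na_counts > 0]

# Turn missingness into 0/1 indicators and correlate the indicators.
na_flags <- is.na(airquality[na_vars]) * 1
miss_cor <- cor(na_flags, method = "pearson")
print(miss_cor)

# Counts and percentages of missing values, largest first.
na_sorted <- sort(na_counts[na_vars], decreasing = TRUE)
print(na_sorted)
print(100 * na_sorted / nrow(airquality))
```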
Principal component analysis (PCA) offers two main methods: SVD and Eigen.
Note that principal components on only the numeric
variables is calculated, and so we can not use this
approach to remove categoric variables from consideration.
Any numeric variables with relatively large rotation
values (negative or positive) in any of the first few
components are generally variables that you may wish
to include in the modelling.
Rattle timestamp: 2018-06-07 21:39:35 lenovo
======================================================================
Standard deviations (1, .., p=16):
[1] 2.33967160 1.84564685 1.60151562 1.04888312 0.91136037 0.74777048 0.70891475 0.61316523 0.57853049 0.54903515
[11] 0.51134869 0.41714578 0.25534947 0.16579137 0.12937785 0.07205772
Rotation (n x k) = (16 x 16):
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
MinTemp 0.332147794 -0.19888150 0.24374707 -0.052744720 0.207808523 -0.12719510 0.01875075 0.18136991
MaxTemp 0.374029936 0.08978966 0.25057780 -0.006974787 -0.036994995 0.09067259 0.12047779 0.21047458
Rainfall 0.005796511 -0.25554583 0.01754559 0.585240095 0.544974867 0.48892975 0.09545233 -0.16588758
Evaporation 0.348472780 -0.02048528 0.09026961 -0.161590857 0.059263540 -0.02594145 -0.01248408 -0.68057351
Sunshine 0.218371365 0.36037787 -0.18669096 0.256521413 -0.004101546 0.01273087 0.05448771 0.10851966
WindGustSpeed 0.188224533 -0.20441561 -0.38364279 -0.144862912 -0.093542976 0.19662453 0.32565177 0.18575153
WindSpeed9am 0.086750984 -0.24575847 -0.36845432 -0.166720410 0.461501526 -0.38339748 -0.14748703 0.26435578
WindSpeed3pm 0.075386096 -0.14471982 -0.48804476 -0.137773961 -0.138697649 0.25696715 0.37900970 0.02298519
Humidity9am -0.243316473 -0.13077267 0.27770716 0.362756478 -0.223368987 -0.18594937 0.42312175 0.28779029
Humidity3pm -0.224835150 -0.37901853 0.09367538 0.005465963 0.086236542 -0.04818380 -0.28727836 0.18277319
Pressure9am -0.237936069 0.34351977 0.13456113 -0.307122274 0.306484944 0.16764059 0.22298428 0.11040901
Pressure3pm -0.258224380 0.30461863 0.09483106 -0.298771321 0.425280514 0.12925818 0.18128684 0.07443856
Cloud9am -0.089521297 -0.37057282 0.22700819 -0.207758157 0.071852259 -0.28418390 0.53207698 -0.28589005
Cloud3pm -0.114456565 -0.31894009 0.21867635 -0.346637284 -0.234379494 0.56050810 -0.22324504 0.09488563
Temp9am 0.375914164 -0.09688413 0.19441676 -0.128195493 0.141112838 0.04927045 -0.07243403 0.23761276
Temp3pm 0.369481076 0.11693260 0.25142433 -0.004520988 -0.042580507 0.07041527 0.13823540 0.18508703
PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16
MinTemp 0.08078127 0.18472249 -0.071354795 0.08848297 0.65167895 -0.427155392 -0.185966311 0.004509560
MaxTemp -0.02220307 -0.02077306 0.019804938 -0.08087304 -0.39131782 -0.246433650 0.117999554 -0.696531904
Rainfall -0.07572417 -0.06514432 0.022858510 -0.07646998 -0.04367026 -0.001263031 -0.029692514 0.005543731
Evaporation 0.45036006 -0.05387560 -0.352121100 0.16019128 -0.11265621 0.088853684 -0.018225071 -0.008351970
Sunshine -0.19403416 -0.02822325 -0.047410828 0.81369837 0.01445786 0.058687569 -0.005428731 0.019475811
WindGustSpeed 0.45803701 -0.41063378 0.417925346 0.06313608 0.09226322 0.021222021 0.003948858 0.024704188
WindSpeed9am -0.13608359 -0.31666134 -0.381733019 -0.02487819 -0.22644305 -0.003588899 -0.021421866 -0.005354006
WindSpeed3pm -0.10393634 0.59330840 -0.335728647 -0.07667309 -0.04879505 -0.053601702 0.007195032 -0.006477316
Humidity9am 0.29717693 -0.12886774 -0.467789672 0.03175672 0.03311327 0.206147706 0.005105644 -0.016727725
Humidity3pm 0.39460988 0.42031750 0.196685033 0.37789673 -0.37480362 -0.111727327 0.041353512 0.052443048
Pressure9am 0.11515092 0.02322775 -0.009386425 0.03231424 -0.15601728 0.031952713 -0.699058259 -0.004988305
Pressure3pm 0.15302204 0.03815021 -0.071446771 0.06526793 0.15789523 -0.022776258 0.668281615 0.002740504
Cloud9am -0.41239410 -0.02562913 0.270866448 0.23001946 -0.10303812 0.021283109 0.033417558 0.003660992
Cloud3pm -0.22935868 -0.29575615 -0.295837253 0.24151009 0.05414153 -0.015390752 0.006709164 0.010878829
Temp9am -0.04721686 0.22844725 0.102141691 -0.05602486 0.08440026 0.797188323 0.019769059 -0.003223984
Temp3pm -0.04521188 -0.03175603 -0.025201376 -0.13430980 -0.36563554 -0.222290018 0.105601136 0.714463175
Rattle timestamp: 2018-06-07 21:39:35 lenovo
======================================================================
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12
Standard deviation 2.3397 1.8456 1.6015 1.04888 0.91136 0.74777 0.70891 0.6132 0.57853 0.54904 0.51135 0.41715
Proportion of Variance 0.3421 0.2129 0.1603 0.06876 0.05191 0.03495 0.03141 0.0235 0.02092 0.01884 0.01634 0.01088
Cumulative Proportion 0.3421 0.5550 0.7153 0.78409 0.83600 0.87095 0.90236 0.9259 0.94678 0.96562 0.98196 0.99284
PC13 PC14 PC15 PC16
Standard deviation 0.25535 0.16579 0.12938 0.07206
Proportion of Variance 0.00408 0.00172 0.00105 0.00032
Cumulative Proportion 0.99691 0.99863 0.99968 1.00000
Scree plot
Biplot
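The SVD results above have the shape of `prcomp()` output. In that output the Proportion of Variance is each squared standard deviation divided by their sum: for PC1, 2.3397^2 / 16 ≈ 0.3421, where 16 is the number of standardized variables. The sketch below reproduces the steps on `airquality` as stand-in data; it is an approximation of the workflow, not Rattle's exact log.

```r
# Numeric columns only, with incomplete rows dropped.
num <- na.omit(airquality[sapply(airquality, is.numeric)])

# SVD-based PCA on centred and standardized variables.
pca_svd <- prcomp(num, center = TRUE, scale. = TRUE)
print(pca_svd$sdev)       # standard deviations of the components
print(pca_svd$rotation)   # loadings (the "Rotation" matrix)

# Proportion of variance and its running total.
var_share <- pca_svd$sdev^2 / sum(pca_svd$sdev^2)
print(round(cumsum(var_share), 4))
print(summary(pca_svd))   # "Importance of components"

# Scree plot and biplot.
screeplot(pca_svd, type = "lines", main = "Scree plot")
biplot(pca_svd)
```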
Note that principal components on only the numeric
variables is calculated, and so we can not use this
approach to remove categoric variables from consideration.
Any numeric variables with relatively large rotation
values (negative or positive) in any of the first few
components are generally variables that you may wish
to include in the modelling.
Rattle timestamp: 2018-06-07 21:45:16 lenovo
======================================================================
Call:
princomp(x = na.omit(crs$dataset[crs$sample, crs$numeric]), scale = TRUE,
center = TRUE, tol = 0)
Standard deviations:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11
19.9174830 16.3408647 11.6265265 8.8919459 6.1518937 5.7172999 4.9462589 3.5038361 2.9584418 2.0621626 1.6979344
Comp.12 Comp.13 Comp.14 Comp.15 Comp.16
1.5068325 1.4106266 0.9628544 0.8071719 0.4799053
16 variables and 248 observations.
Rattle timestamp: 2018-06-07 21:45:16 lenovo
======================================================================
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
Standard deviation 19.917483 16.3408647 11.6265265 8.89194591 6.15189367 5.71729995 4.94625893 3.50383612
Proportion of Variance 0.393868 0.2651135 0.1342091 0.07850105 0.03757504 0.03245367 0.02429045 0.01218904
Cumulative Proportion 0.393868 0.6589816 0.7931906 0.87169168 0.90926672 0.94172039 0.96601084 0.97819988
Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
Standard deviation 2.958441775 2.062162622 1.697934421 1.506832546 1.410626603 0.9628544018 0.8071719179
Proportion of Variance 0.008689762 0.004222092 0.002862356 0.002254301 0.001975632 0.0009204561 0.0006468654
Cumulative Proportion 0.986889637 0.991111729 0.993974085 0.996228385 0.998204017 0.9991244732 0.9997713386
Comp.16
Standard deviation 0.4799052997
Proportion of Variance 0.0002286614
Cumulative Proportion 1.0000000000
Scree plot
Biplot
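The Eigen results come from `princomp()`. Their standard deviations (19.9, 16.3, ...) are much larger than the SVD ones, which suggests the variables were analysed on their original scale: `princomp()` documents `cor = TRUE` as the switch for using the correlation matrix, and the `scale = TRUE` in the call shown above does not appear to have that effect. The sketch below, on `airquality` as stand-in data with illustrative variable names, compares the two settings.

```r
# Numeric columns only, with incomplete rows dropped.
num <- na.omit(airquality[sapply(airquality, is.numeric)])

# Eigen-based PCA on the covariance matrix (variables keep their own scale).
pca_cov <- princomp(num)

# Eigen-based PCA on the correlation matrix (variables standardized).
pca_cor <- princomp(num, cor = TRUE)
print(summary(pca_cor))

# For reference: SVD-based PCA on standardized variables.
pca_svd <- prcomp(num, center = TRUE, scale. = TRUE)

# Compare the component standard deviations side by side.
print(rbind(covariance  = unname(pca_cov$sdev),
            correlation = unname(pca_cor$sdev),
            svd_scaled  = pca_svd$sdev))

# Scree plot and biplot for the correlation-based fit.
screeplot(pca_cor, type = "lines", main = "Scree plot")
biplot(pca_cor)
```

When variables are measured in very different units, the covariance-based fit is dominated by the variables with the largest variance, so the correlation-based fit is usually the one that is comparable to the SVD output.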
After this runs, GGobi and GGRaptR are installed; see the help files on the official website for details.