R Language: Data Mining with Rattle

Posted by 医学统计分析学习笔记 (Medical Statistical Analysis Study Notes)


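The Rattle package is a graphical (GUI) data mining tool built on R. It lets you work without writing code, offers a rich set of features, and records the underlying R commands in its Log so you can study the code. This series of 10 articles walks through using Rattle for data mining.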

1
Getting to Know Rattle

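Load and start Rattle. The first row of the window is the menu bar, the second is the toolbar, and the third holds the tabs: Data (data input), Explore (data exploration), Test (statistical tests), Transform (data transformation), Cluster (cluster analysis), Associate (association rules), Model (modelling), and Evaluate (model evaluation).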
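Loading and starting Rattle from an R session might look like the sketch below. It assumes the rattle package and its GUI toolkit are already installed; `launch_rattle()` is a hypothetical helper name used only for this example.

```r
# Sketch: load and start the Rattle GUI from an R session.
# Assumes the 'rattle' package (and its GUI toolkit) is already installed.
# launch_rattle() is a hypothetical helper, not part of rattle itself.
launch_rattle <- function() {
  # Only try to open a GUI from an interactive session with rattle available.
  if (!interactive() || !requireNamespace("rattle", quietly = TRUE)) {
    message("Rattle not started: needs an interactive session with 'rattle' installed.")
    return(invisible(FALSE))
  }
  library(rattle)  # attach the package
  rattle()         # open the main window with its menu bar, toolbar and tabs
  invisible(TRUE)
}

launch_rattle()
```

In an ordinary interactive session the two lines `library(rattle)` and `rattle()` are all that is needed; the guard only keeps the example from failing in batch mode.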


2
The Data Tab

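This tab is where you choose the data source and load the data. Supported sources include TXT, CSV, Excel, and SQL, as well as the data sets bundled with R packages.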
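For comparison, loading a CSV file in plain R and separating numeric from categoric variables can be sketched as follows. The file and its three rows are invented purely for illustration; nothing here is taken from Rattle's own code.

```r
# Sketch: read a CSV file and split its columns by type.
# The file contents below are made up for this example.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("MinTemp,MaxTemp,WindGustDir",
             "8.0,24.3,NW",
             "14.0,26.9,ENE",
             "13.7,23.4,NW"), csv_path)

dataset <- read.csv(csv_path, stringsAsFactors = TRUE)

# Numeric variables feed the correlation and PCA steps;
# categoric variables are stored as factors.
numeric_vars   <- names(dataset)[sapply(dataset, is.numeric)]
categoric_vars <- names(dataset)[sapply(dataset, is.factor)]

str(dataset)
print(numeric_vars)
print(categoric_vars)

# A data set bundled with an R package is loaded with data(), e.g. data(iris).
```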


30
Explore Tab: Data Exploration, Correlation Analysis

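Ordered: sort the variables by strength of correlation

Explore missing: correlation analysis of missing values

Hierarchical: hierarchical correlation

Method: choose among Pearson, Spearman, and Kendall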


Correlation matrix

Correlation summary using the 'Pearson' covariance.

Note that only correlations between numeric variables are reported.

             Pressure3pm Pressure9am Humidity9am WindSpeed3pm Humidity3pm    Sunshine    Cloud3pm WindSpeed9am
Pressure3pm    1.00000000  0.96897642   0.1521277 -0.335226602 -0.01580015 -0.06813014 -0.07554600  -0.29297114
Pressure9am    0.96897642  1.00000000   0.1524648 -0.371372978 -0.09012507 -0.01836370 -0.07673340  -0.40059684
Humidity9am    0.15212768  0.15246484   1.0000000 -0.344325106  0.49552358 -0.46823913  0.27819872  -0.34034116
WindSpeed3pm  -0.33522660 -0.37137298  -0.3443251  1.000000000 -0.05553257  0.12503533 -0.07077685   0.49130778
Humidity3pm   -0.01580015 -0.09012507   0.4955236 -0.055532570  1.00000000 -0.75751258  0.53217454   0.12861487
Sunshine      -0.06813014 -0.01836370  -0.4682391  0.125035331 -0.75751258  1.00000000 -0.67080423  -0.05329541
Cloud3pm      -0.07554600 -0.07673340   0.2781987 -0.070776854  0.53217454 -0.67080423  1.00000000  -0.04738166
WindSpeed9am  -0.29297114 -0.40059684  -0.3403412  0.491307778  0.12861487 -0.05329541 -0.04738166   1.00000000
Rainfall      -0.23702286 -0.31098103   0.1930967  0.045300192  0.31217242 -0.15507662  0.09317110   0.17305783
WindGustSpeed -0.51159855 -0.53119422  -0.4027873  0.657349969 -0.08627584  0.12253949 -0.02771644   0.54032660
Cloud9am      -0.12196717 -0.15234691   0.3898705 -0.060765345  0.53367655 -0.66867659  0.51027937   0.10673849
Evaporation   -0.42083515 -0.40044671  -0.4513420  0.057763791 -0.37115619  0.27051093 -0.10184521   0.10623752
Temp3pm       -0.34293423 -0.24726344  -0.3230771 -0.165244193 -0.56336278  0.45661908 -0.17393142  -0.17167422
MaxTemp       -0.37424892 -0.28151629  -0.3272456 -0.152145582 -0.52004926  0.43511216 -0.14324702  -0.15023725
Temp9am       -0.50426161 -0.46699389  -0.4187603  0.005536074 -0.23955741  0.20529853  0.01937453   0.13911180
MinTemp       -0.52804603 -0.52592375  -0.1978675 -0.047049560 -0.04346591  0.03885863  0.09272276   0.19917684
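A correlation summary of this kind can be approximated in plain R as sketched below. The built-in `airquality` data set stands in for the article's data, and the ordering step is an assumption about what the Ordered option does, not a copy of Rattle's code.

```r
# Sketch: correlation matrix of the numeric variables, optionally ordered.
# airquality ships with base R and stands in for the article's data set.
ds <- airquality
numeric_cols <- sapply(ds, is.numeric)

# "Method" option: "pearson", "spearman" or "kendall".
# Pairwise deletion keeps as many observations as possible when values are missing.
cor_mat <- cor(ds[, numeric_cols],
               use = "pairwise.complete.obs",
               method = "pearson")

# "Ordered" option (assumed behaviour): sort the variables by their
# correlation with the first variable so related variables sit together.
ord <- order(cor_mat[1, ])
cor_mat <- cor_mat[ord, ord]

print(round(cor_mat, 3))
```

Swapping `method = "pearson"` for `"spearman"` or `"kendall"` gives the rank-based alternatives listed under Method.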


Correlation plot


Missing-value correlation analysis

Missing values correlation summary using the 'Pearson' covariance.

Note that only correlations between numeric variables are reported.

              WindDir9am WindSpeed9am  WindGustDir WindGustSpeed     Sunshine   WindDir3pm
WindDir9am     1.00000000  0.400999442 -0.027879443  -0.027879443 -0.027879443 -0.019675051
WindSpeed9am   0.40099944  1.000000000 -0.011179641  -0.011179641 -0.011179641 -0.007889684
WindGustDir   -0.02787944 -0.011179641  1.000000000   1.000000000 -0.007874016 -0.005556842
WindGustSpeed -0.02787944 -0.011179641  1.000000000   1.000000000 -0.007874016 -0.005556842
Sunshine      -0.02787944 -0.011179641 -0.007874016  -0.007874016  1.000000000 -0.005556842
WindDir3pm    -0.01967505 -0.007889684 -0.005556842  -0.005556842 -0.005556842  1.000000000

Count of missing values:
  WindDir9am  WindSpeed9am   WindGustDir WindGustSpeed      Sunshine    WindDir3pm
          23             4             2             2             2             1

Percent missing values:
  WindDir9am  WindSpeed9am   WindGustDir WindGustSpeed      Sunshine    WindDir3pm
    8.984375      1.562500      0.781250      0.781250      0.781250      0.390625
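The idea behind this summary is to correlate the pattern of missingness, not the values themselves: a correlation of 1 between WindGustDir and WindGustSpeed means the two variables are missing in exactly the same rows. A minimal sketch in plain R, again using `airquality` as a stand-in data set (an assumption, not the article's data):

```r
# Sketch: correlation between missing-value patterns.
ds <- airquality

# Indicator matrix: 1 where a value is missing, 0 otherwise.
miss <- is.na(ds) * 1

# Keep only variables with at least one missing value
# (a constant indicator column has no defined correlation).
miss <- miss[, colSums(miss) > 0, drop = FALSE]

miss_cor <- cor(miss, method = "pearson")
print(miss_cor)

# Count and percentage of missing values per variable.
miss_count   <- colSums(miss)
miss_percent <- 100 * miss_count / nrow(ds)
print(miss_count)
print(miss_percent)
```

The percentages in the output above follow the same arithmetic: 23 missing values at 8.984375% implies 256 rows, since 23 / 256 = 0.08984375.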


Correlation tree
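A correlation tree groups variables so that strongly correlated ones join early in the dendrogram. The sketch below shows one way to build such a tree in plain R; the distance definition (1 minus the correlation) and the average linkage are assumptions chosen for illustration, and `airquality` again stands in for the article's data.

```r
# Sketch: hierarchical clustering of variables by correlation.
ds <- airquality
cc <- cor(ds, use = "pairwise.complete.obs", method = "pearson")

# Assumption: use 1 - correlation as the distance, so strongly positively
# correlated variables are close; average linkage is one common choice.
hc <- hclust(as.dist(1 - cc), method = "average")

# Draw the dendrogram into a temporary PDF so the example
# also runs in a non-interactive session.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(as.dendrogram(hc), horiz = TRUE, main = "Variable correlation clusters")
dev.off()
```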


3
Explore Tab: Data Exploration, Principal Component Analysis

Principal component analysis (PCA) is offered with two methods: SVD and Eigen.


SVD

Note that principal components on only the numeric
variables is calculated, and so we can not use this
approach to remove categoric variables from  consideration.

Any numeric variables with relatively large rotation
values (negative or positive) in any of the first few
components are generally variables that you may wish
to include in the modelling.

Rattle timestamp: 2018-06-07 21:39:35 lenovo
======================================================================

Standard deviations (1, .., p=16):
[1] 2.33967160 1.84564685 1.60151562 1.04888312 0.91136037 0.74777048 0.70891475 0.61316523 0.57853049 0.54903515
[11] 0.51134869 0.41714578 0.25534947 0.16579137 0.12937785 0.07205772

Rotation (n x k) = (16 x 16):
                      PC1         PC2         PC3          PC4          PC5         PC6         PC7         PC8
MinTemp        0.332147794 -0.19888150  0.24374707 -0.052744720  0.207808523 -0.12719510  0.01875075  0.18136991
MaxTemp        0.374029936  0.08978966  0.25057780 -0.006974787 -0.036994995  0.09067259  0.12047779  0.21047458
Rainfall       0.005796511 -0.25554583  0.01754559  0.585240095  0.544974867  0.48892975  0.09545233 -0.16588758
Evaporation    0.348472780 -0.02048528  0.09026961 -0.161590857  0.059263540 -0.02594145 -0.01248408 -0.68057351
Sunshine       0.218371365  0.36037787 -0.18669096  0.256521413 -0.004101546  0.01273087  0.05448771  0.10851966
WindGustSpeed  0.188224533 -0.20441561 -0.38364279 -0.144862912 -0.093542976  0.19662453  0.32565177  0.18575153
WindSpeed9am   0.086750984 -0.24575847 -0.36845432 -0.166720410  0.461501526 -0.38339748 -0.14748703  0.26435578
WindSpeed3pm   0.075386096 -0.14471982 -0.48804476 -0.137773961 -0.138697649  0.25696715  0.37900970  0.02298519
Humidity9am   -0.243316473 -0.13077267  0.27770716  0.362756478 -0.223368987 -0.18594937  0.42312175  0.28779029
Humidity3pm   -0.224835150 -0.37901853  0.09367538  0.005465963  0.086236542 -0.04818380 -0.28727836  0.18277319
Pressure9am   -0.237936069  0.34351977  0.13456113 -0.307122274  0.306484944  0.16764059  0.22298428  0.11040901
Pressure3pm   -0.258224380  0.30461863  0.09483106 -0.298771321  0.425280514  0.12925818  0.18128684  0.07443856
Cloud9am      -0.089521297 -0.37057282  0.22700819 -0.207758157  0.071852259 -0.28418390  0.53207698 -0.28589005
Cloud3pm      -0.114456565 -0.31894009  0.21867635 -0.346637284 -0.234379494  0.56050810 -0.22324504  0.09488563
Temp9am        0.375914164 -0.09688413  0.19441676 -0.128195493  0.141112838  0.04927045 -0.07243403  0.23761276
Temp3pm        0.369481076  0.11693260  0.25142433 -0.004520988 -0.042580507  0.07041527  0.13823540  0.18508703
                     PC9        PC10         PC11        PC12        PC13         PC14         PC15         PC16
MinTemp        0.08078127  0.18472249 -0.071354795  0.08848297  0.65167895 -0.427155392 -0.185966311  0.004509560
MaxTemp       -0.02220307 -0.02077306  0.019804938 -0.08087304 -0.39131782 -0.246433650  0.117999554 -0.696531904
Rainfall      -0.07572417 -0.06514432  0.022858510 -0.07646998 -0.04367026 -0.001263031 -0.029692514  0.005543731
Evaporation    0.45036006 -0.05387560 -0.352121100  0.16019128 -0.11265621  0.088853684 -0.018225071 -0.008351970
Sunshine      -0.19403416 -0.02822325 -0.047410828  0.81369837  0.01445786  0.058687569 -0.005428731  0.019475811
WindGustSpeed  0.45803701 -0.41063378  0.417925346  0.06313608  0.09226322  0.021222021  0.003948858  0.024704188
WindSpeed9am  -0.13608359 -0.31666134 -0.381733019 -0.02487819 -0.22644305 -0.003588899 -0.021421866 -0.005354006
WindSpeed3pm  -0.10393634  0.59330840 -0.335728647 -0.07667309 -0.04879505 -0.053601702  0.007195032 -0.006477316
Humidity9am    0.29717693 -0.12886774 -0.467789672  0.03175672  0.03311327  0.206147706  0.005105644 -0.016727725
Humidity3pm    0.39460988  0.42031750  0.196685033  0.37789673 -0.37480362 -0.111727327  0.041353512  0.052443048
Pressure9am    0.11515092  0.02322775 -0.009386425  0.03231424 -0.15601728  0.031952713 -0.699058259 -0.004988305
Pressure3pm    0.15302204  0.03815021 -0.071446771  0.06526793  0.15789523 -0.022776258  0.668281615  0.002740504
Cloud9am      -0.41239410 -0.02562913  0.270866448  0.23001946 -0.10303812  0.021283109  0.033417558  0.003660992
Cloud3pm      -0.22935868 -0.29575615 -0.295837253  0.24151009  0.05414153 -0.015390752  0.006709164  0.010878829
Temp9am       -0.04721686  0.22844725  0.102141691 -0.05602486  0.08440026  0.797188323  0.019769059 -0.003223984
Temp3pm       -0.04521188 -0.03175603 -0.025201376 -0.13430980 -0.36563554 -0.222290018  0.105601136  0.714463175

Rattle timestamp: 2018-06-07 21:39:35 lenovo
======================================================================

Importance of components:
                         PC1    PC2    PC3     PC4     PC5     PC6     PC7    PC8     PC9    PC10    PC11    PC12
Standard deviation     2.3397 1.8456 1.6015 1.04888 0.91136 0.74777 0.70891 0.6132 0.57853 0.54904 0.51135 0.41715
Proportion of Variance 0.3421 0.2129 0.1603 0.06876 0.05191 0.03495 0.03141 0.0235 0.02092 0.01884 0.01634 0.01088
Cumulative Proportion  0.3421 0.5550 0.7153 0.78409 0.83600 0.87095 0.90236 0.9259 0.94678 0.96562 0.98196 0.99284
                         PC13    PC14    PC15    PC16
Standard deviation     0.25535 0.16579 0.12938 0.07206
Proportion of Variance 0.00408 0.00172 0.00105 0.00032
Cumulative Proportion  0.99691 0.99863 0.99968 1.00000
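The SVD output has the shape of a `prcomp()` result (standard deviations, rotation, importance of components), so an equivalent analysis in plain R can be sketched as below. Using `prcomp()` with centering and scaling is an assumption here, and `airquality` stands in for the article's data. The numbers above are at least consistent with scaled inputs: PC1 has variance 2.3397² ≈ 5.47, which is 0.3421 of a total variance of 16, the number of variables.

```r
# Sketch: PCA via singular value decomposition with prcomp().
ds <- na.omit(airquality)   # PCA needs complete numeric rows

pca_svd <- prcomp(ds, center = TRUE, scale. = TRUE)

print(pca_svd$sdev)      # "Standard deviations"
print(pca_svd$rotation)  # "Rotation": loading of each variable on each component
print(summary(pca_svd))  # "Importance of components"

# With scaled inputs the component variances add up to the number of variables,
# so each proportion is simply sdev^2 divided by that total.
total_var <- sum(pca_svd$sdev^2)
prop_var  <- pca_svd$sdev^2 / total_var
print(cumsum(prop_var))  # "Cumulative Proportion"
```

Variables with large absolute rotation values in the first few components are the ones the Rattle note suggests considering for modelling.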


Scree plot


Biplot
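A scree plot and a biplot for a PCA result can be drawn with base R's `screeplot()` and `biplot()`. The sketch below writes both to a temporary PDF so it also runs outside an interactive session; the data set (`airquality`) and plot settings are illustrative assumptions.

```r
# Sketch: scree plot and biplot for a PCA fitted with prcomp().
ds  <- na.omit(airquality)
pca <- prcomp(ds, center = TRUE, scale. = TRUE)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Scree plot: variance of each component, used to decide how many to keep.
screeplot(pca, type = "lines", main = "Scree plot")

# Biplot: observations and variable loadings on the first two components.
biplot(pca, cex = 0.6)

dev.off()
```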

Eigen

Note that principal components on only the numeric
variables is calculated, and so we can not use this
approach to remove categoric variables from  consideration.

Any numeric variables with relatively large rotation
values (negative or positive) in any of the first few
components are generally variables that you may wish
to include in the modelling.

Rattle timestamp: 2018-06-07 21:45:16 lenovo
======================================================================

Call:
princomp(x = na.omit(crs$dataset[crs$sample, crs$numeric]), scale = TRUE,
   center = TRUE, tol = 0)

Standard deviations:
   Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6     Comp.7     Comp.8     Comp.9    Comp.10    Comp.11
19.9174830 16.3408647 11.6265265  8.8919459  6.1518937  5.7172999  4.9462589  3.5038361  2.9584418  2.0621626  1.6979344
  Comp.12    Comp.13    Comp.14    Comp.15    Comp.16
1.5068325  1.4106266  0.9628544  0.8071719  0.4799053

16  variables and  248 observations.

Rattle timestamp: 2018-06-07 21:45:16 lenovo
======================================================================

Importance of components:
                         Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6     Comp.7     Comp.8
Standard deviation     19.917483 16.3408647 11.6265265 8.89194591 6.15189367 5.71729995 4.94625893 3.50383612
Proportion of Variance  0.393868  0.2651135  0.1342091 0.07850105 0.03757504 0.03245367 0.02429045 0.01218904
Cumulative Proportion   0.393868  0.6589816  0.7931906 0.87169168 0.90926672 0.94172039 0.96601084 0.97819988
                           Comp.9     Comp.10     Comp.11     Comp.12     Comp.13      Comp.14      Comp.15
Standard deviation     2.958441775 2.062162622 1.697934421 1.506832546 1.410626603 0.9628544018 0.8071719179
Proportion of Variance 0.008689762 0.004222092 0.002862356 0.002254301 0.001975632 0.0009204561 0.0006468654
Cumulative Proportion  0.986889637 0.991111729 0.993974085 0.996228385 0.998204017 0.9991244732 0.9997713386
                           Comp.16
Standard deviation     0.4799052997
Proportion of Variance 0.0002286614
Cumulative Proportion  1.0000000000
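The Eigen standard deviations (19.9, 16.3, ...) are far larger than the SVD ones, which suggests the two runs are not on the same scale. As far as its documented arguments go, `princomp()` standardises variables only when `cor = TRUE`, so the `scale = TRUE` in the Call line likely has no effect and the decomposition uses the covariance matrix. The sketch below illustrates that difference on `airquality` (a stand-in data set); it is an interpretation of the output, not a statement about Rattle's internals.

```r
# Sketch: eigen-decomposition PCA with princomp(), with and without scaling.
ds <- na.omit(airquality)

# Default: covariance matrix, so variables with large units dominate.
pca_cov <- princomp(ds)

# cor = TRUE: correlation matrix, i.e. standardised variables.
pca_cor <- princomp(ds, cor = TRUE)

print(summary(pca_cov))
print(summary(pca_cor))

# On the correlation scale the eigen approach agrees with scaled prcomp().
pca_svd   <- prcomp(ds, center = TRUE, scale. = TRUE)
same_sdev <- isTRUE(all.equal(unname(pca_cor$sdev), unname(pca_svd$sdev)))
print(same_sdev)
```

When comparing the SVD and Eigen results, it is worth checking which scale each one used before reading anything into the different proportions of variance.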


Scree plot


Biplot

4
Explore Tab: Data Exploration, Interactive Exploration

Executing this option installs GGobi and GGRaptR. For details on how to use them, see the help files on the official website.



WeChat ID: YXSJFX2018

Recording the learning process bit by bit, improving a little every day.
