R_Studio (Association): Association Rule Analysis of the dvdtrans.csv Data

Posted 1138720556gary


  dvdtrans.csv data: the raw data contains only two fields (ID, Item), i.e. the user ID and the item name (30 records in total)

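# Load the arules package
# install.packages("arules")
library(arules)

# Set the working directory to the folder that contains dvdtrans.csv
setwd("D:\\data")
Gary <- read.csv(file = "dvdtrans.csv", header = TRUE)

# Convert the data into the "transactions" format that the apriori
# function in arules can process: one transaction (basket) per ID
Gary <- as(split(Gary$Item, Gary$ID), "transactions")

# Take a look at the data
# attributes(Gary)
summary(Gary)

# Generate association rules with the apriori function
rules <- apriori(Gary, parameter = list(support = 0.3, confidence = 0.5))

# Inspect the rules
inspect(rules)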
Gary.R

 

 

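Implementation walkthrough

 

  Load the arules package

  Preprocess the data

# Load the arules package
# install.packages("arules")
library(arules)

# Set the working directory to the folder that contains dvdtrans.csv
setwd("D:\\data")
Gary <- read.csv(file = "dvdtrans.csv", header = TRUE)

# Convert the data into the "transactions" format that the apriori
# function in arules can process: one transaction (basket) per ID
Gary <- as(split(Gary$Item, Gary$ID), "transactions")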
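To make the conversion step more concrete, the sketch below shows what split() does to a long (ID, Item) table before as(..., "transactions") is applied. The rows are made-up toy data, not the actual contents of dvdtrans.csv, and the names toy, baskets and basket_sizes are hypothetical. It uses base R only.

```r
# Toy long-format data in the same (ID, Item) layout as dvdtrans.csv.
# These rows are invented for illustration only.
toy <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3),
  Item = c("Gladiator", "Patriot",
           "Gladiator", "Patriot", "Sixth Sense",
           "Sixth Sense"),
  stringsAsFactors = FALSE
)

# split() groups the Item column by ID, giving one character vector per
# transaction (basket). A list like this is what as(..., "transactions")
# is given in the script above.
baskets <- split(toy$Item, toy$ID)
print(baskets)

# Number of items in each basket
basket_sizes <- lengths(baskets)
print(basket_sizes)
```

Each list element corresponds to one row of the resulting transactions object, and each distinct item name becomes one column.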

 

> # Take a look at the data
> # attributes(Gary)
> summary(Gary)
transactions as itemMatrix in sparse format with
 10 rows (elements/itemsets/transactions) and
 10 columns (items) and a density of 0.3

most frequent items:
    Gladiator       Patriot   Sixth Sense    Green Mile Harry Potter1       (Other) 
            7             6             6             2             2             7 

element (itemset/transaction) length distribution:
sizes
2 3 4 5 
3 5 1 1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00    2.25    3.00    3.00    3.00    5.00 

includes extended item information - examples:
      labels
1 Braveheart
2  Gladiator
3 Green Mile

includes extended transaction information - examples:
  transactionID
1             1
2             2
3             3
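The figures in this summary can be cross-checked by hand. The sketch below, in base R, recomputes the density and the mean basket size from the size distribution printed above; the variable names are hypothetical, and the numbers are copied from the summary() output rather than read from the CSV.

```r
# Size distribution copied from the summary() output:
# 3 baskets of size 2, 5 of size 3, 1 of size 4, 1 of size 5.
sizes  <- c(2, 3, 4, 5)
counts <- c(3, 5, 1, 1)

n_transactions <- sum(counts)          # 10 rows
n_items        <- 10                   # 10 columns (distinct items)
n_filled       <- sum(sizes * counts)  # total item occurrences

# Density = filled cells / all cells of the transaction-by-item matrix
density   <- n_filled / (n_transactions * n_items)
mean_size <- n_filled / n_transactions

print(n_filled)   # 30
print(density)    # 0.3
print(mean_size)  # 3
```

The 30 item occurrences match the 30 records of the raw file, and the mean of 3 agrees with the Mean column of the length distribution.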

 

  Generate association rules

 

> # Generate association rules with the apriori function
> rules <- apriori(Gary, parameter=list(support=0.3,confidence=0.5))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target   ext
        0.5    0.1    1 none FALSE            TRUE       5     0.3      1     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 3 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 10 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [12 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
> 
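The parameter specification above reports minlen = 1, which lets apriori return rules with an empty left-hand side (of the form {} => {item}). If those are not wanted, one option is to set minlen = 2. The sketch below illustrates this on invented toy baskets, not on dvdtrans.csv; the names toy_baskets, toy_trans and toy_rules are hypothetical, and it assumes the arules package is installed.

```r
library(arules)

# Toy baskets, invented for illustration (not the dvdtrans.csv data).
toy_baskets <- list(
  c("Gladiator", "Patriot"),
  c("Gladiator", "Patriot", "Sixth Sense"),
  c("Gladiator", "Sixth Sense"),
  c("Patriot", "Sixth Sense"),
  c("Gladiator", "Patriot")
)
toy_trans <- as(toy_baskets, "transactions")

# minlen = 2 requires at least two items per rule (LHS and RHS together),
# so rules of the form {} => {item} are not generated.
toy_rules <- apriori(
  toy_trans,
  parameter = list(support = 0.3, confidence = 0.5, minlen = 2),
  control   = list(verbose = FALSE)
)

# Order the rules by lift, strongest first, then print them.
inspect(sort(toy_rules, by = "lift"))
```

The same parameter list could be passed in the call on Gary; only minlen differs from the settings used in this post.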
> # Inspect the rules
> inspect(rules)
     lhs                        rhs           support confidence lift     count
[1]  {}                      => {Patriot}     0.6     0.6000000  1.000000 6    
[2]  {}                      => {Sixth Sense} 0.6     0.6000000  1.000000 6    
[3]  {}                      => {Gladiator}   0.7     0.7000000  1.000000 7    
[4]  {Patriot}               => {Sixth Sense} 0.4     0.6666667  1.111111 4    
[5]  {Sixth Sense}           => {Patriot}     0.4     0.6666667  1.111111 4    
[6]  {Patriot}               => {Gladiator}   0.6     1.0000000  1.428571 6    
[7]  {Gladiator}             => {Patriot}     0.6     0.8571429  1.428571 6    
[8]  {Sixth Sense}           => {Gladiator}   0.5     0.8333333  1.190476 5    
[9]  {Gladiator}             => {Sixth Sense} 0.5     0.7142857  1.190476 5    
[10] {Patriot,Sixth Sense}   => {Gladiator}   0.4     1.0000000  1.428571 4    
[11] {Gladiator,Patriot}     => {Sixth Sense} 0.4     0.6666667  1.111111 4    
[12] {Gladiator,Sixth Sense} => {Patriot}     0.4     0.8000000  1.333333 4   
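As a worked check of how the columns relate, the sketch below recomputes confidence, lift and count for rule [6], {Patriot} => {Gladiator}, from the supports printed in the table. It uses the standard definitions confidence(X => Y) = supp(X and Y) / supp(X) and lift(X => Y) = confidence(X => Y) / supp(Y); the variable names are hypothetical and the inputs are copied from the inspect() output.

```r
# Supports read off the inspect() output:
# rule [1] gives supp(Patriot), rule [3] gives supp(Gladiator),
# rule [6] gives supp({Patriot, Gladiator}).
supp_patriot   <- 0.6
supp_gladiator <- 0.7
supp_both      <- 0.6

# confidence(X => Y) = supp(X and Y) / supp(X)
confidence <- supp_both / supp_patriot

# lift(X => Y) = confidence(X => Y) / supp(Y)
lift <- confidence / supp_gladiator

# count = support * number of transactions (10 here)
count <- supp_both * 10

print(confidence)      # 1
print(round(lift, 6))  # 1.428571
print(count)           # 6
```

A confidence of 1 means every basket containing Patriot also contains Gladiator, and a lift above 1 means the two appear together more often than would be expected if they were independent. The three rules with an empty left-hand side have lift exactly 1 because their confidence is simply the support of the right-hand item.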

 

