R_Studio(关联)对dvdtrans.csv数据进行关联规则分析

Posted 2021-01-17 1138720556gary

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了R_Studio(关联)对dvdtrans.csv数据进行关联规则分析相关的知识，希望对你有一定的参考价值。

　　dvdtrans.csv数据：该原始数据仅仅包含了两个字段(ID, Item) 用户ID，商品名称(共30条)

　　技术分享图片

#导入arules包
#install.packages("arules")
library (arules)

setwd(‘D:\\data‘) 
Gary=read.csv(file="dvdtrans.csv",header=T)

# 将数据转换为arules关联规则方法apriori 可以处理的数据形式.交易数据
# transactions "事务"
Gary<- as(split(Gary$Item, Gary$ID),"transactions")

# 查看一下数据
#attributes(Gary)
summary(Gary)

# 使用apriori函数生成关联规则
rules <- apriori(Gary, parameter=list(support=0.3,confidence=0.5))

# 查看一下数据
inspect(rules)

Gary.R

实现过程

　　导入arules包

　　对数据进行预处理

#导入arules包
#install.packages("arules")
library (arules)

setwd(‘D:\\data‘) 
Gary=read.csv(file="dvdtrans.csv",header=T)

# 将数据转换为arules关联规则方法apriori 可以处理的数据形式.交易数据
# transactions "事务"
Gary<- as(split(Gary$Item, Gary$ID),"transactions")

> # 查看一下数据
> #attributes(Gary)
> summary(Gary)
transactions as itemMatrix in sparse format with
 10 rows (elements/itemsets/transactions) and　　　　　　　　　　　　10行（元素/项集/事务）
 10 columns (items) and a density of 0.3 　　　　　　　　　　　　　　 10列（项）和0.3的密度

most frequent items:　　　　　　　　　　　　　　　　　　　　　　　　　　　最常见的项目(频率)：
    Gladiator       Patriot   Sixth Sense    Green Mile Harry Potter1       (Other) 
            7             6             6             2             2             7 

element (itemset/transaction) length distribution:　　　　　　　　　　元素（项集/事务）长度分布：
sizes
2 3 4 5 
3 5 1 1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00    2.25    3.00    3.00    3.00    5.00 

includes extended item information - examples:
      labels
1 Braveheart
2  Gladiator
3 Green Mile

includes extended transaction information - examples:
  transactionID
1             1
2             2
3             3

　　生成关联规则

> # 使用apriori函数生成关联规则
> rules <- apriori(Gary, parameter=list(support=0.3,confidence=0.5))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target   ext
        0.5    0.1    1 none FALSE            TRUE       5     0.3      1     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 3 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 10 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [12 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
> 
> # 查看一下数据
> inspect(rules)
     lhs                        rhs           support confidence lift     count
[1]  {}                      => {Patriot}     0.6     0.6000000  1.000000 6    
[2]  {}                      => {Sixth Sense} 0.6     0.6000000  1.000000 6    
[3]  {}                      => {Gladiator}   0.7     0.7000000  1.000000 7    
[4]  {Patriot}               => {Sixth Sense} 0.4     0.6666667  1.111111 4    
[5]  {Sixth Sense}           => {Patriot}     0.4     0.6666667  1.111111 4    
[6]  {Patriot}               => {Gladiator}   0.6     1.0000000  1.428571 6    
[7]  {Gladiator}             => {Patriot}     0.6     0.8571429  1.428571 6    
[8]  {Sixth Sense}           => {Gladiator}   0.5     0.8333333  1.190476 5    
[9]  {Gladiator}             => {Sixth Sense} 0.5     0.7142857  1.190476 5    
[10] {Patriot,Sixth Sense}   => {Gladiator}   0.4     1.0000000  1.428571 4    
[11] {Gladiator,Patriot}     => {Sixth Sense} 0.4     0.6666667  1.111111 4    
[12] {Gladiator,Sixth Sense} => {Patriot}     0.4     0.8000000  1.333333 4

以上是关于R_Studio(关联)对dvdtrans.csv数据进行关联规则分析的主要内容，如果未能解决你的问题，请参考以下文章