R_Studio(关联)对dvdtrans.csv数据进行关联规则分析
Posted 1138720556gary
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了R_Studio(关联)对dvdtrans.csv数据进行关联规则分析相关的知识,希望对你有一定的参考价值。
dvdtrans.csv数据:该原始数据仅仅包含了两个字段(ID, Item) 用户ID,商品名称(共30条)
#导入arules包 #install.packages("arules") library (arules) setwd(‘D:\\data‘) Gary=read.csv(file="dvdtrans.csv",header=T) # 将数据转换为arules关联规则方法apriori 可以处理的数据形式.交易数据 # transactions "事务" Gary<- as(split(Gary$Item, Gary$ID),"transactions") # 查看一下数据 #attributes(Gary) summary(Gary) # 使用apriori函数生成关联规则 rules <- apriori(Gary, parameter=list(support=0.3,confidence=0.5)) # 查看一下数据 inspect(rules)
实现过程
导入arules包
对数据进行预处理
#导入arules包 #install.packages("arules") library (arules) setwd(‘D:\\data‘) Gary=read.csv(file="dvdtrans.csv",header=T) # 将数据转换为arules关联规则方法apriori 可以处理的数据形式.交易数据 # transactions "事务" Gary<- as(split(Gary$Item, Gary$ID),"transactions")
> # 查看一下数据 > #attributes(Gary) > summary(Gary) transactions as itemMatrix in sparse format with 10 rows (elements/itemsets/transactions) and 10行(元素/项集/事务) 10 columns (items) and a density of 0.3 10列(项)和0.3的密度 most frequent items: 最常见的项目(频率): Gladiator Patriot Sixth Sense Green Mile Harry Potter1 (Other) 7 6 6 2 2 7 element (itemset/transaction) length distribution: 元素(项集/事务)长度分布: sizes 2 3 4 5 3 5 1 1 Min. 1st Qu. Median Mean 3rd Qu. Max. 2.00 2.25 3.00 3.00 3.00 5.00 includes extended item information - examples: labels 1 Braveheart 2 Gladiator 3 Green Mile includes extended transaction information - examples: transactionID 1 1 2 2 3 3
生成关联规则
> # 使用apriori函数生成关联规则 > rules <- apriori(Gary, parameter=list(support=0.3,confidence=0.5)) Apriori Parameter specification: confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext 0.5 0.1 1 none FALSE TRUE 5 0.3 1 10 rules FALSE Algorithmic control: filter tree heap memopt load sort verbose 0.1 TRUE TRUE FALSE TRUE 2 TRUE Absolute minimum support count: 3 set item appearances ...[0 item(s)] done [0.00s]. set transactions ...[10 item(s), 10 transaction(s)] done [0.00s]. sorting and recoding items ... [3 item(s)] done [0.00s]. creating transaction tree ... done [0.00s]. checking subsets of size 1 2 3 done [0.00s]. writing ... [12 rule(s)] done [0.00s]. creating S4 object ... done [0.00s]. > > # 查看一下数据 > inspect(rules) lhs rhs support confidence lift count [1] {} => {Patriot} 0.6 0.6000000 1.000000 6 [2] {} => {Sixth Sense} 0.6 0.6000000 1.000000 6 [3] {} => {Gladiator} 0.7 0.7000000 1.000000 7 [4] {Patriot} => {Sixth Sense} 0.4 0.6666667 1.111111 4 [5] {Sixth Sense} => {Patriot} 0.4 0.6666667 1.111111 4 [6] {Patriot} => {Gladiator} 0.6 1.0000000 1.428571 6 [7] {Gladiator} => {Patriot} 0.6 0.8571429 1.428571 6 [8] {Sixth Sense} => {Gladiator} 0.5 0.8333333 1.190476 5 [9] {Gladiator} => {Sixth Sense} 0.5 0.7142857 1.190476 5 [10] {Patriot,Sixth Sense} => {Gladiator} 0.4 1.0000000 1.428571 4 [11] {Gladiator,Patriot} => {Sixth Sense} 0.4 0.6666667 1.111111 4 [12] {Gladiator,Sixth Sense} => {Patriot} 0.4 0.8000000 1.333333 4
以上是关于R_Studio(关联)对dvdtrans.csv数据进行关联规则分析的主要内容,如果未能解决你的问题,请参考以下文章
R_Studio(关联)对Groceries数据集进行关联分析
R_Studio(cart算法决策树)对book3.csv数据用测试集进行测试并评估模型