Market basket analysis with duplicated items in R using arules

Posted 2023-03-23


Question posted: 2016-05-24 18:15:24

Question:

I'm currently doing market basket analysis with the arules package. The data I read in looks like this (but with many more rows):

>data
  transaction_id  item
1              1  beer
2              1  beer
3              1  soda
4              2  beer
5              3  beer
6              3  fries
7              3  candy
8              4  soda
9              4  fries

I then reshape it with dcast and drop the transaction_id column:

> library(reshape2)
> Trans_Table <- dcast(data, transaction_id ~ item, fun.aggregate = length)
> Trans_Table$transaction_id <- NULL

The result looks like this:

  beer candy fries soda
1    2     0     0    1
2    1     0     0    0
3    1     1     1    0
4    0     0     1    1
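For reference, the same count table can also be built in base R without dcast. This is a minimal sketch that recreates the sample data above (the data frame literal is an assumption based on the printed rows). `table()` counts rows per transaction and item, so the repeated beer in transaction 1 shows up as 2, and the transaction IDs become row names instead of a column that has to be dropped.

```r
# Sample data from the question (recreated from the printed rows above)
data <- data.frame(
  transaction_id = c(1, 1, 1, 2, 3, 3, 3, 4, 4),
  item = c("beer", "beer", "soda", "beer", "beer", "fries", "candy", "soda", "fries"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per transaction, one column per item,
# each cell holds how many times the item was bought in that transaction
Trans_Table <- as.data.frame.matrix(table(data$transaction_id, data$item))
print(Trans_Table)
```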

But when I coerce it to the "transactions" class so that I can use the apriori function, the 2 under beer is converted to a 1:

> Transactions <-  as(as.matrix(Trans_Table), "transactions")
Warning message:
In asMethod(object) :
  matrix contains values other than 0 and 1! Setting all entries != 0 to 1.

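Is there a way to do market basket analysis and keep the 2? In other words, I'd like to see rules such as beer => beer, beer, beer => soda, and beer, soda => beer, but at the moment beer is only counted once per transaction, even when it was bought twice.

Can anyone help?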


Answer 1:

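Market basket analysis is about which different items are bought together, not about how often a particular item is bought. However, if you really want to treat repeated purchases of the same item as distinct items, you could use the following approach to generate new item names.

With the dplyr library, you can mutate each item name to append its occurrence number (how many times that item has appeared so far in the transaction), and then use the new names in your rule mining: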

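library(dplyr)
# Number each repeat of an item within a transaction: beer -> beer1, beer2, ...
df <- data %>%
        group_by(transaction_id, item) %>%
        mutate(newitem = paste0(item, row_number())) %>%
        ungroup()
# Cross-tabulate transactions against the renamed items
as.matrix(table(df$transaction_id, df$newitem))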

The output is:

    beer1 beer2 candy1 fries1 soda1
  1     1     1      0      0     1
  2     1     0      0      0     0
  3     1     0      1      1     0
  4     0     0      0      1     1
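To carry this through to rule mining, the renamed items can be turned into a transactions object and passed to apriori. This is a sketch under assumptions: it recreates the question's sample data, uses arules' coercion from a list of item vectors (one vector per transaction, built here with `split()`), and the support and confidence thresholds are arbitrary illustrative values, not recommendations.

```r
library(dplyr)
library(arules)

# Sample data from the question
data <- data.frame(
  transaction_id = c(1, 1, 1, 2, 3, 3, 3, 4, 4),
  item = c("beer", "beer", "soda", "beer", "beer", "fries", "candy", "soda", "fries"),
  stringsAsFactors = FALSE
)

# Number each repeat of an item within a transaction: beer -> beer1, beer2, ...
df <- data %>%
  group_by(transaction_id, item) %>%
  mutate(newitem = paste0(item, row_number())) %>%
  ungroup()

# One character vector of (now unique) item names per transaction
item_lists <- split(df$newitem, df$transaction_id)
trans <- as(item_lists, "transactions")

# Illustrative thresholds: with 4 transactions, support 0.25 means "seen in 1 transaction"
rules <- apriori(trans,
                 parameter = list(support = 0.25, confidence = 0.5, minlen = 2),
                 control = list(verbose = FALSE))
inspect(rules)
```

Because each renamed item occurs at most once per transaction, the coercion no longer has to binarize anything, and rules such as beer2 => beer1 or beer1, soda1 => beer2 can show up in the results. Keep in mind that beer2 only ever occurs together with beer1, so rules between the numbered copies partly restate how the items were numbered.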

There are also several ways to adjust the output to fit a particular format.
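If you would rather not depend on dplyr, the same numbering can be sketched in base R with `ave()`, which applies `seq_along` within each transaction/item group to produce a running count. This assumes the question's sample data (recreated below) and should yield the same renamed items as the dplyr version.

```r
# Sample data from the question
data <- data.frame(
  transaction_id = c(1, 1, 1, 2, 3, 3, 3, 4, 4),
  item = c("beer", "beer", "soda", "beer", "beer", "fries", "candy", "soda", "fries"),
  stringsAsFactors = FALSE
)

# Running count of each item within its transaction: 1 for the first beer, 2 for the second, ...
occurrence <- ave(seq_along(data$item), data$transaction_id, data$item, FUN = seq_along)
data$newitem <- paste0(data$item, occurrence)
print(data)

# 0/1 incidence table of transactions against renamed items
print(table(data$transaction_id, data$newitem))
```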

