How to classify a data frame of characters into categories using keywords in R?


【Title】How to classify a data frame of characters into categories using keywords in R? 【Posted】2019-01-18 15:17:51 【Question】:

I am trying to classify a data frame of customer reviews into their respective categories. For example:

x <- data.frame(Reviews = c("The phone performance and display is good","Worth the money","Camera is good"))

The desired output is as follows: [image not available]

I tried creating a dictionary with R's quanteda package, as follows:

dic <- dictionary(list(camera = c("camera","lens","pixel", "pictures", 
"pixels", "snap"), display = c("resolution", "display", "depth", "mode", 
"color", "colour", "discolour"), performance = c("performance", "speed", 
"usage", "fast", "run", "running", "lag", "processor", "shut", "shut down", 
"restart", "hanging","hang"), Value = c("money", "worth", "budget", "value", 
"price", "specs", "specifications", "invest", 
"under","expectations","expected","expecting","expect")))

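I would like to classify the text by keyword as described above. Please help.

P.S.: A dfm is one option, but what I specifically want to know is how to classify the text data frame so that I get the desired output.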


【Answer 1】:

Reusing most of your code:

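# Save the reviews in a vector, then tokenize and build a DFM
library("quanteda")
Reviews <- c(
        "The phone performance and display is good",
        "Worth the money",
        "Camera is good")
# Recent quanteda versions expect tokens(), not raw text, as input to dfm()
x <- dfm(tokens(Reviews), tolower = TRUE)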

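I convert uppercase letters to lowercase so that the fixed (exact) comparison against the lowercase dictionary entries works. I would also recommend removing stopwords and applying some form of stemming.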
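The preprocessing suggested above could look roughly like the following. This is a minimal sketch, not part of the original answer: the names `reviews`, `toks`, `toks_stem` and `x_clean` are made up for illustration, and it assumes a quanteda version that provides `tokens_remove()`, `tokens_wordstem()` and `stopwords()`.

```r
library(quanteda)

reviews <- c("The phone performance and display is good",
             "Worth the money",
             "Camera is good")

# Tokenize, lowercase, and drop English stopwords
toks <- tokens(reviews, remove_punct = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, stopwords("en"))

# Optional: reduce tokens to their stems
toks_stem <- tokens_wordstem(toks)

# Build the DFM from the cleaned (unstemmed) tokens
x_clean <- dfm(toks)
featnames(x_clean)
```

Two caveats apply if this is combined with the dictionary. The English stopword list contains words such as "under" and "down", which also appear among the dictionary keywords, so removing stopwords would hide those matches. And once tokens are stemmed, the dictionary entries need to be stemmed too, or written as glob patterns such as "perform*" and matched with `valuetype = "glob"`.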

# Creating the dictionary
dic <- dictionary(list(camera = c("camera","lens","pixel", "pictures", "pixels", "snap"), 
                       display = c("resolution", "display", "depth", "mode", "color", "colour", "discolour"), 
                       performance = c("performance", "speed", "usage", "fast", "run", "running", "lag", "processor", "shut", "shut down", "restart", "hanging","hang"), 
                       Value = c("money", "worth", "budget", "value", "price", "specs", "specifications", "invest", "under","expectations","expected","expecting","expect")))
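One thing to keep in mind with this dictionary: "shut down" is a two-word entry, and a dfm stores single tokens without word order, so a lookup against a dfm cannot match it as a phrase. A possible workaround, sketched here under assumptions, is to apply the dictionary at the tokens stage with `tokens_lookup()`. The review text `txt` and the reduced dictionary `dic_perf` are invented for this illustration.

```r
library(quanteda)

# Hypothetical review containing the two-word phrase
txt <- c(r1 = "The phone will shut down and restart at random")

# Reduced, illustrative dictionary with one multi-word entry
dic_perf <- dictionary(list(performance = c("shut down", "restart", "lag")))

toks <- tokens_tolower(tokens(txt))

# tokens_lookup() still sees token order, so "shut down" can match as a phrase
hits <- tokens_lookup(toks, dic_perf, valuetype = "fixed")

# Count the category hits per document
hit_counts <- as.matrix(dfm(hits))
hit_counts
```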

Use the dfm_lookup function:

# fixed parameter for exact matching
res <- dfm_lookup(x, dic, valuetype = "fixed")
# Label each row with the review text it came from
docnames(res) <- Reviews
res
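Since the question asks for a classified data frame and not a dfm, the lookup counts can be turned into one label column per review. The sketch below is one way to do that, assuming the desired output is a `Reviews` column plus a column listing every matched category; the original image of the desired output is not available, so that layout is a guess. The dictionary is a shortened copy of the one above, and `counts`, `labels` and `out` are illustrative names.

```r
library(quanteda)

reviews <- c("The phone performance and display is good",
             "Worth the money",
             "Camera is good")

# Shortened copy of the dictionary, for illustration only
dic <- dictionary(list(
  camera      = c("camera", "lens", "pixel", "pictures", "pixels", "snap"),
  display     = c("resolution", "display", "depth", "mode", "color", "colour"),
  performance = c("performance", "speed", "usage", "fast", "lag", "processor"),
  Value       = c("money", "worth", "budget", "value", "price")
))

res <- dfm_lookup(dfm(tokens(reviews), tolower = TRUE), dic, valuetype = "fixed")

# Dense matrix of counts: one row per review, one column per category
counts <- as.matrix(res)

# For each review, join the names of all categories with at least one hit
labels <- apply(counts, 1, function(row) {
  paste(colnames(counts)[row > 0], collapse = ", ")
})

out <- data.frame(Reviews = reviews,
                  Category = unname(labels),
                  stringsAsFactors = FALSE)
out
```

A review that matches no keyword gets an empty string in `Category`; replacing that with `NA` or "other" is a design choice left open here.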

Hope this is what you were looking for :)
