How to classify Data frame of characters into categories using keywords in R?
Posted: 2019-01-18 15:17:51
Question: I am trying to classify a data frame of customer reviews into their corresponding categories. For example,
x <- data.frame(Reviews = c("The phone performance and display is good","Worth the money","Camera is good"))
The desired output was shown as an image in the original post (not reproduced here).
I tried creating a dictionary with R's quanteda package, as follows:
library(quanteda)
dic <- dictionary(list(
  camera = c("camera", "lens", "pixel", "pictures", "pixels", "snap"),
  display = c("resolution", "display", "depth", "mode", "color", "colour", "discolour"),
  performance = c("performance", "speed", "usage", "fast", "run", "running", "lag",
                  "processor", "shut", "shut down", "restart", "hanging", "hang"),
  Value = c("money", "worth", "budget", "value", "price", "specs", "specifications",
            "invest", "under", "expectations", "expected", "expecting", "expect")))
I want to classify the text by these keywords, as described above. Please help.
P.S.: dfm is one option, but specifically I want to know how to classify the text data frame into the desired output.
Answer 1: You have already written most of the code:
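# Save the reviews in a vector and build a DFM from their tokens
library(quanteda)
Reviews <- c("The phone performance and display is good",
             "Worth the money",
             "Camera is good")
# Recent quanteda versions expect a tokens object rather than raw text in dfm()
x <- dfm(tokens(Reviews), tolower = TRUE)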
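I convert everything to lowercase so that the fixed comparison lines up with the lowercase dictionary entries (dfm_lookup() also matches case-insensitively by default). I would also recommend removing stop words and applying some kind of stemming.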
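A minimal sketch of the stop-word removal suggested above, assuming quanteda v3 or later and the default English list returned by `stopwords("en")`. Stemming (`tokens_wordstem()`) is deliberately left out: stemmed tokens such as "perform" would no longer match unstemmed dictionary entries such as "performance" under `valuetype = "fixed"`, so the dictionary values would have to be stemmed or turned into glob patterns as well.

```r
library(quanteda)

Reviews <- c("The phone performance and display is good",
             "Worth the money",
             "Camera is good")

# Tokenise without punctuation, lowercase, then drop English stop words
toks <- tokens(Reviews, remove_punct = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, stopwords("en"))

# DFM of the remaining content words
x_clean <- dfm(toks)
featnames(x_clean)
```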
# Creating the dictionary
dic <- dictionary(list(camera = c("camera","lens","pixel", "pictures", "pixels", "snap"),
display = c("resolution", "display", "depth", "mode", "color", "colour", "discolour"),
performance = c("performance", "speed", "usage", "fast", "run", "running", "lag", "processor", "shut", "shut down", "restart", "hanging","hang"),
Value = c("money", "worth", "budget", "value", "price", "specs", "specifications", "invest", "under","expectations","expected","expecting","expect")))
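The dictionary contains the two-word value "shut down", but a DFM built from single-word tokens has no such feature. One option, sketched here with a toy dictionary and made-up review texts (both illustrative only), is `tokens_lookup()`, which works on tokens and matches multi-word values as token sequences before any counting happens.

```r
library(quanteda)

# Toy dictionary with one single-word and one two-word value (illustrative only)
dic <- dictionary(list(performance = c("restart", "shut down")))

toks <- tokens(c("The phone keeps trying to shut down", "Slow restart"))

# tokens_lookup() matches "shut down" as a two-token sequence and
# replaces each match with its dictionary key
looked_up <- tokens_lookup(toks, dic, valuetype = "fixed")
as.list(looked_up)
```

Passing the result to `dfm()` then gives per-category counts, just like `dfm_lookup()`.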
Use the dfm_lookup function:
# valuetype = "fixed" for exact matching
res <- dfm_lookup(x, dic, valuetype = "fixed")
# Label each document with its review text
docnames(res) <- Reviews
res
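To get from the lookup result to a data-frame layout like the one the question asks for, one option is to convert the lookup DFM with `convert(res, to = "data.frame")` and paste together the names of the categories that have a non-zero count in each row. This is a sketch assuming quanteda v3 or later (where the converted data frame starts with a `doc_id` column); the shortened dictionary, the `Category` column name, and the `"none"` label for reviews that match no keyword are illustrative choices, not part of the original answer.

```r
library(quanteda)

# The question's data frame; stringsAsFactors = FALSE keeps the reviews as text in R < 4.0
x <- data.frame(Reviews = c("The phone performance and display is good",
                            "Worth the money",
                            "Camera is good"),
                stringsAsFactors = FALSE)

# Shortened version of the question's dictionary (single-word values only)
dic <- dictionary(list(
  camera      = c("camera", "lens", "pixel", "pictures", "pixels", "snap"),
  display     = c("resolution", "display", "depth", "mode", "color", "colour"),
  performance = c("performance", "speed", "fast", "lag", "processor", "restart"),
  Value       = c("money", "worth", "budget", "value", "price")
))

# Count dictionary matches per review, one column per category
res <- dfm_lookup(dfm(tokens(x$Reviews)), dic, valuetype = "fixed")

# Convert to a plain data frame and drop the doc_id column
counts <- convert(res, to = "data.frame")
counts$doc_id <- NULL

# Paste together every category with at least one matching keyword
x$Category <- apply(counts, 1, function(row) {
  hits <- names(row)[row > 0]
  if (length(hits) == 0) "none" else paste(hits, collapse = ", ")
})
x
```

Keeping the counts in `counts` also makes it easy to pick a single "main" category per review instead (for example the column with the largest count), if one label per row is preferred.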
Hope this is what you are looking for :)