Text classification with the rnn package in R


[Original title]: text classification, rnn package R [Posted]: 2021-07-20 04:36:55 [Question]:

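I'm trying to do some classification with R's rnn package. My input is text and there are two classes, say "1" or "2". I read the package documentation and was able to run its examples, but text input seems to cause a problem. I convert each text string to binary and store the result in a matrix. Sample data and code: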

text.variable.preference = c("i like orange", "i like apple", "i prefer melon", "i prefer deserts to fruits")

text.variable.not.preference = c("i don't like fruits", "i don't like vegetables", "i like pop music", "i don't like anything")

library(rnn)  # provides int2bin()

# One row of 8 bits per character of each string
matrix.preference = matrix(nrow = 0, ncol = 8)
for (i in 1:NROW(text.variable.preference)) {
  matrix.1 = int2bin(utf8ToInt(text.variable.preference[i]))
  matrix.preference = rbind(matrix.1, matrix.preference)
}

matrix.not.preference = matrix(nrow = 0, ncol = 8)
for (i in 1:NROW(text.variable.not.preference)) {
  matrix.1 = int2bin(utf8ToInt(text.variable.not.preference[i]))
  matrix.not.preference = rbind(matrix.1, matrix.not.preference)
}

X = array(c(matrix.preference, matrix.not.preference), dim=c(dim(matrix.preference),2))
y = int2bin(rep(2:1,c(4,4)))

What I want is to train my rnn model in a way that the output of each text string would be either 1 or 2.

Something like:

model.rnn = trainr(Y = y, X = X, network_type = "rnn", learningrate = 0.1, hidden_dim = 10)
The problem is that dim(y) does not equal dim(X). That makes sense: the binary encoding of a whole string is much larger than the binary encoding of "1" or "2".
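To make the mismatch concrete, here is a minimal base-R sketch that only counts rows. It is not part of the original code, and the names rows.preference, rows.not.preference and n.labels are illustrative. It relies on the fact that utf8ToInt() returns one integer per character, so encoding a string produces one 8-bit row per character, while the question's y has one row per string.

```r
text.variable.preference <- c("i like orange", "i like apple",
                              "i prefer melon", "i prefer deserts to fruits")
text.variable.not.preference <- c("i don't like fruits", "i don't like vegetables",
                                  "i like pop music", "i don't like anything")

# One encoded row per character, so each matrix has as many rows
# as its strings have characters in total.
rows.preference <- sum(nchar(text.variable.preference))
rows.not.preference <- sum(nchar(text.variable.not.preference))

# The question's y has one row per string (4 + 4 labels).
n.labels <- length(text.variable.preference) + length(text.variable.not.preference)

cat("rows in matrix.preference:    ", rows.preference, "\n")
cat("rows in matrix.not.preference:", rows.not.preference, "\n")
cat("rows in y:                    ", n.labels, "\n")
```

The two matrices also differ from each other in row count, so stacking them into a third array dimension with dim = c(dim(matrix.preference), 2) cannot hold both of them intact.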

I'd like to know whether there is a clever way to achieve this.

[Answer 1]:

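When you convert the text to integers, you get one input row per character. Your labels should match that: y needs one label row for every character row in X, not one per string.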

X = rbind(matrix.preference, matrix.not.preference)
y = int2bin(rep(1:2, times = c(nrow(matrix.preference), nrow(matrix.not.preference))))
model.rnn = trainr(Y = y, X = X, learningrate = 1, numepochs = 10, hidden_dim = 10)
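The alignment can be checked without training anything. The following is a sketch under stated assumptions, not the answerer's code: chars_to_bits() is a hypothetical helper that uses base R's intToBits() as a stand-in for rnn's int2bin(), so the block runs without the package installed, and it appends rows in string order rather than prepending them as the question's loop does.

```r
# Hypothetical helper: one row of n_bits bits (least significant bit first)
# per character of s. A base-R stand-in for int2bin(utf8ToInt(s)).
chars_to_bits <- function(s, n_bits = 8) {
  codes <- utf8ToInt(s)
  t(vapply(codes,
           function(k) as.numeric(intToBits(k))[seq_len(n_bits)],
           numeric(n_bits)))
}

text.variable.preference <- c("i like orange", "i like apple",
                              "i prefer melon", "i prefer deserts to fruits")
text.variable.not.preference <- c("i don't like fruits", "i don't like vegetables",
                                  "i like pop music", "i don't like anything")

# Stack the per-character rows of every string in each class.
matrix.preference <- do.call(rbind, lapply(text.variable.preference, chars_to_bits))
matrix.not.preference <- do.call(rbind, lapply(text.variable.not.preference, chars_to_bits))

X <- rbind(matrix.preference, matrix.not.preference)

# One label per character row: 1 for the first class, 2 for the second.
labels <- rep(1:2, times = c(nrow(matrix.preference), nrow(matrix.not.preference)))
y <- t(vapply(labels,
              function(k) as.numeric(intToBits(k))[1:8],
              numeric(8)))

# X and y now have the same shape, which is what the answer relies on.
stopifnot(identical(dim(X), dim(y)))
```

Note that this labels individual characters, so a trained model would also predict per character. Getting a single class per string would need an extra step, for example taking the most frequent prediction over a string's characters; that step is a suggestion here, not something the answer specifies.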
