R-Project no applicable method for 'meta' applied to an object of class "character"
【Posted】: 2014-09-06 09:55:49
【Question】: I am trying to run this code (Ubuntu 12.04, R 3.1.1):
# Load requisite packages
library(tm)
library(ggplot2)
library(lsa)
# Place Enron email snippets into a single vector.
text <- c(
"To Mr. Ken Lay, I’m writing to urge you to donate the millions of dollars you made from selling Enron stock before the company declared bankruptcy.",
"while you netted well over a $100 million, many of Enron's employees were financially devastated when the company declared bankruptcy and their retirement plans were wiped out",
"you sold $101 million worth of Enron stock while aggressively urging the company’s employees to keep buying it",
"This is a reminder of Enron’s Email retention policy. The Email retention policy provides as follows . . .",
"Furthermore, it is against policy to store Email outside of your Outlook Mailbox and/or your Public Folders. Please do not copy Email onto floppy disks, zip disks, CDs or the network.",
"Based on our receipt of various subpoenas, we will be preserving your past and future email. Please be prudent in the circulation of email relating to your work and activities.",
"We have recognized over $550 million of fair value gains on stocks via our swaps with Raptor.",
"The Raptor accounting treatment looks questionable. a. Enron booked a $500 million gain from equity derivatives from a related party.",
"In the third quarter we have a $250 million problem with Raptor 3 if we don’t “enhance” the capital structure of Raptor 3 to commit more ENE shares.")
view <- factor(rep(c("view 1", "view 2", "view 3"), each = 3))
df <- data.frame(text, view, stringsAsFactors = FALSE)
# Prepare mini-Enron corpus
corpus <- Corpus(VectorSource(df$text))
corpus <- tm_map(corpus, tolower)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, function(x) removeWords(x, stopwords("english")))
corpus <- tm_map(corpus, stemDocument, language = "english")
corpus # check corpus
# Mini-Enron corpus with 9 text documents
# Compute a term-document matrix that contains the occurrence of terms in each email
# Compute distance between pairs of documents and scale the multidimensional semantic space (MDS) onto two dimensions
td.mat <- as.matrix(TermDocumentMatrix(corpus))
dist.mat <- dist(t(as.matrix(td.mat)))
dist.mat # check distance matrix
# Compute distance between pairs of documents and scale the multidimensional semantic space onto two dimensions
fit <- cmdscale(dist.mat, eig = TRUE, k = 2)
points <- data.frame(x = fit$points[, 1], y = fit$points[, 2])
ggplot(points, aes(x = x, y = y)) +
  geom_point(data = points, aes(x = x, y = y, color = df$view)) +
  geom_text(data = points, aes(x = x, y = y - 0.2, label = row.names(df)))
However, when I run it, I get this error (at the line td.mat <- as.matrix(TermDocumentMatrix(corpus))):
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "character"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code
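I'm not sure what to look at; all the packages are loaded.
【Comments】:
I can't reproduce this. Is it possible that you don't have the latest version of the packages (especially tm)?
@DavidRobinson Which version of tm did you test with? As far as I know, 0.6 is the latest.
@MrFlick: My mistake: I installed it last night with install.packages and received tm_0.5-10, but I now realize that is because I was using R 3.0.1 (time to upgrade) and the latest tm requires >= 3.1.0.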
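【Answer 1】:
The latest version of tm (0.6) changed things so that you can no longer use functions with tm_map that operate on plain character values. So the problem is your tolower step, since that is not a "canonical" transformation (see getTransformations()). Replace it with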
corpus <- tm_map(corpus, content_transformer(tolower))
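The content_transformer function wrapper will convert everything to the correct data type within the corpus. You can use content_transformer with any function that is intended to manipulate character vectors, so that it will work in a tm_map pipeline.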
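To see why the wrapper matters, here is a minimal base-R sketch of the mechanism. It does not use tm itself: toy_doc, toy_meta, and wrap_content are hypothetical stand-ins for a tm document, the meta() generic, and content_transformer(), so treat it as an illustration of S3 dispatch rather than the package's actual code.

```r
# Toy stand-in for a tm document: a list with content and meta, class "toy_doc".
# toy_doc, toy_meta and wrap_content are hypothetical names, not part of tm.
toy_doc <- function(text) {
  structure(list(content = text, meta = list(language = "en")),
            class = "toy_doc")
}

# An S3 generic that only has a method for "toy_doc" objects.
toy_meta <- function(x) UseMethod("toy_meta")
toy_meta.toy_doc <- function(x) x$meta

# Same idea as content_transformer(): apply FUN to the content only and
# return the document with its class and metadata intact.
wrap_content <- function(FUN) {
  function(x, ...) {
    x$content <- FUN(x$content, ...)
    x
  }
}

doc <- toy_doc("To Mr. Ken Lay")

# tolower() returns a bare character vector, so dispatch on it fails
# with a "no applicable method" error.
bare <- tolower(doc$content)
dispatch_failed <- inherits(
  tryCatch(toy_meta(bare), error = function(e) e),
  "error"
)

# The wrapped version still returns a "toy_doc", so the generic works.
wrapped <- wrap_content(tolower)(doc)
toy_meta(wrapped)$language
```

In the real pipeline the equivalent step is the tm_map(corpus, content_transformer(tolower)) line shown in this answer.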
【Comments】:
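Thanks, but how do I do this in newer versions? corpus
@VladimirStazhilov That line should work without modification. If that is not the case for you, consider opening a new question with a reproducible error.
This worked for me even though I use a custom function that produces a plain string after some processing. I just use texts = tm_map(texts, content_transformer(custom_func)).
【Answer 2】: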
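This is a bit old, but for the benefit of later Google searches: there is an alternative solution. After corpus <- tm_map(corpus, tolower) you can use corpus <- tm_map(corpus, PlainTextDocument) to convert it straight back to the correct data type.
【Comments】: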
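You are a legend, sir!!! I just saved a day of work by no longer ignoring the comments on *** :)
【Answer 3】:
I ran into the same problem and finally found a solution:
It seems that the meta information in the corpus object gets corrupted after transformations are applied to it.
All I did was create the corpus again at the very end of the process, once it was completely ready. Because I had to work around some other problems, I also wrote a loop to copy the text back into my data frame: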
a<- list()
for (i in seq_along(corpus))
a[i] <- gettext(corpus[[i]][[1]]) #Do not use $content here!
df$text <- unlist(a)
corpus <- Corpus(VectorSource(df$text)) #This action restores the corpus.
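【Comments】:
【Answer 4】:
The order of operations on the text matters. You should remove stop words before removing punctuation, because many entries in stopwords("english"), such as "don't", contain an apostrophe that removePunctuation would strip.
I use the following to prepare the text. My text is contained in cleanData$LikeMost.
Sometimes, depending on the source, you first need the following: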
cleanData$LikeMost <- iconv(cleanData$LikeMost, to = "utf-8")
Some stop words are important, so you can create a revised set.
#create revised stopwords list
newWords <- stopwords("english")
keep <- c("no", "more", "not", "can't", "cannot", "isn't", "aren't", "wasn't",
"weren't", "hasn't", "haven't", "hadn't", "doesn't", "don't", "didn't", "won't")
newWords <- newWords [! newWords %in% keep]
Then you can run the tm functions:
like <- Corpus(VectorSource(cleanData$LikeMost))
like <- tm_map(like,PlainTextDocument)
like <- tm_map(like, removeWords, newWords)
like <- tm_map(like, removePunctuation)
like <- tm_map(like, removeNumbers)
like <- tm_map(like, stripWhitespace)
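A small base-R sketch of why the order matters. It avoids tm so that it runs anywhere: remove_words and remove_punct are hypothetical stand-ins for removeWords and removePunctuation, and stop_words is a made-up three-word list rather than the real stopwords("english").

```r
# Hypothetical three-word stop list, standing in for stopwords("english").
stop_words <- c("i", "don't", "it")

# Stand-in for removeWords(): delete whole-word matches of any stop word.
remove_words <- function(x, words) {
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

# Stand-in for removePunctuation(): delete all punctuation characters.
remove_punct <- function(x) gsub("[[:punct:]]", "", x)

text <- "i don't like it"

# Stop words first: "don't" still has its apostrophe and is matched.
words_first <- remove_punct(remove_words(text, stop_words))

# Punctuation first: "don't" becomes "dont", which is not in the stop list.
punct_first <- remove_words(remove_punct(text), stop_words)

grepl("dont", words_first)  # FALSE
grepl("dont", punct_first)  # TRUE
```

With punctuation stripped first, the mangled token "dont" survives into the term-document matrix, which is the effect the ordering above is meant to avoid.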
【Comments】: