How to count the occurrences of "c(\" in a string in a data frame in R?
Posted: 2021-12-29 23:34:31
Question:
I have a data frame in which some columns contain error and warning messages from Mplus. The text is saved in an odd format, so instead of trying to parse each message, I want to count the messages by counting the occurrences of c(\ in each cell. That is the only character combination that appears before every warning or error.
For example, one cell contains these messages:
[[1]]
[1] "c(\"All variables are uncorrelated with all other variables within class.\""
[2] " \"Check that this is what is intended.\""
[3] " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")"
[4] " c(\"WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE\""
[5] " \"SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE\""
[6] " \"NUMBER OF RANDOM STARTS.\")"
while another contains a shorter message like this:
[[1]]
[1] "c(\"All variables are uncorrelated with all other variables within class.\""
[2] " \"Check that this is what is intended.\""
[3] " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")"
I have tried str_count in several different ways, including my most recent attempt:
str_count(test#, '//c(\//')
but I get the error: Error: '\/' is an unrecognized escape in character string starting "'//c(\/"
Ideally, the first example would return 2 and the second would return 1.
How can I count the occurrences of this unique string when the characters it contains can't be enclosed or escaped?
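The error above is raised by R's parser while it reads the string literal, before str_count() is even called: `\/` is not a valid escape in an R string. Two separate layers of escaping are involved. In a string literal, `\"` is just a double quote. In a regular expression, `(` opens a group, so a literal parenthesis must be written `\(`, which becomes `"\\("` in an R literal. The base-R sketch below illustrates both layers on a made-up sample string. It is an illustration only and is not code from the thread.

```r
# Layer 1: string literals. Inside '...', the escape \" is just ".
lit <- 'c(\"'
nchar(lit)              # 3 characters: c ( "
identical(lit, 'c("')   # TRUE: no backslash is stored

# '\/' is not a valid escape, so the parser rejects the literal
# itself (the error from the question); catch it to show this.
res <- tryCatch(parse(text = "'\\/'"), error = function(e) e)
inherits(res, "error")  # TRUE

# Layer 2: regular expressions. "(" opens a group, so a literal
# parenthesis is \( in the regex, written "\\(" in an R literal.
rx <- "c\\("
nchar(rx)               # 3 characters: c \ (
sample_text <- 'c("first") c("second")'   # made-up example text
grepl(rx, sample_text)  # TRUE

# Alternatively, fixed = TRUE turns off regex syntax entirely.
grepl('c("', sample_text, fixed = TRUE)   # TRUE
```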
Here is some easy-to-use test code to try it out:
test1 <- '"c(\"All variables are uncorrelated with all other variables within class.\"" " \"Check that this is what is intended.\"" " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")"'
test2 <- '"c(\"All variables are uncorrelated with all other variables within class.\"" " \"Check that this is what is intended.\"" " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")" " c(\"WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE\"" " \"SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE\"" " \"NUMBER OF RANDOM STARTS.\")"'
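Because test1 and test2 are written with single quotes, every `\"` inside them is read as a plain double quote. The stored text therefore contains no backslashes at all; the backslashes in the printed output above only reflect how print() displays embedded quotes. The base-R sketch below, using copies of the two test strings, checks this. It shows that the marker to count is really the three characters `c("`.

```r
test1 <- '"c(\"All variables are uncorrelated with all other variables within class.\"" " \"Check that this is what is intended.\"" " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")"'
test2 <- '"c(\"All variables are uncorrelated with all other variables within class.\"" " \"Check that this is what is intended.\"" " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")" " c(\"WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE\"" " \"SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE\"" " \"NUMBER OF RANDOM STARTS.\")"'

# No backslash is stored: \" in a single-quoted literal is just ".
grepl("\\", test1, fixed = TRUE)   # FALSE
grepl("\\", test2, fixed = TRUE)   # FALSE

# writeLines() shows the raw text without print()'s escaping.
writeLines(test1)

# So the marker to search for is the three characters c("
grepl('c("', c(test1, test2), fixed = TRUE)   # TRUE TRUE
```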
Comments:
Not a solution to your question, but have you considered using lavaan to do the SEM directly in R?
It seems to me it would be easier to reduce the problem to finding just c(, which you can do with: str_count(test1, "c\\(")
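Matching just `c(` is simpler, but it also counts any other `c(` in the message text, for example inside a word such as `calc(`. The base-R sketch below compares three patterns on a made-up string. The helper name n_matches and the word-boundary variant (`\\b` with perl = TRUE) are illustrative additions and were not suggested in the thread.

```r
# Count matches per string; regmatches() returns character(0) when
# there is no match, so lengths() correctly gives 0 in that case.
n_matches <- function(x, pattern, ...) {
  lengths(regmatches(x, gregexpr(pattern, x, ...)))
}

x <- 'calc(x) failed c("WARNING: example")'   # made-up message text

n_matches(x, "c\\(")                  # 2: also counts the c( in calc(
n_matches(x, 'c("', fixed = TRUE)     # 1: only the c(" marker
n_matches(x, "\\bc\\(", perl = TRUE)  # 1: c( must start a word
```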
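This looks like a poorly constructed data.frame. It would be better to keep the original "list of character vectors" format (or is it more complicated?) and use, e.g., lengths(), along the lines of df = data.frame(x = 1:2); df$y = list(c("a", "b"), "d"); lengths(df$y).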
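One plausible origin of the `c("` text is a list column of character vectors that was later coerced to character. This is an assumption, since the question does not show how the data frame was built. as.character() deparses each multi-element vector into `c("...", "...")` text. The sketch below uses hypothetical, shortened Mplus-style messages to demonstrate this. It also applies the comment's lengths() approach to an intact list column.

```r
# Hypothetical data: each row holds a list of messages, and each
# message is a character vector of output lines.
row1 <- list(
  c("All variables are uncorrelated with all other variables within class.",
    "Check that this is what is intended."),
  c("WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED.",
    "NUMBER OF RANDOM STARTS.")
)
row2 <- list(
  c("All variables are uncorrelated with all other variables within class.",
    "Check that this is what is intended.")
)

df <- data.frame(id = 1:2)
df$msgs <- list(row1, row2)   # keep the messages as a list column

# Number of messages per row, no string matching needed.
lengths(df$msgs)              # 2 1

# Coercing a message list to character deparses each multi-line
# message into c("...", "...") text, like the data in the question.
flat <- as.character(row1)
startsWith(flat, 'c("')       # TRUE TRUE
```

If this assumption holds, a message consisting of a single line would be coerced without the `c(` prefix (as.character(list("x")) gives "x"). Counting `c("` could therefore undercount such cells.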
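We looked at lavaan, but something about the estimators or the overall input options made my advisor think Mplus was the best choice, so at this point it's out of my control. @deschen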
@D.J That actually works well. I guess I didn't fully understand how escaping works; both ( and \ were giving me a lot of trouble.
Answer 1:
You can shorten the pattern to count, as in my comment:
str_count(test1, "c\\(")
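Or you can make the pattern longer by checking for c(\" (in the R literal, \" is simply a double quote) and wrap it in stringr's fixed() so it is matched literally rather than as a regular expression: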
str_count(test1, fixed('c(\"'))
As you can see, both approaches give the correct answer:
library(stringr)
string1 <- 'c(\"All variables are uncorrelated with all other variables within class.\""
" \"Check that this is what is intended.\""
" \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")"
" c(\"WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE\""
" \"SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE\""
" \"NUMBER OF RANDOM STARTS.\")'
> str_count(string1, fixed('c(\"'))
[1] 2
> str_count(string1, "c\\(")
[1] 2
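To avoid the stringr dependency, a common base-R idiom counts literal occurrences by deleting them with gsub(fixed = TRUE) and measuring how much shorter the text became. The sketch below applies it to a hypothetical data frame column. The names count_literal, warnings and n_messages, and the shortened messages, are made up for illustration.

```r
# Count non-overlapping literal occurrences of `marker` in each string:
# remove them all, then divide the drop in length by the marker length.
count_literal <- function(x, marker) {
  removed <- gsub(marker, "", x, fixed = TRUE)
  (nchar(x) - nchar(removed)) %/% nchar(marker)
}

df <- data.frame(
  id = 1:3,
  warnings = c(
    'c("All variables are uncorrelated.") c("WARNING: NOT REPLICATED.")',
    'c("All variables are uncorrelated.")',
    ""
  ),
  stringsAsFactors = FALSE
)

df$n_messages <- count_literal(df$warnings, 'c("')
df$n_messages   # 2 1 0
```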
Answer 2:
You can try gregexpr().
test1 <- '"c(\" foo bar baz'
test2 <- '"c(\" foo bar baz "c(\" baz bar foo'
length(unlist(gregexpr('c\\(', test1)))
# [1] 1
length(unlist(gregexpr('c\\(', test2)))
# [1] 2
length(unlist(gregexpr('c\\(', list(test1, test2))))
# [1] 3
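One caveat with the gregexpr() approach: when a string contains no match, gregexpr() returns -1 for it. That -1 still has length 1, so length(unlist(...)) reports 1 instead of 0. Below is a sketch of a per-string count that ignores the -1 placeholder. The third, marker-free string is made up.

```r
x <- c('"c(\" foo bar baz',
       '"c(\" foo bar baz "c(\" baz bar foo',
       'no marker here')   # third string is made up, with no match

m <- gregexpr('c\\(', x)

# gregexpr() encodes "no match" as -1, which still has length 1,
# so the naive count is wrong for the third string.
length(unlist(m[3]))    # 1

# Count only real (positive) match positions, one result per string.
counts <- vapply(m, function(pos) sum(pos > 0), integer(1))
counts                  # 1 2 0
```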