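How do I solve the following error? Input must be a character vector of any length or a list of character vectors, each of which has a length of 1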



Posted: 2018-03-02 16:23:12

Question:

I'm working on an R project. The dataset I'm using is available at https://www.kaggle.com/ranjitha1/hotel-reviews-city-chennai/data

The code I'm using is:

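library(dplyr)     # provides the %>% pipe
library(tidytext)  # provides unnest_tokens()
df1 <- read.csv("chennai.csv", header = TRUE)  # before R 4.0.0, text columns are read as factors by default
tidy_books <- df1 %>% unnest_tokens(word, Review_Text)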

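Here Review_Text is the text column. However, I get the following error:

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.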
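A minimal base-R sketch that reproduces the cause without downloading the Kaggle file. The two-row CSV text is made-up stand-in data, and `stringsAsFactors = TRUE` is passed explicitly to mimic the `read.csv()` default that applied before R 4.0.0 (from 4.0.0 on, the default is `FALSE`). Inspecting the column with `str()` or `class()` shows it is a factor, which is the kind of input `unnest_tokens()` rejects.

```r
# Made-up two-row stand-in for chennai.csv (not the real Kaggle data)
csv_text <- "Hotel_name,Review_Text
Accord Metropolitan,its really nice place to stay
Accord Metropolitan,good location for business"

# Force the pre-R-4.0.0 default so the problem shows up on any R version
df1 <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

# Check the structure before tokenizing: Review_Text is listed as a Factor
str(df1)
class(df1$Review_Text)  # "factor", not "character"
```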

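Comments:

You need stringsAsFactors = FALSE in your read.csv statement. Or use read_csv, since you seem to be working in the tidyverse.

I was about to say the same, though more compactly. Consider checking the structure of new data before working with it, e.g. with str(df1), which would also have alerted you to the problem.

Answer 1: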

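stringsAsFactors strikes again!

Your Review_Text column is a factor, not the character vector that the function requires, as the error message says.

I strongly recommend using readr::read_csv instead of the base read.csv, because it is faster and its defaults don't cause this problem. Otherwise, just set stringsAsFactors to FALSE: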
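If the data frame is already loaded, one way to fix it is to convert the column with `as.character()` before tokenizing. The data frame below is a hypothetical stand-in built inline, and the tidytext call is left as a comment so the snippet runs with base R alone.

```r
# Hypothetical stand-in: a data frame whose text column is a factor
df1 <- data.frame(
  Hotel_name  = c("Accord Metropolitan", "Accord Metropolitan"),
  Review_Text = factor(c("its really nice place to stay", "good location")),
  stringsAsFactors = FALSE  # affects only character inputs; the explicit factor stays a factor
)

# as.character() returns the level labels (the text itself);
# as.integer() would instead return the internal level codes
df1$Review_Text <- as.character(df1$Review_Text)

stopifnot(is.character(df1$Review_Text))
# With the column converted, the original call should pass the input check:
# tidy_books <- df1 %>% unnest_tokens(word, Review_Text)
```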

> tidytext::unnest_tokens(readr::read_csv("chennai_reviews.csv"), word, Review_Text)
Parsed with column specification:
cols(
  Hotel_name = col_character(),
  Review_Title = col_character(),
  Review_Text = col_character(),
  Sentiment = col_character(),
  Rating_Percentage = col_character(),
  X6 = col_integer(),
  X7 = col_integer(),
  X8 = col_character(),
  X9 = col_character()
)
Warning: 1 parsing failure.
# A tibble: 1 x 5
    row   col   expected actual
  <int> <chr>      <chr> <chr>
1  2262    X7 an integer "Expedia Booking  availability was  , only  for  Non-  AC ; ON REQUEST  OVER  PHONE got  it.\n\nRecommended"
# ... with 1 more variables: file <chr>

# A tibble: 179,883 x 9
            Hotel_name                          Review_Title Sentiment Rating_Percentage    X6    X7    X8    X9       word
                 <chr>                                 <chr>     <chr>             <chr> <int> <int> <chr> <chr>      <chr>
 1 Accord Metropolitan Excellent comfortableness during stay         3               100    NA    NA  <NA>  <NA>        its
 2 Accord Metropolitan Excellent comfortableness during stay         3               100    NA    NA  <NA>  <NA>     really
 3 Accord Metropolitan Excellent comfortableness during stay         3               100    NA    NA  <NA>  <NA>       nice
 4 Accord Metropolitan Excellent comfortableness during stay         3               100    NA    NA  <NA>  <NA>      place
 5 Accord Metropolitan Excellent comfortableness during stay         3               100    NA    NA  <NA>  <NA>         to
 6 Accord Metropolitan Excellent comfortableness during stay         3               100    NA    NA  <NA>  <NA>       stay
 7 Accord Metropolitan Excellent comfortableness during stay         3               100    NA    NA  <NA>  <NA> especially
 8 Accord Metropolitan Excellent comfortableness during stay         3               100    NA    NA  <NA>  <NA>        for
 9 Accord Metropolitan Excellent comfortableness during stay         3               100    NA    NA  <NA>  <NA>   business
10 Accord Metropolitan Excellent comfortableness during stay         3               100    NA    NA  <NA>  <NA>        and
# ... with 179,873 more rows
Warning message:
Missing column names filled in: 'X6' [6], 'X7' [7], 'X8' [8], 'X9' [9] 

> tidytext::unnest_tokens(read.csv("chennai_reviews.csv", stringsAsFactors = FALSE), word, Review_Text)
                                                Hotel_name
1                                      Accord Metropolitan
                                                                                                                                                                                                                                                        Review_Title
...snip...
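Passing `stringsAsFactors = FALSE` explicitly gives the same column types whichever R version is running. For a data frame that already contains several factor columns, a sketch of converting all of them in one pass uses `lapply()`; the CSV text below is made-up stand-in data, not the real file.

```r
# Made-up stand-in for a few columns of the reviews file
csv_text <- "Hotel_name,Review_Title,Review_Text
Accord Metropolitan,Excellent stay,its really nice place to stay
Accord Metropolitan,Good,good location"

# Explicit argument: same result on R < 4.0.0 and R >= 4.0.0
df_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Alternative for an existing data frame: turn every factor column into character,
# leaving non-factor columns untouched ([] keeps the data.frame class and row names)
df_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)
df_fac[] <- lapply(df_fac, function(col) if (is.factor(col)) as.character(col) else col)

sapply(df_fac, class)  # every column now reports "character"
```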

