How to check multiple values with a condition [duplicate]

Posted 2021-04-25

This question already has answers here:

Idiom for ifelse-style recoding for multiple categories (12 answers)

I have the data frame shown below:

Records:

ID        Remarks         Value
1         ABC             10
1         AAB             12
1         ZZX             15
2         XYZ             12
2         ABB             14

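Using the data frame above, I want to add a new column, Status, to the existing data frame.

If Remarks is ABC, AAB or ABB, Status should be TRUE; for XYZ and ZZX it should be FALSE.

I am using the approach below, but it does not work.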

Records$Status<-ifelse(Records$Remarks %in% ("ABC","AAB","ABB"),"TRUE",
                             ifelse(Records$Remarks %in% 
                      ("XYZ","ZZX"),"FALSE"))

Then, based on Status, I would like to get the following output:

ID     TRUE    FALSE    Sum
1       2       1        37
2       1       1        26

Answer

Records$Status<-ifelse(Records$Remarks %in% c("ABC","AAB","ABB"),TRUE,
                        ifelse(Records$Remarks %in% 
                                   c("XYZ","ZZX"),FALSE, NA))

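You need to wrap your lists of strings in c(), and add an "else" value to the second ifelse (but see Roman's answer below for a better way to do this with case_when). Also note that I changed "TRUE" and "FALSE" (character class) to TRUE and FALSE (logical class).

For the summary (using dplyr):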
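To see why the logical class matters for the later summary, here is a minimal base-R sketch. It rebuilds the question's Records data frame by hand (the construction is an assumption, since the question only shows the printed table) and applies the corrected nested ifelse. The names n_true and n_false are hypothetical.

```r
# Rebuild the example data from the question (construction assumed).
Records <- data.frame(
  ID      = c(1L, 1L, 1L, 2L, 2L),
  Remarks = c("ABC", "AAB", "ZZX", "XYZ", "ABB"),
  Value   = c(10L, 12L, 15L, 12L, 14L),
  stringsAsFactors = FALSE
)

# c() builds the character vectors that %in% compares against;
# the final NA is the "else" branch for remarks in neither set.
Records$Status <- ifelse(Records$Remarks %in% c("ABC", "AAB", "ABB"), TRUE,
                  ifelse(Records$Remarks %in% c("XYZ", "ZZX"), FALSE, NA))

# A logical column can be counted directly with sum(); a character
# column of "TRUE"/"FALSE" strings would make sum() fail.
n_true  <- sum(Records$Status)
n_false <- sum(!Records$Status)
```

If a remark outside both sets were present, Status would contain NA and the two sums would need na.rm = TRUE.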

Records %>% group_by(ID) %>% 
dplyr::summarise(trues=sum(Status), falses=sum(!Status), sum=sum(Value))

# A tibble: 2 x 4
     ID trues falses   sum
  <int> <int>  <int> <int>
1     1     2      1    37
2     2     1      1    26

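Of course, if you do not really need the intermediate Status column and only want the summary table, you can skip the first step entirely: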

Records %>% group_by(ID) %>% 
dplyr::summarise(trues=sum(Remarks %in% c("ABC","AAB","ABB")), 
  falses=sum(Remarks %in% c("XYZ","ZZX")), 
  sum=sum(Value))

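Another answer

Since using dplyr makes sense for the second part of your question (see @iod's answer), this is also a good opportunity to use the package's very straightforward case_when() function for the first part.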

Records %>% 
    mutate(Status = case_when(Remarks %in% c("ABC", "AAB", "ABB") ~ TRUE,
                              Remarks %in% c("XYZ", "ZZX") ~ FALSE,
                              TRUE ~ NA))

  ID Remarks Value Status
1  1     ABC    10   TRUE
2  1     AAB    12   TRUE
3  1     ZZX    15  FALSE
4  2     XYZ    12  FALSE
5  2     ABB    14   TRUE

Another answer

This approach scales to a large number of remarks.

Load the data and prepare a matching data frame

The second data frame maps each remark to its TRUE or FALSE value.

library(readr)
library(dplyr)
library(tidyr)
dtf <- read_table("id        remarks         value
1         ABC             10
1         AAB             12
1         ZZX             15
2         XYZ             12
2         ABB             14")
# tibble() replaces the deprecated data_frame()
truefalse <- tibble(remarks = c("ABC", "AAB", "ABB", "ZZX", "XYZ"),
                    tf = c(TRUE, TRUE, TRUE, FALSE, FALSE))

Group by id and summarise

This is the format requested in the question.

dtf %>% 
    left_join(truefalse, by = "remarks") %>% 
    group_by(id) %>% 
    summarise(true = sum(tf),
              false = sum(!tf),
              value = sum(value)) 

# A tibble: 2 x 4
     id  true false value
  <int> <int> <int> <int>
1     1     2     1    37
2     2     1     1    26
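The same lookup can be sketched without a join, using base R's match() to find each remark's row in the matching table. The data frames below are rebuilt by hand for illustration, and the names joined_tf and summary_tbl are hypothetical.

```r
# Example data, rebuilt by hand (assumed to mirror dtf and truefalse).
dtf <- data.frame(
  id      = c(1L, 1L, 1L, 2L, 2L),
  remarks = c("ABC", "AAB", "ZZX", "XYZ", "ABB"),
  value   = c(10L, 12L, 15L, 12L, 14L),
  stringsAsFactors = FALSE
)
truefalse <- data.frame(
  remarks = c("ABC", "AAB", "ABB", "ZZX", "XYZ"),
  tf      = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# match() returns, for each remark in dtf, its row position in truefalse
# (NA when the remark is not listed), which plays the role of the left join.
joined_tf <- truefalse$tf[match(dtf$remarks, truefalse$remarks)]

# Per-id counts and totals, mirroring the summarise() step.
summary_tbl <- data.frame(
  id    = sort(unique(dtf$id)),
  true  = as.vector(tapply(joined_tf, dtf$id, sum)),
  false = as.vector(tapply(!joined_tf, dtf$id, sum)),
  value = as.vector(tapply(dtf$value, dtf$id, sum))
)
```

Keeping the mapping in a table rather than in code means new remarks only require a new row, not a new condition.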

Alternative proposal: group by id, tf and summarise

This option keeps more detail about how value is distributed across the grouping variables id and tf.

    dtf %>% 
        left_join(truefalse, by = "remarks") %>% 
        group_by(id, tf) %>% 
        summarise(n = n(),
                  value = sum(value))
# A tibble: 4 x 4
# Groups:   id [?]
     id tf        n value
  <int> <lgl> <int> <int>
1     1 FALSE     1    15
2     1 TRUE      2    22
3     2 FALSE     1    12
4     2 TRUE      1    14

Another answer

In most cases life is easier, and lines are shorter, without ifelse:

# short version
df$Status <- df$Remarks %in% c("ABC","AAB","ABB")

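This version is fine for most purposes, but it has a drawback. Status will be FALSE if Remarks is NA or, say, "garbage", whereas one might want NA in those cases and FALSE only when Remarks %in% c("XYZ", "ZZX"). So one can add and multiply conditions and finally convert the result to logical: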

df$Status <- as.logical(with(df,
                  Remarks %in% c("ABC","AAB","ABB")  +
                  ! Remarks %in% c("XYZ","ZZX") ))
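One caveat, worth checking on your own data: because ! binds more loosely than +, the expression above evaluates as (first test) + !(second test), so a remark in neither set gives 0 + 1 = 1, which as.logical() turns into TRUE rather than NA. A possible alternative that still avoids ifelse is a named lookup vector; this is a sketch of a different technique, not the answer author's method, and the names lookup, remarks and status are hypothetical.

```r
# Named logical vector: the names are the remarks, the values the status.
lookup <- c(ABC = TRUE, AAB = TRUE, ABB = TRUE, XYZ = FALSE, ZZX = FALSE)

# Sample remarks, including an unknown value and a missing one.
remarks <- c("ABC", "AAB", "ZZX", "XYZ", "ABB", "garbage", NA)

# Indexing by name returns NA for names that are absent and for NA itself.
status <- unname(lookup[remarks])
```

With a data frame this would read df$Status <- unname(lookup[df$Remarks]), assuming Remarks is a character column rather than a factor.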

And the summary table in base R:

aggregate(df[,-(1:2)], df["ID"], function(x) if(is.numeric(x)) sum(x) else table(x))

Hmm... maybe some formatting would be useful:

t1 <- aggregate(df[,-(1:2)], df["ID"], function(x) if(is.numeric(x)) sum(x) else table(x))
t1 <- t1[, c(1,3,2)]
colnames(t1) <- c("ID", "", "Sum")
t1
#   ID FALSE TRUE Sum
# 1  1     1    2  37
# 2  2     1    1  26

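Another answer

This returns the correct result only when there are exactly the two groups mentioned ("ABC", "AAB", "ABB" vs "XYZ", "ZZX", ...). To me @iod's solution is more R-like, but I tried to avoid ifelse and do it another way: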

Code:

library(tidyverse)

dt %>%
  group_by(ID, Status = grepl("^A[AB][CB]$", Remarks)) %>%
  summarise(N = n(), Sum = sum(Value)) %>%
  spread(Status, N) %>%
  summarize_all(sum, na.rm = TRUE) %>%                    # data is still grouped by ID
  select("ID", "TRUE", "FALSE", "Sum")

# A tibble: 2 x 4
     ID `TRUE` `FALSE`   Sum
  <int>  <int>   <int> <int>
1     1      2       1    37
2     2      1       1    26

Data:

dt <- structure(
  list(ID = c(1L, 1L, 1L, 2L, 2L), 
       Remarks = c("ABC", "AAB", "ZZX", "XYZ", "ABB"),
       Value = c(10L, 12L, 15L, 12L, 14L)), 
  .Names = c("ID", "Remarks", "Value"), class = "data.frame", row.names = c(NA, -5L)
  )
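The pattern ^A[AB][CB]$ is slightly broader than the three target remarks, which is one reason the result is only correct for the two groups in this data. A small sketch comparing the regex with an exact set test (the probe values, including "AAC", are made up for illustration):

```r
targets <- c("ABC", "AAB", "ABB")
probes  <- c("ABC", "AAB", "ABB", "AAC", "XYZ", "ZZX")

# Regex test used in the pipeline versus exact membership.
by_regex <- grepl("^A[AB][CB]$", probes)
by_set   <- probes %in% targets

# Remarks on which the two tests disagree.
disagree <- probes[by_regex != by_set]
```

If values such as "AAC" can occur, the %in% test is the safer choice.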
