Nested ifelse statement

Posted 2023-02-24


【Title】: Nested ifelse statement 【Posted】: 2013-08-03 10:45:40 【Question】:

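I am still learning how to translate SAS code into R, and I am getting warnings. I need to understand where I am making mistakes. What I want to do is create a variable that summarizes and distinguishes 3 statuses of a population: mainland, overseas, foreigner. I have a database with 2 variables:

id nationality: idnat (french, foreigner),

If idnat is french, then:

id birthplace: idbp (mainland, colony, overseas)

I want to summarize the information from idnat and idbp in a new variable called idnat2:

status: k (mainland, overseas, foreigner)

All of these variables are of character type.

Expected result in column idnat2: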

   idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign

Here is the SAS code I want to translate into R:

if idnat = "french" then do;
   if idbp in ("overseas","colony") then idnat2 = "overseas";
   else idnat2 = "mainland";
end;
else idnat2 = "foreigner";
run;

Here is my attempt in R:

if(idnat=="french")
    idnat2 <- "mainland"
 else if(idbp=="overseas"|idbp=="colony")
    idnat2 <- "overseas"
 else 
    idnat2 <- "foreigner"

I get this warning:

Warning message:
In if (idnat=="french")  :
  the condition has length > 1 and only the first element will be used

I was advised to use a "nested ifelse" instead, but that gives me more warnings:

idnat2 <- ifelse (idnat=="french", "mainland",
        ifelse (idbp=="overseas"|idbp=="colony", "overseas")
      )
            else (idnat2 <- "foreigner")

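According to the warning message, the length is greater than 1, so only what is between the first pair of brackets is taken into account. Sorry, but I don't understand what this length has to do with anything here. Does anybody know where I am going wrong?

【Comments】:

You shouldn't mix ifelse and else.

@Roland You are right, thank you for the advice. I have just added the result I want: it is only the idnat2 column, if that makes it clear.

@KarlForner Thank you, that is exactly what I tried to do with simple examples, but I am really struggling with R. I tried the same thing in SPSS and it was easier.

My point is that SO is not a substitute for learning a language. There are plenty of books and tutorials. You should post here when you are stuck and have already used all the other resources. Best.

@KarlForner I completely agree with you. However, in this specific case (if versus ifelse) I upvoted the question, because I ran into exactly the same problem when I started with R. It is not clear from An Introduction to R, there is nothing about ifelse in the R Language Definition, and there are only a few examples in R For Dummies. Are there any other sources describing the difference between if and ifelse?

【Answer 1】: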

Using the SQL CASE statement with the dplyr and sqldf packages:

Data

df <-structure(list(idnat = structure(c(2L, 2L, 2L, 1L), .Label = c("foreign", 
"french"), class = "factor"), idbp = structure(c(3L, 1L, 4L, 
2L), .Label = c("colony", "foreign", "mainland", "overseas"), class = "factor")), .Names = c("idnat", 
"idbp"), class = "data.frame", row.names = c(NA, -4L))

sqldf

library(sqldf)
sqldf("SELECT idnat, idbp,
        CASE 
          WHEN idbp IN ('colony', 'overseas') THEN 'overseas' 
          ELSE idbp 
        END AS idnat2
       FROM df")

dplyr

library(dplyr)
df %>% 
mutate(idnat2 = case_when(idbp == 'mainland' ~ "mainland", 
                          idbp %in% c("colony", "overseas") ~ "overseas", 
                         TRUE ~ "foreign"))

Output

    idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign

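【Comments】:

【Answer 2】:

The explanation with the example was the key to helping me, but my problem was that it did not work when I copied it, so I had to tweak it in several ways to get it working. (I am super new to R and, for lack of knowledge, had some trouble with the third ifelse.)

So, for those who are super new to R and running into problems...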

   ifelse(x < -2,"pretty negative", ifelse(x < 1,"close to zero", ifelse(x < 3,"in [1, 3)","large")##all one line
     )#normal tab
)

(I used this inside a function, so the "ifelse..." line was indented by one tab, but the last ")" was all the way to the left.)
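To see the nested call above run, here is a minimal sketch on an assumed sample vector x (the values are made up for illustration and are not from the answer). It also shows base R's cut() with right = FALSE, which should produce the same half-open bins as the chained x < ... tests.

```r
# x is a hypothetical sample vector, not data from the question
x <- c(-5, -2, 0, 1, 2.5, 3, 10)

# Nested ifelse: each test is only reached where the previous ones were FALSE
nested <- ifelse(x < -2, "pretty negative",
          ifelse(x < 1, "close to zero",
          ifelse(x < 3, "in [1, 3)", "large")))

# cut() with right = FALSE uses intervals [a, b), matching the "<" tests above
binned <- cut(x,
              breaks = c(-Inf, -2, 1, 3, Inf),
              labels = c("pretty negative", "close to zero", "in [1, 3)", "large"),
              right = FALSE)

# cut() returns a factor, so compare the two results as character vectors
identical(nested, as.character(binned))
```

With the default right = TRUE the boundary values (-2, 1 and 3 here) would fall into the lower bin instead, so the two results would differ at those points.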

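【Comments】:

FYI: when doing numeric categorization, cut would be better. You could rewrite this as cut(x, breaks = c(-Inf, -2, 1, 3, Inf), labels = c("pretty negative", "close to zero", "in [1, 3)", "large")). With only one or two nested ifelse calls either approach works just as well, but if you have to go deeper, cut is a nice relief from keeping track of all the nesting and parentheses.

Thanks, I hadn't used cut. It seems to break things up into (-Inf,-2], (-2,1], (1,3], (3,Inf], so as long as the interval is stated as "x …

cut also has an argument right (default TRUE), which means the intervals are closed on the right. Setting right = FALSE gives you [-Inf,-2), [-2,1), [1,3), [3,Inf). The -Inf and Inf boundaries do not matter here, but you can also use include.lowest to toggle whether the two extremes are closed. See ?cut for details.

【Answer 3】:

Sorry for joining the party so late. Here is a simple solution.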

# Building up your initial table
idnat <- c(1, 1, 1, 2) # 1 is french, 2 is foreign
idbp <- c(1, 2, 3, 4)  # 1 is mainland, 2 is colony, 3 is overseas, 4 is foreign
t <- cbind(idnat, idbp)

# The last column will be a vector with one entry per row of the matrix
idnat2 <- vector()

# ... and we will populate that vector with a cursor
for (i in 1:length(idnat)) {
  # The cursor runs over the length of one of the vectors
  if (t[i, 1] == 2) {          # if idnat = foreign, then it's foreign
    idnat2[i] <- 3             # 3 is foreign
  } else if (t[i, 2] == 1) {   # if not foreign and idbp = mainland, then it's mainland
    idnat2[i] <- 2             # 2 is mainland
  } else {                     # anything else is classified as colony or overseas
    idnat2[i] <- 1             # 1 is colony or overseas
  }
}

cbind(t, idnat2)

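【Comments】:

Straightforward, yes. But also verbose and non-idiomatic, and not well illustrated (why use these integers instead of the data provided in the question?). It also duplicates Azul's answer, which uses essentially the same approach but with the text data from the question instead of integers.

Porque se me ronco hacerlo de esa manera, Gregor. See? How many beautiful ways we have to communicate: Azul's, Jorge's, Gregor's. It is up to the OP to pick whatever seems more logical to him, to you, and to me as well. Saludos Gregor.

【Answer 4】: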

# Read in the data.

idnat=c("french","french","french","foreign")
idbp=c("mainland","colony","overseas","foreign")

# Initialize the new variable.

idnat2=as.character(vector())

# Logically evaluate "idnat" and "idbp" for each case, assigning the appropriate level to "idnat2".

for(i in 1:length(idnat)) {
  if(idnat[i] == "french" & idbp[i] == "mainland") {
    idnat2[i] = "mainland"
  } else if (idnat[i] == "french" & (idbp[i] == "colony" | idbp[i] == "overseas")) {
    idnat2[i] = "overseas"
  } else {
    idnat2[i] = "foreign"
  }
}


# Create a data frame with the two old variables and the new variable.

data.frame(idnat,idbp,idnat2)

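【Comments】:

【Answer 5】:

If the data set contains many rows, it may be more efficient to join with a lookup table using data.table than to use nested ifelse().

Given the lookup table below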

lookup

     idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign

and a sample data set

library(data.table)
n_row <- 10L
set.seed(1L)
DT <- data.table(idnat = "french",
                 idbp = sample(c("mainland", "colony", "overseas", "foreign"), n_row, replace = TRUE))
DT[idbp == "foreign", idnat := "foreign"][]

      idnat     idbp
 1:  french   colony
 2:  french   colony
 3:  french overseas
 4: foreign  foreign
 5:  french mainland
 6: foreign  foreign
 7: foreign  foreign
 8:  french overseas
 9:  french overseas
10:  french mainland

we can then do an update while joining:

DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]

      idnat     idbp   idnat2
 1:  french   colony overseas
 2:  french   colony overseas
 3:  french overseas overseas
 4: foreign  foreign  foreign
 5:  french mainland mainland
 6: foreign  foreign  foreign
 7: foreign  foreign  foreign
 8:  french overseas overseas
 9:  french overseas overseas
10:  french mainland mainland
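The answer prints lookup but does not show how it was built. A minimal sketch, assuming it is simply typed in as a data.table with the four rows shown above, together with a small made-up DT so the update join can be run end to end:

```r
library(data.table)

# Hypothetical construction of the lookup table printed in the answer
lookup <- data.table(
  idnat  = c("french", "french", "french", "foreign"),
  idbp   = c("mainland", "colony", "overseas", "foreign"),
  idnat2 = c("mainland", "overseas", "overseas", "foreign")
)

# Small made-up data set with the same two key columns
DT <- data.table(
  idnat = c("french", "foreign", "french"),
  idbp  = c("colony", "foreign", "mainland")
)

# Update join: for rows of DT that match lookup on both keys,
# copy lookup's idnat2 (referred to as i.idnat2) into DT by reference
DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2]
DT[]
```

Rows of DT whose (idnat, idbp) pair does not appear in lookup are left as NA in idnat2, which makes unexpected combinations easy to spot.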

【Comments】:

【Answer 6】:

If you use any spreadsheet application, there is a basic function if() with the syntax:

if(<condition>, <yes>, <no>)

ifelse() has exactly the same syntax in R:

ifelse(<condition>, <yes>, <no>)

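The only difference from if() in a spreadsheet application is that R's ifelse() is vectorized (it takes vectors as input and returns a vector as output). Consider the following comparison of formulas in a spreadsheet application and in R, where we want to test a > b and return 1 if it holds and 0 otherwise.

In a spreadsheet: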

  A  B C
1 3  1 =if(A1 > B1, 1, 0)
2 2  2 =if(A2 > B2, 1, 0)
3 1  3 =if(A3 > B3, 1, 0)

In R:

> a <- 3:1; b <- 1:3
> ifelse(a > b, 1, 0)
[1] 1 0 0

ifelse() can be nested in many ways:

ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))

ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)

ifelse(<condition>, 
       ifelse(<condition>, <yes>, <no>), 
       ifelse(<condition>, <yes>, <no>)
      )

ifelse(<condition>, <yes>, 
       ifelse(<condition>, <yes>, 
              ifelse(<condition>, <yes>, <no>)
             )
       )

To compute the column idnat2 you can use:

df <- read.table(header=TRUE, text="
idnat idbp idnat2
french mainland mainland
french colony overseas
french overseas overseas
foreign foreign foreign"
)

with(df, 
     ifelse(idnat=="french",
       ifelse(idbp %in% c("overseas","colony"),"overseas","mainland"),"foreign")
     )

R Documentation

What does "the condition has length > 1 and only the first element will be used" mean? Let's see:

> # What is first condition really testing?
> with(df, idnat=="french")
[1]  TRUE  TRUE  TRUE FALSE
> # This is result of vectorized function - equality of all elements in idnat and 
> # string "french" is tested.
> # Vector of logical values is returned (has the same length as idnat)
> df$idnat2 <- with(df,
+   if(idnat=="french"){
+   idnat2 <- "xxx"
+   }
+   )
Warning message:
In if (idnat == "french")  :
  the condition has length > 1 and only the first element will be used
> # Note that the first element of the comparison is TRUE and that's why we get:
> df
    idnat     idbp idnat2
1  french mainland    xxx
2  french   colony    xxx
3  french overseas    xxx
4 foreign  foreign    xxx
> # There is really logic in it, you have to get used to it

Can I still use if()? Yes, you can, but the syntax is not so cool :)

test <- function(x) {
  if(x=="french") {
    "french"
  } else {
    "not really french"
  }
}

apply(array(df[["idnat"]]),MARGIN=1, FUN=test)
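If a scalar if() really is wanted, the element-wise call can also be sketched with vapply(), which applies the function to one element at a time and checks that every result is a single string. The name classify_one is made up for this illustration, and idnat is assumed to be a character vector.

```r
idnat <- c("french", "french", "french", "foreign")

# Scalar helper: x is a single string, so if() sees a length-1 condition
classify_one <- function(x) {
  if (x == "french") {
    "french"
  } else {
    "not really french"
  }
}

# vapply calls classify_one once per element and returns a character vector
res <- vapply(idnat, classify_one, character(1), USE.NAMES = FALSE)
res
```

For a test this simple, ifelse(idnat == "french", "french", "not really french") gives the same vector without the helper function.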

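If you are familiar with SQL, you can also use the CASE statement in the sqldf package.

【Comments】:

This is a really good explanation, and it finally made me understand how nested ifelse() works. Thanks!

The best explanation of nested ifelse I have seen anywhere.

【Answer 7】:

With data.table, the solution is: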

DT[, idnat2 := ifelse(idbp %in% "foreign", "foreign", 
        ifelse(idbp %in% c("colony", "overseas"), "overseas", "mainland" ))]

ifelse is vectorized. if-else is not. Here, DT is:

    idnat     idbp
1  french mainland
2  french   colony
3  french overseas
4 foreign  foreign

This gives:

   idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign

【Comments】:

IMO a better approach is: DT[, idnat2 := idbp][idbp %in% c('colony','overseas'), idnat2 := 'overseas']

Or even better: DT[, idnat2 := idbp][idbp == 'colony', idnat2 := 'overseas']

Another data.table approach is to join with a lookup table: DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]

【Answer 8】:

You can create the vector idnat2 without if and ifelse.

The function replace can be used to replace all occurrences of "colony" with "overseas":

idnat2 <- replace(idbp, idbp == "colony", "overseas")
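A quick check of that one-liner on the four idbp values from the question, assuming idbp is a character vector. With a factor, the replacement value would have to be one of its existing levels already.

```r
idbp <- c("mainland", "colony", "overseas", "foreign")

# replace(x, list, values) returns a copy of x with the selected elements changed;
# idbp itself is left untouched
idnat2 <- replace(idbp, idbp == "colony", "overseas")
idnat2
```

This works here because every other idbp value is already the label wanted in idnat2, so only "colony" needs recoding.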

【Comments】:

More or less the same: df$idnat2 <- df$idbp; df$idnat2[df$idbp == 'colony'] <- 'overseas'

【Answer 9】:

Try something like the following:

# some sample data
idnat <- sample(c("french","foreigner"),100,TRUE)
idbp <- rep(NA,100)
idbp[idnat=="french"] <- sample(c("mainland","overseas","colony"),sum(idnat=="french"),TRUE)

# recoding
out <- ifelse(idnat=="french" & !idbp %in% c("overseas","colony"), "mainland",
              ifelse(idbp %in% c("overseas","colony"),"overseas",
                     "foreigner"))
cbind(idnat,idbp,out) # check result

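Your confusion comes from how SAS and R handle if-else constructions. In R, if and else are not vectorized, meaning they check whether a single condition is true (i.e., if("french"=="french") works) and cannot handle multiple logical values (i.e., if(c("french","foreigner")=="french") does not work), so R gives you the warning you are receiving.

By comparison, ifelse is vectorized, so it can take your vectors (that is, your input variables) and test the logical condition on each of their elements, as you are used to in SAS. Another way to approach this would be to build a loop using if and else statements (as you have started to do here), but the vectorized ifelse approach will be more efficient and generally involves less code.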
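The difference can be seen on the question's own idnat values. This is a minimal sketch, and the labels are made up for illustration. Note that in R 4.2.0 and later, passing a condition of length greater than one to if() is an error rather than the warning shown in the question.

```r
idnat <- c("french", "french", "french", "foreign")

# The comparison is vectorized: one TRUE/FALSE per element
cond <- idnat == "french"
length(cond)

# if() wants exactly one TRUE/FALSE, so the vector has to be reduced first,
# for example with all() or any()
summary_label <- if (all(cond)) "all french" else "mixed"

# ifelse() keeps the vector shape and answers element by element
per_row <- ifelse(cond, "french", "not french")
per_row
```

Reducing with all() or any() answers a question about the whole vector, whereas ifelse() answers it row by row, which is what the SAS data step does implicitly.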

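【Comments】:

Hello. OK, so IF and ELSE in R are not vectorized, which is why I got the warning about length > 1 and only the first TRUE argument was recorded. I will try your hint with IFELSE; it seems more efficient, although Tomas Greif's does too.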
