Using lapply in conjunction with for and if..else statements to add a conditional column to multiple dataframes

【Title】: Using lapply in conjunction with for and if..else statements to add a conditional column to multiple dataframes 【Posted】: 2019-10-10 21:56:53 【Question】:

Say I have 2 dataframes, each with two columns, "pic_type" and "roi" (in reality I have many more dataframes, but 2 will do for this example):

a <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))
b <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))

In each dataframe, "pic_type" can be one of two string values ("item", "relation") and "roi" can be one of three ("object", "relation", "pic"). For example (please excuse my poor coding):

a$pic_type <- c("item", "item", "item","relation","relation","relation")
a$roi <- c("object", "object", "pic", "object", "relation","relation")
b$pic_type <- c("item", "item", "item","relation","relation","relation")
b$roi <- c("relation", "relation", "object", "pic", "pic","object")

This gives:

'a'
 pic_type      roi
 item          object
 item          object
 item          pic
 relation      object
 relation      relation
 relation      relation

'b'
 pic_type      roi
 item          relation
 item          relation
 item          object
 relation      pic
 relation      pic
 relation      object

I then put them in a list:

myList <- list(a,b)

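Now I want to use lapply to go through each df in the list and create a new column called "type", where each row holds one of three values ("occupied", "empty" or "nil"). The values are based on the following rules: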

If pic_type = "item" & roi = "object", then type = "occupied"
If pic_type = "relation" & roi = "relation", then type = "occupied"
If pic_type = "item" & roi = "relation", then type = "empty"
If pic_type = "relation" & roi = "object", then type = "empty"
Otherwise type = "nil"

For example:

 'a'
 pic_type      roi        type
 item          object     occupied
 item          object     occupied
 item          pic        nil
 relation      object     empty
 relation      relation   occupied
 relation      relation   occupied

I tried the following:

myList <- lapply(myList, function(x) for(row in 1:dim(x)[1])  
   if(as.data.frame(x)[row,1] == "item" && as.data.frame(x)[row,2]=="object") as.data.frame(x)[row,3] == "occupied"  
   else if(as.data.frame(x)[row,1] == "relation" && as.data.frame(x)[row,2]=="relation") as.data.frame(x)[row,3] == "occupied" 
   else if(as.data.frame(x)[row,1] == "item" && as.data.frame(x)[row,2]=="relation") as.data.frame(x)[row,3] == "empty" 
   else if(as.data.frame(x)[row,1] == "relation" && as.data.frame(x)[row,2]=="object") as.data.frame(x)[row,3] == "empty"
   else as.data.frame(x)[row,3] == "null")

But this throws an error:

Error in if (as.data.frame(x)[row, 1] == "item" && as.data.frame(x)[row,  : 
  missing value where TRUE/FALSE needed

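Can anyone offer a solution? I know that with only two dfs it would be easier not to use lapply, but my real list contains many dfs and I want to apply this function to each of them.

Thanks in advance!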

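【Answer 1】:

Since the list items you are iterating over are already dataframes, I suggest skipping the second, row-by-row loop and assigning directly on whole columns: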

myList <- lapply(myList, function(x) {
    # default value for rows that match none of the rules
    x$type = "nil"
    # use the vectorized & (not &&) so each row is tested separately
    x$type[x$pic_type == "item" & x$roi == "object"] = "occupied"
    x$type[x$pic_type == "relation" & x$roi == "relation"] = "occupied"
    x$type[x$pic_type == "item" & x$roi == "relation"] = "empty"
    x$type[x$pic_type == "relation" & x$roi == "object"] = "empty"
    return(x)
})

Also, you used == to set the type. That performs a comparison; for assignment you must use a single = (or <-).
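The two operator mix-ups mentioned in this answer can be seen on a small example. This is only a sketch: the toy data frame `df` and the names `cmp` and `hit` are made up for illustration. Note that `&&` is meant for single TRUE/FALSE values (recent R versions raise an error when it is given longer vectors), whereas `&` works element by element.

```r
# Toy data frame (hypothetical, for illustration only)
df <- data.frame(
  pic_type = c("item", "relation", "item"),
  roi      = c("object", "object", "pic"),
  stringsAsFactors = FALSE
)
df$type <- "nil"

# `==` only compares: it returns a logical vector and leaves df unchanged
cmp <- df$type == "occupied"

# `&` combines the two tests element by element, one TRUE/FALSE per row
hit <- df$pic_type == "item" & df$roi == "object"

# `<-` (or `=`) assigns "occupied" to the rows selected by the mask
df$type[hit] <- "occupied"

print(cmp)
print(hit)
print(df)
```

After this runs, only the first row of `df` should carry "occupied"; the comparison on its own changed nothing.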

【Answer 2】:

This works by using a data frame as a mapping table instead of your if-then statements.

# first, let's build your data frames in a list
a <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))
b <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))
a$pic_type <- c("item", "item", "item","relation","relation","relation")
a$roi <- c("object", "object", "pic", "object", "relation","relation")
b$pic_type <- c("item", "item", "item","relation","relation","relation")
b$roi <- c("relation", "relation", "object", "pic", "pic","object")
myList <- list(a,b)

# build the mapping table
mapping = c("item", "object", "occupied",
"relation", "relation", "occupied",
"item", "relation",  "empty",
"relation", "object", "empty")
dim(mapping) =c(3,4)
mapping = as.data.frame(t(mapping))
colnames(mapping)= c("pic_type","roi","type")

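The function addTheColumnType matches the rows of a data frame against the mapping table and returns the data frame with an additional column "type":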

addTheColumnType = function (df, mapping) {
  # build keys for columns of interest
  mappingKey = apply(mapping[,c("pic_type","roi")],1,paste, collapse="-")
  # restrict df to the same two columns so any extra columns do not alter the key
  aKey  = apply(df[,c("pic_type","roi")],1,paste, collapse="-")
  # match the keys and pick the type (as.character guards against factor columns)
  df$type = as.character(mapping$type)[match(aKey, mappingKey)]
  # replace NAs by nil (for unmatched rows)
  df$type[is.na(df$type)] = "nil"
  return (df)
}

Finally, apply this function to your list of data frames:

lapply(myList, addTheColumnType, mapping=mapping)
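The same lookup idea can be sketched more compactly with a named character vector instead of a mapping data frame. This is a variation, not the code from this answer: the names `lookup` and `add_type` are hypothetical, and the key format "pic_type-roi" is an assumption carried over from the pasted keys used here.

```r
# Lookup vector: names are "pic_type-roi" keys, values are the type
lookup <- c(
  "item-object"       = "occupied",
  "relation-relation" = "occupied",
  "item-relation"     = "empty",
  "relation-object"   = "empty"
)

# Hypothetical helper, not part of the original answer
add_type <- function(df, lookup) {
  key  <- paste(df$pic_type, df$roi, sep = "-")
  type <- unname(lookup[key])   # NA where the key is not in the lookup
  type[is.na(type)] <- "nil"    # unmatched rows fall back to "nil"
  df$type <- type
  df
}

a <- data.frame(
  pic_type = c("item", "item", "item", "relation", "relation", "relation"),
  roi      = c("object", "object", "pic", "object", "relation", "relation"),
  stringsAsFactors = FALSE
)

result <- lapply(list(a = a), add_type, lookup = lookup)
print(result$a)
```

Indexing a named vector by key avoids the row-wise apply() calls, and adding a rule means adding one entry to `lookup`.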

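【Answer 3】:

Welcome to ***.

R works a little differently from other software packages, and it is useful to note that there are two "if/else" commands. See else if() VS ifelse() for an explanation. Like many commands in R, ifelse is vectorized, meaning that it accepts a vector and outputs a vector, i.e. there is no need to tell it explicitly to run row by row over the data frame.

For your example you want to use ifelse() or, even better, the case_when command from the dplyr library (part of the tidyverse collection, https://www.tidyverse.org/), which allows several conditions to be tested (see @987654323@ for a general discussion of the options). Below I have also used the base within command, but the mutate command from the dplyr library could be used just as well.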
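A minimal sketch of the difference, using a made-up vector `roi` (the names `flag` and `first` are also just for illustration):

```r
roi <- c("object", "pic", "relation")

# ifelse() tests every element and returns a vector of the same length
flag <- ifelse(roi == "pic", "nil", "other")
print(flag)

# if () needs a single TRUE/FALSE, so it handles one element at a time
first <- if (roi[1] == "pic") "nil" else "other"
print(first)
```

Here `flag` has one entry per element of `roi`, while `first` is a single value computed from `roi[1]` only.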

library(dplyr)

a <- data.frame(
  pic_type = c("item", "item", "item","relation","relation","relation"),
  roi = c("object", "object", "pic", "object", "relation","relation")
)

b <- data.frame(
  pic_type = c("item", "item", "item","relation","relation","relation"),
  roi = c("relation", "relation", "object", "pic", "pic","object")
)

myList <- list(a = a, b = b)

myList <- lapply(myList, function(x) {

    # within() evaluates the braced expression using the columns of x
    x <- within(x, {
      type = case_when(
        (pic_type == "item" & roi == "object") |
          (pic_type == "relation" & roi == "relation") ~ "occupied",
        (pic_type == "item" & roi == "relation") | 
          (pic_type =="relation" & roi == "object") ~ "empty",
        TRUE ~ "nil")        
    })

  return(x)

})

myList$a
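For comparison, the same rules can be written with the base ifelse() mentioned in this answer, without loading dplyr. This is a sketch rather than part of the original answer; the helper name `add_type_ifelse` and the intermediate masks `occupied` and `empty` are hypothetical.

```r
a <- data.frame(
  pic_type = c("item", "item", "item", "relation", "relation", "relation"),
  roi      = c("object", "object", "pic", "object", "relation", "relation"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  pic_type = c("item", "item", "item", "relation", "relation", "relation"),
  roi      = c("relation", "relation", "object", "pic", "pic", "object"),
  stringsAsFactors = FALSE
)
myList <- list(a = a, b = b)

# Hypothetical helper: builds one logical mask per outcome, then nests ifelse()
add_type_ifelse <- function(x) {
  occupied <- (x$pic_type == "item" & x$roi == "object") |
    (x$pic_type == "relation" & x$roi == "relation")
  empty <- (x$pic_type == "item" & x$roi == "relation") |
    (x$pic_type == "relation" & x$roi == "object")
  # rows matching neither mask fall through to "nil"
  x$type <- ifelse(occupied, "occupied", ifelse(empty, "empty", "nil"))
  x
}

result <- lapply(myList, add_type_ifelse)
print(result$a)
print(result$b)
```

Nested ifelse() calls become hard to read beyond two or three outcomes, which is where case_when tends to be the clearer choice.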
