Using lapply in conjunction with for and if..else statements to add a conditional column to multiple dataframes
【Posted】: 2019-10-10 21:56:53
【Question】: Suppose I have 2 data frames, each with two columns, "pic_type" and "roi" (in reality I have many more data frames, but 2 will do for this example).
a <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))
b <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))
In each data frame, "pic_type" can take one of two string values ("item", "relation") and "roi" one of three ("object", "relation", "pic"). For example (please excuse my clumsy coding):
a$pic_type <- c("item", "item", "item","relation","relation","relation")
a$roi <- c("object", "object", "pic", "object", "relation","relation")
b$pic_type <- c("item", "item", "item","relation","relation","relation")
b$roi <- c("relation", "relation", "object", "pic", "pic","object")
This gives:
'a'
pic_type roi
item object
item object
item pic
relation object
relation relation
relation relation
'b'
pic_type roi
item relation
item relation
item object
relation pic
relation pic
relation object
I then put them in a list:
myList <- list(a,b)
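Now I want to use lapply to go through each df in the list and create a new column called "type", in which each row holds one of three values ("occupied", "empty" or "nil"). The values are determined as follows: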
If pic_type = "item" & roi = "object", then type = "occupied"
If pic_type = "relation" & roi = "relation", then type = "occupied"
If pic_type = "item" & roi = "relation", then type = "empty"
If pic_type = "relation" & roi = "object", then type = "empty"
Otherwise type = "nil"
For example:
'a'
pic_type roi type
item object occupied
item object occupied
item pic nil
relation object empty
relation relation occupied
relation relation occupied
I tried the following:
myList <- lapply(myList, function(x) for(row in 1:dim(x)[1])
if(as.data.frame(x)[row,1] == "item" && as.data.frame(x)[row,2]=="object") as.data.frame(x)[row,3] == "occupied"
else if(as.data.frame(x)[row,1] == "relation" && as.data.frame(x)[row,2]=="relation") as.data.frame(x)[row,3] == "occupied"
else if(as.data.frame(x)[row,1] == "item" && as.data.frame(x)[row,2]=="relation") as.data.frame(x)[row,3] == "empty"
else if(as.data.frame(x)[row,1] == "relation" && as.data.frame(x)[row,2]=="object") as.data.frame(x)[row,3] == "empty"
else as.data.frame(x)[row,3] == "null")
But this throws an error:
Error in if (as.data.frame(x)[row, 1] == "item" && as.data.frame(x)[row, :
missing value where TRUE/FALSE needed
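Can anyone offer a solution? I know that with only two dfs it would be easier not to use lapply, but my real list contains many dfs and I would like to apply this function to each of them.
Thanks in advance!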
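As background on the error message above: R reports "missing value where TRUE/FALSE needed" whenever an if() condition evaluates to NA. The sketch below shows one way that can happen. The NA cell is a hypothetical value, since the question does not show the data that triggered the error. It also shows that a function whose body is only a for loop returns NULL, so lapply would not hand back modified data frames even without the error.

```r
# Hypothetical cell content; an NA compared with a string yields NA, not TRUE/FALSE
value <- NA_character_

# Capture the error instead of stopping the script
err <- tryCatch(
  if (value == "item") "matched" else "not matched",
  error = function(e) e
)
is_error <- inherits(err, "error")

# A for loop evaluates to NULL, so this anonymous function returns NULL
# for every list element
res <- lapply(list(1, 2), function(x) for (i in 1:2) i)
```

Both observations point towards vectorised, whole-column operations that explicitly return the data frame.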
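【Answer 1】: Since the list items you are iterating over are already data frames, I suggest skipping the second, row-by-row loop and assigning directly on whole columns: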
myList <- lapply(myList, function(x) {
  # default value for rows that match none of the rules
  x$type = "nil"
  # use the vectorised & (not &&) so each row gets its own TRUE/FALSE
  x$type[x$pic_type == "item" & x$roi == "object"] = "occupied"
  x$type[x$pic_type == "relation" & x$roi == "relation"] = "occupied"
  x$type[x$pic_type == "item" & x$roi == "relation"] = "empty"
  x$type[x$pic_type == "relation" & x$roi == "object"] = "empty"
  return(x)
})
Also, you used == to set the type, which performs a comparison; for assignment you must use a single = (or <-).
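The two points in this answer, comparison versus assignment and element-wise conditions over whole columns, can be seen on a tiny example. This is only a sketch; the data frame df and the variables cmp and rows are hypothetical names that do not appear in the question.

```r
# Hypothetical toy data frame, not taken from the question
df <- data.frame(
  pic_type = c("item", "relation"),
  roi = c("object", "object"),
  stringsAsFactors = FALSE
)
df$type <- "nil"

# == only compares: it returns a logical vector and leaves df unchanged
cmp <- df$type == "occupied"

# & combines the two conditions element by element, one TRUE/FALSE per row
rows <- df$pic_type == "item" & df$roi == "object"

# a single = (or <-) on the indexed column performs the assignment
df$type[rows] = "occupied"
```

Only the first row satisfies both conditions, so only that row changes from "nil" to "occupied".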
【Answer 2】: This works by using a data frame as a mapping table instead of your if-then statements:
# first, let's build your data frames in a list
a <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))
b <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))
a$pic_type <- c("item", "item", "item","relation","relation","relation")
a$roi <- c("object", "object", "pic", "object", "relation","relation")
b$pic_type <- c("item", "item", "item","relation","relation","relation")
b$roi <- c("relation", "relation", "object", "pic", "pic","object")
myList <- list(a,b)
# build the mapping table
mapping = c("item", "object", "occupied",
"relation", "relation", "occupied",
"item", "relation", "empty",
"relation", "object", "empty")
dim(mapping) =c(3,4)
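# keep the columns as character so "nil" can be assigned later (matters in R < 4.0)
mapping = as.data.frame(t(mapping), stringsAsFactors = FALSE)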
colnames(mapping)= c("pic_type","roi","type")
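The addTheColumnType function matches the rows of a data frame against the mapping table and returns the data frame with the additional column "type":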
addTheColumnType = function (df, mapping) {
  # build keys for columns of interest
  mappingKey = apply(mapping[,c("pic_type","roi")],1,paste, collapse="-")
  aKey = apply(df[,c("pic_type","roi")],1,paste, collapse="-")
  # match the keys and pick the type
  df$type = mapping$type [match(aKey, mappingKey)]
  # replace NAs by nil (for unmatched rows)
  df$type[is.na(df$type)] = "nil"
  return (df)
}
Finally, apply this function to your list of data frames:
lapply(myList, addTheColumnType, mapping=mapping)
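A more compact variant of the same key-matching idea is to keep the mapping in a named character vector and build the keys with the vectorised paste(). This is a sketch of that alternative, not the answer's own code; the names lookup and add_type_by_key are hypothetical.

```r
# Hypothetical lookup vector: names are "pic_type-roi" keys, values are the types
lookup <- c(
  "item-object" = "occupied",
  "relation-relation" = "occupied",
  "item-relation" = "empty",
  "relation-object" = "empty"
)

# Hypothetical helper: adds the "type" column by looking up each row's key
add_type_by_key <- function(df, lookup) {
  key <- paste(df$pic_type, df$roi, sep = "-")
  type <- unname(lookup[key])      # unknown keys come back as NA
  type[is.na(type)] <- "nil"       # unmatched rows get "nil"
  df$type <- type
  df
}

# Data frame "a" from the question
a <- data.frame(
  pic_type = c("item", "item", "item", "relation", "relation", "relation"),
  roi = c("object", "object", "pic", "object", "relation", "relation"),
  stringsAsFactors = FALSE
)
a_typed <- add_type_by_key(a, lookup)
```

The helper can be passed to lapply in the same way, for example lapply(myList, add_type_by_key, lookup = lookup).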
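【Answer 3】: Welcome to ***.
R works a little differently from other packages, and it is useful to note that there are two "if/else" commands. For an explanation, see "else if() VS ifelse()". Like many commands in R, ifelse is vectorised, meaning it takes a vector and returns a vector, i.e. there is no need to tell it explicitly to run row by row through the data frame.
For your example you want ifelse() or, even better, the case_when command from the dplyr library (part of the tidyverse collection, https://www.tidyverse.org/), which allows several conditions to be tested. Below I have also used the base within command, but the mutate command from the dplyr library would work equally well.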
library(dplyr)
a <- data.frame(
pic_type = c("item", "item", "item","relation","relation","relation"),
roi = c("object", "object", "pic", "object", "relation","relation")
)
b <- data.frame(
pic_type = c("item", "item", "item","relation","relation","relation"),
roi = c("relation", "relation", "object", "pic", "pic","object")
)
myList <- list(a = a, b = b)
myList <- lapply(myList, function(x) {
  # within() evaluates the assignment using the columns of x
  x <- within(x,
    type <- case_when(
      (pic_type == "item" & roi == "object") |
        (pic_type == "relation" & roi == "relation") ~ "occupied",
      (pic_type == "item" & roi == "relation") |
        (pic_type == "relation" & roi == "object") ~ "empty",
      TRUE ~ "nil")
  )
  return(x)
})
myList$a
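The answer also mentions ifelse(). For comparison, the same rules can be written with nested ifelse() calls in base R only. This is a sketch under the question's rules; the function name add_type and the variable result are hypothetical.

```r
# Hypothetical helper using only base R: nested ifelse() over whole columns
add_type <- function(x) {
  occupied <- (x$pic_type == "item" & x$roi == "object") |
    (x$pic_type == "relation" & x$roi == "relation")
  empty <- (x$pic_type == "item" & x$roi == "relation") |
    (x$pic_type == "relation" & x$roi == "object")
  # first rule that matches wins; everything else is "nil"
  x$type <- ifelse(occupied, "occupied", ifelse(empty, "empty", "nil"))
  x
}

a <- data.frame(
  pic_type = c("item", "item", "item", "relation", "relation", "relation"),
  roi = c("object", "object", "pic", "object", "relation", "relation"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  pic_type = c("item", "item", "item", "relation", "relation", "relation"),
  roi = c("relation", "relation", "object", "pic", "pic", "object"),
  stringsAsFactors = FALSE
)
result <- lapply(list(a = a, b = b), add_type)
```

Nested ifelse() stays readable for two or three outcomes; with more conditions, case_when tends to be easier to follow.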