Creating a Status column based on conditions on different States in R
I have a data frame like this:
ID <- c(1,2,3,4,5,5,5,6,6)
States <- c(NA,NA,"All Locked","All Not Locked","All Locked","All Locked"
,"All Not Locked","All Not Locked","All Not Locked")
ToolID <- c(NA,NA,"SWP","SWP","SWP","SWP","SWP","SWP","SWP")
Measurement <- c("Length","Breadth","Width","Height","Time","Time"
,"Time","Mass","Mass")
Location <- c("US","US","UK","UK","US","US","US","UK","UK")
df1 <- data.frame(ID,States,ToolID,Measurement,Location)
I am trying to do some data manipulation on this data frame using the following conditions:
For each ID (grouped),
if States = NA, then the Status = "No Status"
if States column contains at least(count >=) 1 "All Locked", then the Status = "Lock Status"
if States column doesn't contain (count =0) "All Locked", then the Status = "No Lock Status"
My desired output is:
ID ToolID Measurement Location Status
1 NA Length US No Status
2 NA Breadth US No Status
3 SWP Width UK Lock Status
4 SWP Height UK No Lock Status
5 SWP Time US Lock Status
6 SWP Mass UK No Lock Status
I tried it this way, but the logic is wrong:
df1$Status <- ifelse(df1$States == NA, "No Status",
ifelse((count(df1$States == "All Locked") >=1),
"Lock Status",
ifelse((count(df1$States == "All Locked") <1),
"No Lock Status", NA)))
Can someone point me in the right direction? I want to apply this to my much larger dataset, so a fast solution would help me a lot.
Answer
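For NA elements, use is.na. Also note that dplyr::count works on data.frames/tbls, not on a logical vector as in the attempt above.
Here, we group by 'ID' and check whether there is at least one "All Locked" in the 'States' column. If there is, we change 'States' to "All Locked" for the whole group. Instead of doing this with mutate, we do it inside group_by and pass add=TRUE (spelled .add = TRUE in dplyr 1.0.0 and later), so that the new grouping variable is added to the existing group. We then get the frequency per group of 'ID' and 'States' and, based on the condition, change the values in 'States':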
library(dplyr)
df1 %>%
group_by(ID) %>%
group_by(States = if("All Locked" %in% States) "All Locked"
else States, .add = TRUE) %>% # .add replaces the deprecated add (dplyr >= 1.0.0)
mutate(n = n()) %>%
ungroup %>%
mutate(States = c("No Lock Status", "Lock Status")[1+
(States == "All Locked" & n >=1)],
States = replace(States, is.na(States), "No Status")) %>%
select(-n) %>%
distinct
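The first point in this answer (use is.na for NA elements) can be illustrated with a minimal base R sketch. The three-element States vector and the "Has a state" label below are made up for illustration and are not part of the original data:

```r
# Hypothetical three-element vector, for illustration only
States <- c(NA, "All Locked", "All Not Locked")

# Comparing with NA via == never yields TRUE or FALSE, only NA
eq_na <- States == NA

# is.na() is the test that actually detects missing values
is_na <- is.na(States)

# ifelse() therefore needs is.na() as its condition
status <- ifelse(is.na(States), "No Status", "Has a state")
print(status)
```

This is why the first branch of the asker's ifelse chain can never select "No Status".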
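Another answer
Here is a short, clean idiom using dplyr::case_when. First we compute Status as a summary statistic, the proportion of states that are "All Locked" (between 0 and 1, or NA), then we immediately recode the Status column into the corresponding output string: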
df1 %>% group_by(ID) %>%
summarize(ToolID=ToolID[1], Measurement=Measurement[1], Location=Location[1],
Status = sum( States=="All Locked")/n() ) %>%
mutate(Status = case_when(
is.na(Status) ~ "No Status",
Status == 1 ~ "Lock Status",
Status == 0 ~ "No Lock Status",
between(Status, 0, 1) ~ as.character(NA) ))
Output:
ID ToolID Measurement Location Status
<dbl> <fctr> <fctr> <fctr> <chr>
1 1.00 NA Length US No Status
2 2.00 NA Breadth US No Status
3 3.00 SWP Width UK Lock Status
4 4.00 SWP Height UK No Lock Status
5 5.00 SWP Time US NA
6 6.00 SWP Mass UK No Lock Status
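Note that ID 5 comes out as NA in this output because only two of its three rows are "All Locked" (a proportion of 2/3), whereas the question asks for "Lock Status" whenever at least one row is locked. The following is a minimal sketch of a variant that follows the question's rule. It assumes that a group whose States contain any NA should map to "No Status", and the column name prop_locked is introduced here only for illustration:

```r
library(dplyr)

df1 <- data.frame(
  ID = c(1, 2, 3, 4, 5, 5, 5, 6, 6),
  States = c(NA, NA, "All Locked", "All Not Locked", "All Locked", "All Locked",
             "All Not Locked", "All Not Locked", "All Not Locked"),
  ToolID = c(NA, NA, "SWP", "SWP", "SWP", "SWP", "SWP", "SWP", "SWP"),
  Measurement = c("Length", "Breadth", "Width", "Height", "Time", "Time",
                  "Time", "Mass", "Mass"),
  Location = c("US", "US", "UK", "UK", "US", "US", "US", "UK", "UK")
)

result <- df1 %>%
  group_by(ID) %>%
  summarize(
    ToolID = first(ToolID),
    Measurement = first(Measurement),
    Location = first(Location),
    # Share of rows in the group that are "All Locked"; NA if any state is NA
    prop_locked = mean(States == "All Locked")
  ) %>%
  mutate(Status = case_when(
    is.na(prop_locked) ~ "No Status",      # missing states
    prop_locked > 0    ~ "Lock Status",    # at least one "All Locked"
    TRUE               ~ "No Lock Status"  # no "All Locked" at all
  )) %>%
  select(-prop_locked)

print(result)
```

With this ordering of conditions, the NA check has to come first, because a comparison such as prop_locked > 0 is itself NA for missing values.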
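Another answer
The any() function is well suited to the aggregation here. Joining with a lookup table then converts NA, TRUE, and FALSE into the Status values the OP expects.
The approach can be implemented in data.table syntax as well as in dplyr style.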
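Before the package-specific versions, the idea can be sketched in base R alone. This is only an illustration under the assumption that the States of one ID are either all NA or all non-NA; the names st, lut_keys, and lut_vals are chosen here and do not come from the answer:

```r
ID <- c(1, 2, 3, 4, 5, 5, 5, 6, 6)
States <- c(NA, NA, "All Locked", "All Not Locked", "All Locked", "All Locked",
            "All Not Locked", "All Not Locked", "All Not Locked")

# any() per ID yields three possible outcomes: NA, TRUE, or FALSE
st <- sapply(split(States, ID), function(s) any(s == "All Locked"))

# Lookup "table" as two parallel vectors; match() pairs NA with NA
lut_keys <- c(NA, TRUE, FALSE)
lut_vals <- c("No Status", "Lock Status", "No Lock Status")
status <- lut_vals[match(st, lut_keys)]

print(data.frame(ID = names(st), Status = status))
```

The lookup works because match() treats NA as equal to NA, which mirrors how the joins in the versions that follow pair the NA key with "No Status".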
Create lookup table
This will be used by both the data.table and the dplyr variants.
library(data.table)
lut <- data.table(st = c(NA, TRUE, FALSE),
Status = c("No Status", "Lock Status", "No Lock Status"))
data.table version
library(data.table)
# aggregate by ID
agg <- setDT(df1)[, .(st = any(States == "All Locked")), by = ID][
# join with lookup table
lut, on = "st"][, -"st"]
# join with df1 to prepend other columns
unique(df1[, -"States"])[agg, on = "ID"]
   ID ToolID Measurement Location         Status
1:  1   <NA>      Length       US      No Status
2:  2   <NA>     Breadth       US      No Status
3:  3    SWP       Width       UK    Lock Status
4:  5    SWP        Time       US    Lock Status
5:  4    SWP      Height       UK No Lock Status
6:  6    SWP        Mass       UK No Lock Status
dplyr version
library(dplyr)
agg <-df1 %>%
group_by(ID) %>%
summarize(st = any(States == "All Locked")) %>%
left_join(lut) %>%
select(-st)
df1 %>%
select(-States) %>%
unique() %>%
left_join(agg)
  ID ToolID Measurement Location         Status
1  1   <NA>      Length       US      No Status
2  2   <NA>     Breadth       US      No Status
3  3    SWP       Width       UK    Lock Status
4  4    SWP      Height       UK No Lock Status
5  5    SWP        Time       US    Lock Status
6  6    SWP        Mass       UK No Lock Status