R: how to use case_when() to determine whether the previous value in a column is greater than the subsequent value in an ordered vector


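I am calculating growth for a coral demography dataset and need to compare `Max Diameter (cm)` across surveys to determine at which TimeStep a coral shrank. I tried using lag, but for some reason my new column is entirely NA, rather than NA only in the rows where the data switch to a new coral ID. Does anyone know what I need to do so that my Diff column contains NA only where a transition to a new colony occurs?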

Data frame

# A tibble: 20 x 22
   `Taxonomic Code` ID    Date       Year  Site_long Shelter `Module #` Side  Location Settlement_Area TimeStep size_class `Cover Code` `Max Diameter (… `Max Orthogonal…
   <chr>            <fct> <date>     <chr> <fct>     <fct>        <dbl> <chr> <chr>              <dbl>    <dbl>      <dbl>        <dbl>            <dbl>            <dbl>
 1 PR               H30   2018-11-27 18    Hanauma … Low            216 S     D3                 0.759        7          3            2               22               17
 2 PR               H30   2019-02-26 19    Hanauma … Low            216 S     D3                 0.751        8          3            1               24               19
 3 PR               H30   2019-05-28 19    Hanauma … Low            216 S     D3                 0.607        9          3            1               30               20
 4 PR               H30   2019-08-27 19    Hanauma … Low            216 S     D3                 0.615       10          1            1                8                8
 5 PR               H30   2019-11-26 19    Hanauma … Low            216 S     D3                 0.622       11          5            1               46               30
 6 PR               H37   2018-09-09 18    Hanauma … High           215 S     C1                 0.759        6          2            1               14               12
 7 PR               H37   2018-11-27 18    Hanauma … High           215 S     C1                 0.751        7          3            1               22               19
 8 PR               H37   2019-03-12 19    Hanauma … High           215 S     C1                 0.759        8          3            1               26               20
 9 PR               H37   2019-05-21 19    Hanauma … High           215 S     C1                 0.759        9          3            3               29               21
10 PR               H37   2019-09-03 19    Hanauma … High           215 S     C1                 0.683       10          3            1               30               26
11 PR               H66   2018-06-05 18    Hanauma … High           213 N     A1                 0.759        5          2            1               20               19
12 PR               H66   2018-09-09 18    Hanauma … High           213 N     A1                 0.759        6          2            1               20               19
13 PR               H66   2018-12-04 18    Hanauma … High           213 N     A1                 0.653        7          3            1               24               22
14 PR               H66   2019-03-05 19    Hanauma … High           213 N     A1                 0.759        8          3            1               25               24
15 PR               H66   2019-05-28 19    Hanauma … High           213 N     A1                 0.615        9          3            1               28               24
16 PR               H66   2019-09-03 19    Hanauma … High           213 N     A1                 0.531       10          3            1               23               20
17 PR               H66   2019-12-03 19    Hanauma … High           213 N     A1                 0.600       11          3            1               23               16
18 PR               H76   2018-09-09 18    Hanauma … High           213 N     A4                 0.759        6          3            1               21               18
19 PR               H76   2018-12-04 18    Hanauma … High           213 N     A4                 0.653        7          3            1               24               12
20 PR               H76   2019-03-05 19    Hanauma … High           213 N     A4                 0.759        8          3            1               22               19
# … with 7 more variables: `Height (cm)` <dbl>, `Status Code` <chr>, area_mm_squared <dbl>, area_cm_squared <dbl>, Volume_mm_cubed <dbl>, Volume_cm_cubed <dbl>, MD <dbl>

Data frame code

data <- structure(list(`Taxonomic Code` = c("PR", "PR", "PR", "PR", "PR", 
"PR", "PR", "PR", "PR", "PR", "PR", "PR", "PR", "PR", "PR", "PR", 
"PR", "PR", "PR", "PR"), ID = structure(c(35L, 35L, 35L, 35L, 
35L, 38L, 38L, 38L, 38L, 38L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 
61L, 61L, 61L), .Label = c("H1051", "H108", "H110", "H1101", 
"H112", "H113", "H116", "H118", "H1188", "H1211", "H122", "H125", 
"H1253", "H1289", "H171", "H172", "H174", "H186", "H187", "H188", 
"H189", "H191", "H192", "H236", "H237", "H244", "H252", "H254", 
"H258", "H274", "H277", "H288", "H292", "H293", "H30", "H332", 
"H366", "H37", "H374", "H396", "H466", "H479", "H484", "H499", 
"H531", "H560", "H580", "H593", "H597", "H625", "H644", "H647", 
"H649", "H653", "H66", "H693", "H695", "H712", "H728", "H737", 
"H76", "H760", "H774", "H854", "H926", "H96", "H963", "H98", 
"H985", "H991", "H996", "W1038", "W1101", "W1152", "W1154", "W1192", 
"W1208", "W1209", "W1214", "W1227", "W1243", "W1245", "W1315", 
"W1345", "W1361", "W1377", "W1399", "W1438", "W1494", "W1495", 
"W1537", "W1557", "W1614", "W1636", "W1655", "W1669", "W1690", 
"W1697", "W1729", "W1741", "W1758", "W1782", "W1785", "W1847", 
"W1919", "W2000", "W2004", "W2011", "W2036", "W2044", "W2046", 
"W2131", "W2133", "W234", "W249", "W251", "W254", "W307", "W355", 
"W359", "W369", "W433", "W450", "W461", "W470", "W480", "W538", 
"W542", "W544", "W584", "W601", "W606", "W781", "W79", "W807", 
"W872", "W874", "W887", "W890", "W891", "W923", "W952"), class = "factor"), 
    Date = structure(c(17862, 17953, 18044, 18135, 18226, 17783, 
    17862, 17967, 18037, 18142, 17687, 17783, 17869, 17960, 18044, 
    18142, 18233, 17783, 17869, 17960), class = "Date"), Year = c("18", 
    "19", "19", "19", "19", "18", "18", "19", "19", "19", "18", 
    "18", "18", "19", "19", "19", "19", "18", "18", "19"), Site_long = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L), .Label = c("Hanauma Bay", "Waikiki"), class = "factor"), 
    Shelter = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("High", 
    "Low"), class = "factor"), `Module #` = c(216, 216, 216, 
    216, 216, 215, 215, 215, 215, 215, 213, 213, 213, 213, 213, 
    213, 213, 213, 213, 213), Side = c("S", "S", "S", "S", "S", 
    "S", "S", "S", "S", "S", "N", "N", "N", "N", "N", "N", "N", 
    "N", "N", "N"), Location = c("D3", "D3", "D3", "D3", "D3", 
    "C1", "C1", "C1", "C1", "C1", "A1", "A1", "A1", "A1", "A1", 
    "A1", "A1", "A4", "A4", "A4"), Settlement_Area = c(0.75902336, 
    0.751433126, 0.607218688, 0.614808922, 0.622399155, 0.75902336, 
    0.751433126, 0.75902336, 0.75902336, 0.683121024, 0.75902336, 
    0.75902336, 0.65276009, 0.75902336, 0.614808922, 0.531316352, 
    0.599628454, 0.75902336, 0.65276009, 0.75902336), TimeStep = c(7, 
    8, 9, 10, 11, 6, 7, 8, 9, 10, 5, 6, 7, 8, 9, 10, 11, 6, 7, 
    8), size_class = c(3, 3, 3, 1, 5, 2, 3, 3, 3, 3, 2, 2, 3, 
    3, 3, 3, 3, 3, 3, 3), `Cover Code` = c(2, 1, 1, 1, 1, 1, 
    1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), `Max Diameter (cm)` = c(22, 
    24, 30, 8, 46, 14, 22, 26, 29, 30, 20, 20, 24, 25, 28, 23, 
    23, 21, 24, 22), `Max Orthogonal (cm)` = c(17, 19, 20, 8, 
    30, 12, 19, 20, 21, 26, 19, 19, 22, 24, 24, 20, 16, 18, 12, 
    19), `Height (cm)` = c(2, 2, 3, 1, 3, 1, 2, 1, 1, 3, 1, 1, 
    1, 2, 2, 2, 2, 1, 1, 1), `Status Code` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, "B", NA, NA, "PB", NA, NA, 
    NA, NA), area_mm_squared = c(374, 456, 600, 64, 1380, 168, 
    418, 520, 609, 780, 380, 380, 528, 600, 672, 460, 368, 378, 
    288, 418), area_cm_squared = c(3.74, 4.56, 6, 0.64, 13.8, 
    1.68, 4.18, 5.2, 6.09, 7.8, 3.8, 3.8, 5.28, 6, 6.72, 4.6, 
    3.68, 3.78, 2.88, 4.18), Volume_mm_cubed = c(391.651884147528, 
    477.522083345649, 942.477796076938, 33.5103216382911, 2167.69893097696, 
    87.9645943005142, 437.728576400178, 272.271363311115, 318.871654339364, 
    1225.22113490002, 198.967534727354, 198.967534727354, 276.460153515902, 
    628.318530717959, 703.716754404114, 481.710873550435, 385.368698840348, 
    197.920337176157, 150.79644737231, 218.864288200089), Volume_cm_cubed = c(0.391651884147528, 
    0.477522083345649, 0.942477796076938, 0.0335103216382911, 
    2.16769893097696, 0.0879645943005142, 0.437728576400178, 
    0.272271363311115, 0.318871654339364, 1.22522113490002, 0.198967534727354, 
    0.198967534727354, 0.276460153515902, 0.628318530717959, 
    0.703716754404114, 0.481710873550435, 0.385368698840348, 
    0.197920337176157, 0.15079644737231, 0.218864288200089), 
    MD = c(22, 24, 30, 8, 46, 14, 22, 26, 29, 30, 20, 20, 24, 
    25, 28, 23, 23, 21, 24, 22)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

Code

data_new <- data %>% group_by(ID, TimeStep) %>%
  mutate(Diff = `Max Diameter (cm)` - dplyr::lag(`Max Diameter (cm)`))

Output

data_output <- structure(list(`Taxonomic Code` = c("PR", "PR", "PR", "PR", "PR", 
"PR", "PR", "PR", "PR", "PR", "PR", "PR", "PR", "PR", "PR", "PR", 
"PR", "PR", "PR", "PR"), ID = structure(c(35L, 35L, 35L, 35L, 
35L, 38L, 38L, 38L, 38L, 38L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 
61L, 61L, 61L), .Label = c("H1051", "H108", "H110", "H1101", 
"H112", "H113", "H116", "H118", "H1188", "H1211", "H122", "H125", 
"H1253", "H1289", "H171", "H172", "H174", "H186", "H187", "H188", 
"H189", "H191", "H192", "H236", "H237", "H244", "H252", "H254", 
"H258", "H274", "H277", "H288", "H292", "H293", "H30", "H332", 
"H366", "H37", "H374", "H396", "H466", "H479", "H484", "H499", 
"H531", "H560", "H580", "H593", "H597", "H625", "H644", "H647", 
"H649", "H653", "H66", "H693", "H695", "H712", "H728", "H737", 
"H76", "H760", "H774", "H854", "H926", "H96", "H963", "H98", 
"H985", "H991", "H996", "W1038", "W1101", "W1152", "W1154", "W1192", 
"W1208", "W1209", "W1214", "W1227", "W1243", "W1245", "W1315", 
"W1345", "W1361", "W1377", "W1399", "W1438", "W1494", "W1495", 
"W1537", "W1557", "W1614", "W1636", "W1655", "W1669", "W1690", 
"W1697", "W1729", "W1741", "W1758", "W1782", "W1785", "W1847", 
"W1919", "W2000", "W2004", "W2011", "W2036", "W2044", "W2046", 
"W2131", "W2133", "W234", "W249", "W251", "W254", "W307", "W355", 
"W359", "W369", "W433", "W450", "W461", "W470", "W480", "W538", 
"W542", "W544", "W584", "W601", "W606", "W781", "W79", "W807", 
"W872", "W874", "W887", "W890", "W891", "W923", "W952"), class = "factor"), 
    Date = structure(c(17862, 17953, 18044, 18135, 18226, 17783, 
    17862, 17967, 18037, 18142, 17687, 17783, 17869, 17960, 18044, 
    18142, 18233, 17783, 17869, 17960), class = "Date"), Year = c("18", 
    "19", "19", "19", "19", "18", "18", "19", "19", "19", "18", 
    "18", "18", "19", "19", "19", "19", "18", "18", "19"), Site_long = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L), .Label = c("Hanauma Bay", "Waikiki"), class = "factor"), 
    Shelter = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("High", 
    "Low"), class = "factor"), `Module #` = c(216, 216, 216, 
    216, 216, 215, 215, 215, 215, 215, 213, 213, 213, 213, 213, 
    213, 213, 213, 213, 213), Side = c("S", "S", "S", "S", "S", 
    "S", "S", "S", "S", "S", "N", "N", "N", "N", "N", "N", "N", 
    "N", "N", "N"), Location = c("D3", "D3", "D3", "D3", "D3", 
    "C1", "C1", "C1", "C1", "C1", "A1", "A1", "A1", "A1", "A1", 
    "A1", "A1", "A4", "A4", "A4"), Settlement_Area = c(0.75902336, 
    0.751433126, 0.607218688, 0.614808922, 0.622399155, 0.75902336, 
    0.751433126, 0.75902336, 0.75902336, 0.683121024, 0.75902336, 
    0.75902336, 0.65276009, 0.75902336, 0.614808922, 0.531316352, 
    0.599628454, 0.75902336, 0.65276009, 0.75902336), TimeStep = c(7, 
    8, 9, 10, 11, 6, 7, 8, 9, 10, 5, 6, 7, 8, 9, 10, 11, 6, 7, 
    8), size_class = c(3, 3, 3, 1, 5, 2, 3, 3, 3, 3, 2, 2, 3, 
    3, 3, 3, 3, 3, 3, 3), `Cover Code` = c(2, 1, 1, 1, 1, 1, 
    1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), `Max Diameter (cm)` = c(22, 
    24, 30, 8, 46, 14, 22, 26, 29, 30, 20, 20, 24, 25, 28, 23, 
    23, 21, 24, 22), `Max Orthogonal (cm)` = c(17, 19, 20, 8, 
    30, 12, 19, 20, 21, 26, 19, 19, 22, 24, 24, 20, 16, 18, 12, 
    19), `Height (cm)` = c(2, 2, 3, 1, 3, 1, 2, 1, 1, 3, 1, 1, 
    1, 2, 2, 2, 2, 1, 1, 1), `Status Code` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, "B", NA, NA, "PB", NA, NA, 
    NA, NA), area_mm_squared = c(374, 456, 600, 64, 1380, 168, 
    418, 520, 609, 780, 380, 380, 528, 600, 672, 460, 368, 378, 
    288, 418), area_cm_squared = c(3.74, 4.56, 6, 0.64, 13.8, 
    1.68, 4.18, 5.2, 6.09, 7.8, 3.8, 3.8, 5.28, 6, 6.72, 4.6, 
    3.68, 3.78, 2.88, 4.18), Volume_mm_cubed = c(391.651884147528, 
    477.522083345649, 942.477796076938, 33.5103216382911, 2167.69893097696, 
    87.9645943005142, 437.728576400178, 272.271363311115, 318.871654339364, 
    1225.22113490002, 198.967534727354, 198.967534727354, 276.460153515902, 
    628.318530717959, 703.716754404114, 481.710873550435, 385.368698840348, 
    197.920337176157, 150.79644737231, 218.864288200089), Volume_cm_cubed = c(0.391651884147528, 
    0.477522083345649, 0.942477796076938, 0.0335103216382911, 
    2.16769893097696, 0.0879645943005142, 0.437728576400178, 
    0.272271363311115, 0.318871654339364, 1.22522113490002, 0.198967534727354, 
    0.198967534727354, 0.276460153515902, 0.628318530717959, 
    0.703716754404114, 0.481710873550435, 0.385368698840348, 
    0.197920337176157, 0.15079644737231, 0.218864288200089), 
    MD = c(22, 24, 30, 8, 46, 14, 22, 26, 29, 30, 20, 20, 24, 
    25, 28, 23, 23, 21, 24, 22), Diff = c(NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
    )), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -20L), groups = structure(list(ID = structure(c(35L, 
35L, 35L, 35L, 35L, 38L, 38L, 38L, 38L, 38L, 55L, 55L, 55L, 55L, 
55L, 55L, 55L, 61L, 61L, 61L), .Label = c("H1051", "H108", "H110", 
"H1101", "H112", "H113", "H116", "H118", "H1188", "H1211", "H122", 
"H125", "H1253", "H1289", "H171", "H172", "H174", "H186", "H187", 
"H188", "H189", "H191", "H192", "H236", "H237", "H244", "H252", 
"H254", "H258", "H274", "H277", "H288", "H292", "H293", "H30", 
"H332", "H366", "H37", "H374", "H396", "H466", "H479", "H484", 
"H499", "H531", "H560", "H580", "H593", "H597", "H625", "H644", 
"H647", "H649", "H653", "H66", "H693", "H695", "H712", "H728", 
"H737", "H76", "H760", "H774", "H854", "H926", "H96", "H963", 
"H98", "H985", "H991", "H996", "W1038", "W1101", "W1152", "W1154", 
"W1192", "W1208", "W1209", "W1214", "W1227", "W1243", "W1245", 
"W1315", "W1345", "W1361", "W1377", "W1399", "W1438", "W1494", 
"W1495", "W1537", "W1557", "W1614", "W1636", "W1655", "W1669", 
"W1690", "W1697", "W1729", "W1741", "W1758", "W1782", "W1785", 
"W1847", "W1919", "W2000", "W2004", "W2011", "W2036", "W2044", 
"W2046", "W2131", "W2133", "W234", "W249", "W251", "W254", "W307", 
"W355", "W359", "W369", "W433", "W450", "W461", "W470", "W480", 
"W538", "W542", "W544", "W584", "W601", "W606", "W781", "W79", 
"W807", "W872", "W874", "W887", "W890", "W891", "W923", "W952"
), class = "factor"), TimeStep = c(7, 8, 9, 10, 11, 6, 7, 8, 
9, 10, 5, 6, 7, 8, 9, 10, 11, 6, 7, 8), .rows = list(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
    16L, 17L, 18L, 19L, 20L)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))
Answer

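The problem is the grouping. When `TimeStep` is included in `group_by()`, every group contains exactly one row, and the `lag` of a single element is `NA`. Grouping by `ID` alone keeps all surveys of a colony in the same group, so only the first row of each colony gets `NA`.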
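
The effect can be reproduced on a small made-up data frame. This is only a sketch: `toy` and its columns `ID`, `TimeStep`, and `size` are hypothetical stand-ins for the coral data, not part of the original question.

```r
library(dplyr)

# Hypothetical toy data: two colonies, three surveys each
toy <- tibble(
  ID       = c("A", "A", "A", "B", "B", "B"),
  TimeStep = c(1, 2, 3, 1, 2, 3),
  size     = c(10, 12, 9, 20, 22, 25)
)

# lag() of a length-one vector has no previous element to return
single_lag <- dplyr::lag(5)

# Grouping by ID and TimeStep leaves exactly one row in every group
rows_per_group <- toy %>% count(ID, TimeStep) %>% pull(n)

# So the difference is NA in every row, as in the question
over_grouped <- toy %>%
  group_by(ID, TimeStep) %>%
  mutate(Diff = size - dplyr::lag(size)) %>%
  ungroup()

all(is.na(over_grouped$Diff))
```

Because each (ID, TimeStep) pair occurs once, `lag()` never has an earlier row inside the group to look at.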

library(dplyr)

# Group by colony only, so lag() can see the previous survey of the same ID
data %>%
   group_by(ID) %>%
   mutate(Diff = `Max Diameter (cm)` - dplyr::lag(`Max Diameter (cm)`))
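
Since the question title asks about `case_when()`, the per-colony difference could then be turned into a label. The following is a sketch on hypothetical data (`toy`, `size`, and the `Change` column with its label strings are assumptions for illustration). It also sorts by `TimeStep` first, because `lag()` uses row order and the surveys may not be stored chronologically.

```r
library(dplyr)

# Hypothetical toy data; the rows for colony "A" are deliberately out of order
toy <- tibble(
  ID       = c("A", "A", "A", "B", "B", "B"),
  TimeStep = c(2, 1, 3, 1, 2, 3),
  size     = c(12, 10, 9, 20, 20, 25)
)

labelled <- toy %>%
  arrange(ID, TimeStep) %>%          # put surveys in chronological order
  group_by(ID) %>%                   # one group per colony
  mutate(
    Diff = size - dplyr::lag(size),  # NA only on the first survey of a colony
    Change = case_when(
      is.na(Diff) ~ NA_character_,   # no earlier survey to compare with
      Diff < 0    ~ "shrank",
      Diff > 0    ~ "grew",
      TRUE        ~ "no change"
    )
  ) %>%
  ungroup()

# Surveys at which a colony was smaller than at its previous survey
shrunk <- labelled %>% filter(Change == "shrank")
```

Applied to the real data, the same pattern would use `Max Diameter (cm)` in place of `size`; the rows kept in `shrunk` give the `ID` and `TimeStep` at which a colony shrank.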
