How to Reference the Row Being Operated on in lapply() in R



【Posted】: 2018-02-08 06:29:26 【Question】:

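I need help referencing the observation that lapply() is processing inside a user-defined function in R. For each observation of a data frame, I want to see where it ranks within the subset of similar observations from the same data frame. I can't reference the original observation to extract its rank.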

Here is a sample of my data:

> dput(df)
structure(list(MP = c(29L, 32L, 3L, 34L, 14L, 3L, 40L, 17L, 13L, 
14L, 4L, 36L, 6L, 33L, 25L, 12L, 17L, 3L, 15L, 28L, 33L, 39L, 
30L), Player.ID = structure(c(1L, 2L, 3L, 8L, 14L, 16L, 21L, 
26L, 30L, 34L, 35L, 42L, 41L, 43L, 46L, 58L, 62L, 79L, 86L, 100L, 
102L, 106L, 107L), .Label = c("abrinal01", "adamsst01", "aldrico01", 
"aldrila01", "anderky01", "anderry01", "antetgi01", "anthoca01", 
"anunoog01", "arthuda01", "bartowi01", "bealbr01", "bertada01", 
"bjeline01", "brogdma01", "***aa01", "***di01", "brownlo01", 
"brownst02", "bullore01", "butleji01", "buyckdw01", "capelca01", 
"chandwi01", "craigto01", "crawfja01", "davisde01", "dellama01", 
"derozde01", "dienggo01", "drumman01", "ennisja01", "farieke01", 
"feltora01", "fergute01", "forbebr01", "fraziti01", "gallola01", 
"gasolma01", "gasolpa01", "georgma01", "georgpa01", "gibsota01", 
"ginobma01", "gortama01", "grantje01", "greenda02", "greenge01", 
"greenja01", "griffbl01", "hardeja01", "harrian01", "harriga01", 
"henrymy01", "hensojo01", "hilarne01", "hillida01", "huestjo01", 
"ibakase01", "johnsst04", "jokicni01", "jonesty01", "kennalu01", 
"kilpase01", "lauvejo01", "lowryky01", "lylestr01", "mahinia01", 
"makerth01", "martija01", "mbahalu01", "mclembe01", "meeksjo01", 
"middlkh01", "millspa02", "moreler01", "morrima02", "mudiaem01", 
"muhamsh01", "munfoxa02", "murrade01", "murraja01", "noguelu01", 
"oubreke01", "parketo01", "pattepa01", "paulbr01", "paulch01", 
"plumlma02", "poeltja01", "porteot01", "powelno01", "reedwi02", 
"satorto01", "scottmi01", "seldewa01", "siakapa01", "smithis01", 
"snellto01", "teaguje01", "tollian01", "townska01", "tuckepj01", 
"valanjo01", "vaughra01", "westbru01", "wiggian01", "wilsodj01", 
"wrighde01"), class = "factor"), Game.ID = structure(c(7L, 7L, 
6L, 7L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 6L, 6L, 7L, 7L, 6L, 6L, 
7L, 6L, 6L, 7L, 6L), .Label = c("2018-02-01 * DEN", "2018-02-01 * DET", 
"2018-02-01 * HOU", "2018-02-01 * MEM", "2018-02-01 * MIL", "2018-02-01 * MIN", 
"2018-02-01 * OKC", "2018-02-01 * SAS", "2018-02-01 * TOR", "2018-02-01 * WAS"
), class = "factor")), .Names = c("MP", "Player.ID", "Game.ID"
), row.names = c(1L, 2L, 3L, 8L, 14L, 16L, 21L, 26L, 30L, 34L, 
35L, 41L, 42L, 43L, 46L, 58L, 62L, 79L, 86L, 100L, 102L, 106L, 
107L), class = "data.frame")

I defined the following function:

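> library(data.table)  # provides data.table(), used below to number the sorted rows
> f1 <- function(col1, df, col2) {
+   lapply(col1, function(i) {
+     # MP values of every row that shares this Game.ID
+     df2 <- df[col1 == i, col2]
+     # sorted MP values (X1) next to their 1-based positions (X2)
+     df3 <- data.frame(cbind(sort(df2, decreasing = TRUE), rownames(data.table(sort(df2, decreasing = TRUE)))))
+     df3
+   })
+ }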
> f1(df$Game.ID, df, c('MP'))[1:10]
[[1]]
   X1 X2
1  39  1
2  36  2
3  34  3
4  32  4
5  29  5
6  25  6
7  15  7
8  14  8
9  12  9
10  4 10

[[2]]
   X1 X2
1  39  1
2  36  2
3  34  3
4  32  4
5  29  5
6  25  6
7  15  7
8  14  8
9  12  9
10  4 10

[[3]]
   X1 X2
1  40  1
2  33  2
3  33  3
4  30  4
5  28  5
6  17  6
7  17  7
8  14  8
9  13  9
10  6 10
11  3 11
12  3 12
13  3 13

[[4]]
   X1 X2
1  39  1
2  36  2
3  34  3
4  32  4
5  29  5
6  25  6
7  15  7
8  14  8
9  12  9
10  4 10

[[5]]
   X1 X2
1  40  1
2  33  2
3  33  3
4  30  4
5  28  5
6  17  6
7  17  7
8  14  8
9  13  9
10  6 10
11  3 11
12  3 12
13  3 13

[[6]]
   X1 X2
1  40  1
2  33  2
3  33  3
4  30  4
5  28  5
6  17  6
7  17  7
8  14  8
9  13  9
10  6 10
11  3 11
12  3 12
13  3 13

[[7]]
   X1 X2
1  40  1
2  33  2
3  33  3
4  30  4
5  28  5
6  17  6
7  17  7
8  14  8
9  13  9
10  6 10
11  3 11
12  3 12
13  3 13

[[8]]
   X1 X2
1  40  1
2  33  2
3  33  3
4  30  4
5  28  5
6  17  6
7  17  7
8  14  8
9  13  9
10  6 10
11  3 11
12  3 12
13  3 13

[[9]]
   X1 X2
1  40  1
2  33  2
3  33  3
4  30  4
5  28  5
6  17  6
7  17  7
8  14  8
9  13  9
10  6 10
11  3 11
12  3 12
13  3 13

[[10]]
   X1 X2
1  39  1
2  36  2
3  34  3
4  32  4
5  29  5
6  25  6
7  15  7
8  14  8
9  12  9
10  4 10

For each observation, this function takes the subset of all observations with the same df$Game.ID and then sorts them by df$MP in descending order.

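I need help extracting the rank of the observation that lapply() is operating on. I can reference the ranks by selecting the second column of df3 in my function: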

> f1 <- function(col1, df, col2) {
+   lapply(col1, function(i) {
+     df2 <- df[col1 == i, col2]
+     df3 <- data.frame(cbind(sort(df2, decreasing = TRUE), rownames(data.table(sort(df2, decreasing = TRUE)))))
+     df3[ , 2]
+   })
+ }
> f1(df$Game.ID, df, c('MP'))[1:10]
[[1]]
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 10 2 3 4 5 6 7 8 9

[[2]]
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 10 2 3 4 5 6 7 8 9

[[3]]
 [1] 1  2  3  4  5  6  7  8  9  10 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[4]]
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 10 2 3 4 5 6 7 8 9

[[5]]
 [1] 1  2  3  4  5  6  7  8  9  10 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[6]]
 [1] 1  2  3  4  5  6  7  8  9  10 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[7]]
 [1] 1  2  3  4  5  6  7  8  9  10 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[8]]
 [1] 1  2  3  4  5  6  7  8  9  10 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[9]]
 [1] 1  2  3  4  5  6  7  8  9  10 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[10]]
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 10 2 3 4 5 6 7 8 9

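But how can I extract only the rank that corresponds to each observation? For example, the ranks I want to extract for the first 10 observations of df are 5, 4, 11, 3, 8, 11, 1, 6, 9, and 8, respectively.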
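For reference, these target ranks can be reproduced without lapply(). The sketch below is not from the original post: it rebuilds only the MP and Game.ID columns of the sample data by hand and uses base R's ave() to apply rank() within each Game.ID, which writes each group's results back in the original row order. Using ties.method = "min" is an assumption inferred from the expected values (the three 3-minute MIN players all receive 11).

```r
# Hand-copied subset of the question's data: only MP and Game.ID,
# in the same row order as dput(df) above.
MP <- c(29, 32, 3, 34, 14, 3, 40, 17, 13, 14, 4, 36,
        6, 33, 25, 12, 17, 3, 15, 28, 33, 39, 30)
team <- c("OKC", "OKC", "MIN", "OKC", "MIN", "MIN", "MIN", "MIN", "MIN",
          "OKC", "OKC", "OKC", "MIN", "MIN", "OKC", "OKC", "MIN", "MIN",
          "OKC", "MIN", "MIN", "OKC", "MIN")
df <- data.frame(MP = MP, Game.ID = factor(paste("2018-02-01 *", team)))

# ave() splits -MP by Game.ID, applies FUN to each group and puts the
# results back in the original row order, so row k receives its own rank.
# Ranking -MP makes the most minutes rank 1; "min" gives tied rows the
# smallest shared rank.
df$rank <- ave(-df$MP, df$Game.ID,
               FUN = function(x) rank(x, ties.method = "min"))

df$rank[1:10]  # 5 4 11 3 8 11 1 6 9 8
```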

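I tried using the variable i in the function passed to lapply() to reference the row being operated on; however, it seems to refer only to the df$Game.ID value assigned to it, not to the whole row.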

> f1 <- function(col1, df, col2) {
+   col4 <- lapply(col1, function(i) {
+     df2 <- df[col1 == i, col2]
+     df3 <- data.frame(cbind(sort(df2, decreasing = TRUE), rownames(data.table(sort(df2, decreasing = TRUE)))))
+     df3[i, 2]
+   })
+ }
> f1(df$Game.ID, df, c('MP'))[1:10]
[[1]]
[1] 7
Levels: 1 10 2 3 4 5 6 7 8 9

[[2]]
[1] 7
Levels: 1 10 2 3 4 5 6 7 8 9

[[3]]
[1] 6
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[4]]
[1] 7
Levels: 1 10 2 3 4 5 6 7 8 9

[[5]]
[1] 6
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[6]]
[1] 6
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[7]]
[1] 6
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[8]]
[1] 6
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[9]]
[1] 6
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[10]]
[1] 7
Levels: 1 10 2 3 4 5 6 7 8 9
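A likely explanation for the 7s and 6s above: inside lapply(df$Game.ID, ...), i is a length-1 factor, and R's `[` treats a factor index as its integer level code rather than as a row of df. "2018-02-01 * OKC" is the 7th level and "2018-02-01 * MIN" the 6th, so df3[i, 2] simply picks row 7 or row 6 of df3. A small sketch, where df3 is a hand-built stand-in for the OKC subset:

```r
# The ten Game.ID levels from dput(df) above.
lv <- paste("2018-02-01 *",
            c("DEN", "DET", "HOU", "MEM", "MIL", "MIN", "OKC", "SAS", "TOR", "WAS"))
# What lapply(df$Game.ID, ...) passes in as i for an OKC row.
i <- factor("2018-02-01 * OKC", levels = lv)

# Stand-in for df3 in the OKC case: sorted minutes plus their positions.
df3 <- data.frame(X1 = c(39, 36, 34, 32, 29, 25, 15, 14, 12, 4), X2 = 1:10)

as.integer(i)  # 7: the level code of "2018-02-01 * OKC"
df3[i, 2]      # 7: the factor acted as row index 7, not as "this player's row"
```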

> f1 <- function(col1, df, col2) {
+   col4 <- lapply(col1, function(i) {
+     df2 <- df[col1 == i, col2]
+     df3 <- data.frame(cbind(sort(df2, decreasing = TRUE), rownames(data.table(sort(df2, decreasing = TRUE)))))
+     df3[df3$X1 == df[col1 == i, col2], 2]
+   })
+ }
> f1(df$Game.ID, df, c('MP'))[1:10]
[[1]]
[1] 3
Levels: 1 10 2 3 4 5 6 7 8 9

[[2]]
[1] 3
Levels: 1 10 2 3 4 5 6 7 8 9

[[3]]
factor(0)
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[4]]
[1] 3
Levels: 1 10 2 3 4 5 6 7 8 9

[[5]]
factor(0)
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[6]]
factor(0)
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[7]]
factor(0)
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[8]]
factor(0)
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[9]]
factor(0)
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[10]]
[1] 3
Levels: 1 10 2 3 4 5 6 7 8 9

> f1 <- function(col1, df, col2) {
+   col4 <- lapply(col1, function(i) {
+     df2 <- df[col1 == i, col2]
+     df3 <- data.frame(cbind(sort(df2, decreasing = TRUE), rownames(data.table(sort(df2, decreasing = TRUE)))))
+     df3[df3$X1 == df[i, col2], 2]
+   })
+ }
> f1(df$Game.ID, df, c('MP'))[1:10]
[[1]]
factor(0)
Levels: 1 10 2 3 4 5 6 7 8 9

[[2]]
factor(0)
Levels: 1 10 2 3 4 5 6 7 8 9

[[3]]
[1] 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[4]]
factor(0)
Levels: 1 10 2 3 4 5 6 7 8 9

[[5]]
[1] 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[6]]
[1] 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[7]]
[1] 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[8]]
[1] 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[9]]
[1] 11 12 13
Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9

[[10]]
factor(0)
Levels: 1 10 2 3 4 5 6 7 8 9

How can I reference another column in the same row that lapply() is operating on?
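One way to make "the current row" available inside lapply() (a sketch, not taken from any of the answers): iterate over row positions with seq_len(nrow(df)) instead of over df$Game.ID, so each call receives a row index k and can read df$Game.ID[k] and df$MP[k] from the same row. The helper name rank_in_group is hypothetical, and counting strictly larger values gives min-style tie handling, which matches the expected 5, 4, 11, ... listed in the question.

```r
# Hand-copied subset of the question's data (MP and Game.ID only).
MP <- c(29, 32, 3, 34, 14, 3, 40, 17, 13, 14, 4, 36,
        6, 33, 25, 12, 17, 3, 15, 28, 33, 39, 30)
team <- c("OKC", "OKC", "MIN", "OKC", "MIN", "MIN", "MIN", "MIN", "MIN",
          "OKC", "OKC", "OKC", "MIN", "MIN", "OKC", "OKC", "MIN", "MIN",
          "OKC", "MIN", "MIN", "OKC", "MIN")
df <- data.frame(MP = MP, Game.ID = factor(paste("2018-02-01 *", team)))

# Hypothetical helper: rank of each row's value within its own group.
rank_in_group <- function(df, group_col, value_col) {
  groups <- df[[group_col]]
  values <- df[[value_col]]
  # Iterate over row positions k, not over the Game.ID values, so each call
  # knows which row it is working on and can read any column of that row.
  ranks <- lapply(seq_len(nrow(df)), function(k) {
    same_group <- groups == groups[k]
    # 1 + number of rows in the same group with strictly more minutes,
    # equivalent to rank(-x, ties.method = "min") within the group.
    1L + sum(values[same_group] > values[k])
  })
  unlist(ranks)
}

rank_in_group(df, "Game.ID", "MP")[1:10]  # 5 4 11 3 8 11 1 6 9 8
```

This recomputes the group comparison once per row, which is fine for a box score but grows quadratically with group size; the grouped approaches in the answers below avoid that.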

【Comments on the question】:

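I think you should edit your question and make it more concise.
Two cases is the minimum I can get it down to. Reducing my sample dataset to a single level of df$Game.ID, with no variation, would defeat the purpose of asking how to select it.

【Answer 1】: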

Assuming we need the index within the sorted values, use index.return = TRUE in sort:

f1 <- function(col1, df, col2) {
  lapply(col1, function(i) {
    df2 <- df[col1 == i, col2]
    sort(df2, decreasing = TRUE, index.return = TRUE)
  })
}
 
f1(df$Game.ID, df, 'MP')[1:2]

'Game.ID' contains duplicate elements, so it is better to pass only the unique elements to the function, or to deduplicate inside the function:

f1(unique(df$Game.ID), df, 'MP')[1:2]

Because it is a factor, it is more efficient to pass its levels:

res <- f1(levels(df$Game.ID), df, 'MP')
names(res) <- levels(df$Game.ID)
res[1:2]
#$`2018-02-01 * DEN`
#$`2018-02-01 * DEN`$x
#[1] 33 29  4

#$`2018-02-01 * DEN`$ix
#[1] 3 1 2


#$`2018-02-01 * DET`
#$`2018-02-01 * DET`$x
#[1] 39 36 32

#$`2018-02-01 * DET`$ix
#[1] 3 2 1

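In the output, x holds the sorted values and ix gives, for each sorted value, its position in the original (unsorted) subset.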
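To turn ix into a rank per original row, the permutation has to be inverted: ix says which original position landed at each sorted position, so assigning seq_along(ix) back into positions ix gives every original element its sorted position. A small sketch on the OKC minutes, hand-copied from the question's data; with ties, this breaks them by original position (like ties.method = "first") rather than producing the shared ranks the question expects.

```r
# MP of the OKC rows in original row order (rows 1, 2, 4, 10, 11, 12,
# 15, 16, 19, 22 of the question's df).
okc <- c(29, 32, 34, 14, 4, 36, 25, 12, 15, 39)

s <- sort(okc, decreasing = TRUE, index.return = TRUE)
s$x   # 39 36 34 32 29 25 15 14 12 4
s$ix  # 10 6 3 2 1 7 9 4 8 5: where each sorted value came from

# Invert the permutation: original position -> sorted position (the rank).
rank_okc <- integer(length(s$ix))
rank_okc[s$ix] <- seq_along(s$ix)
rank_okc  # 5 4 3 8 10 2 6 9 7 1
```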


【Answer 2】:

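You can try this function, which avoids computing the same ranking repeatedly for each Game.ID. The new column rn shows the original row number of each entry: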

f2 <- function(df) {
    require(dplyr)
    temp <- dplyr::mutate(df, rn = row_number())
    dplyr::arrange(temp, Game.ID, desc(MP)) %>%
        split(., .$Game.ID, drop = TRUE)
}
f2(df)

# $`2018-02-01 * MIN`
   # MP Player.ID          Game.ID rn
# 1  40 butleji01 2018-02-01 * MIN  7
# 2  33 gibsota01 2018-02-01 * MIN 14
# 3  33 townska01 2018-02-01 * MIN 21
# 4  30 wiggian01 2018-02-01 * MIN 23
# 5  28 teaguje01 2018-02-01 * MIN 20
# 6  17 crawfja01 2018-02-01 * MIN  8
# 7  17 jonesty01 2018-02-01 * MIN 17
# 8  14 bjeline01 2018-02-01 * MIN  5
# 9  13 dienggo01 2018-02-01 * MIN  9
# 10  6 georgma01 2018-02-01 * MIN 13
# 11  3 aldrico01 2018-02-01 * MIN  3
# 12  3 ***aa01 2018-02-01 * MIN  6
# 13  3 muhamsh01 2018-02-01 * MIN 18

# $`2018-02-01 * OKC`
   # MP Player.ID          Game.ID rn
# 14 39 westbru01 2018-02-01 * OKC 22
# 15 36 georgpa01 2018-02-01 * OKC 12
# 16 34 anthoca01 2018-02-01 * OKC  4
# 17 32 adamsst01 2018-02-01 * OKC  2
# 18 29 abrinal01 2018-02-01 * OKC  1
# 19 25 grantje01 2018-02-01 * OKC 15
# 20 15 pattepa01 2018-02-01 * OKC 19
# 21 14 feltora01 2018-02-01 * OKC 10
# 22 12 huestjo01 2018-02-01 * OKC 16
# 23  4 fergute01 2018-02-01 * OKC 11
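In this output the rank is implicit in each row's position within its game, and rn links it back to the original row. A base-R sketch (no dplyr; order() is assumed here to reproduce the same arrangement, which the printed output above is consistent with) that computes that within-group position and scatters it back to the original rows through rn. Because tied players get consecutive positions, row 6 comes out as 12 rather than the 11 the question expects.

```r
# Hand-copied subset of the question's data (MP and Game.ID only).
MP <- c(29, 32, 3, 34, 14, 3, 40, 17, 13, 14, 4, 36,
        6, 33, 25, 12, 17, 3, 15, 28, 33, 39, 30)
team <- c("OKC", "OKC", "MIN", "OKC", "MIN", "MIN", "MIN", "MIN", "MIN",
          "OKC", "OKC", "OKC", "MIN", "MIN", "OKC", "OKC", "MIN", "MIN",
          "OKC", "MIN", "MIN", "OKC", "MIN")
df <- data.frame(MP = MP, Game.ID = factor(paste("2018-02-01 *", team)))
df$rn <- seq_len(nrow(df))  # original row number, like f2()'s rn column

# Order by Game.ID, then MP descending; order() leaves remaining ties in
# their original relative order.
arranged <- df[order(df$Game.ID, -df$MP), ]

# Position of each row inside its Game.ID block (1 = most minutes).
arranged$pos <- ave(arranged$rn, arranged$Game.ID, FUN = seq_along)

# Scatter the positions back to the original row order via rn.
pos <- integer(nrow(df))
pos[arranged$rn] <- arranged$pos
df$pos <- pos

df$pos[1:10]  # 5 4 11 3 8 12 1 6 9 8
```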


【Answer 3】:

Thanks for the help. I was able to get the appropriate rank for each observation using library(data.table):

> setDT(df)
> df[, Depth.Chart := rank(-MP), by = Game.ID]
> df
    MP Player.ID          Game.ID Depth.Chart
 1: 29 abrinal01 2018-02-01 * OKC         5.0
 2: 32 adamsst01 2018-02-01 * OKC         4.0
 3:  3 aldrico01 2018-02-01 * MIN        12.0
 4: 34 anthoca01 2018-02-01 * OKC         3.0
 5: 14 bjeline01 2018-02-01 * MIN         8.0
 6:  3 ***aa01 2018-02-01 * MIN        12.0
 7: 40 butleji01 2018-02-01 * MIN         1.0
 8: 17 crawfja01 2018-02-01 * MIN         6.5
 9: 13 dienggo01 2018-02-01 * MIN         9.0
10: 14 feltora01 2018-02-01 * OKC         8.0
11:  4 fergute01 2018-02-01 * OKC        10.0
12: 36 georgpa01 2018-02-01 * OKC         2.0
13:  6 georgma01 2018-02-01 * MIN        10.0
14: 33 gibsota01 2018-02-01 * MIN         2.5
15: 25 grantje01 2018-02-01 * OKC         6.0
16: 12 huestjo01 2018-02-01 * OKC         9.0
17: 17 jonesty01 2018-02-01 * MIN         6.5
18:  3 muhamsh01 2018-02-01 * MIN        12.0
19: 15 pattepa01 2018-02-01 * OKC         7.0
20: 28 teaguje01 2018-02-01 * MIN         5.0
21: 33 townska01 2018-02-01 * MIN         2.5
22: 39 westbru01 2018-02-01 * OKC         1.0
23: 30 wiggian01 2018-02-01 * MIN         4.0
    MP Player.ID          Game.ID Depth.Chart
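Note that rank()'s default ties.method is "average", which is why tied players show 12.0, 6.5 and 2.5 above instead of the 11, 6 and 2 listed in the question's expected values. A base-R comparison on the MIN minutes, hand-copied from the sample data:

```r
# MP of the MIN rows in original row order (rows 3, 5, 6, 7, 8, 9, 13,
# 14, 17, 18, 20, 21, 23 of the question's df).
min_mp <- c(3, 14, 3, 40, 17, 13, 6, 33, 17, 3, 28, 33, 30)

# Default "average": tied values share the mean of their ranks (double).
rank(-min_mp)
# 12 8 12 1 6.5 9 10 2.5 6.5 12 5 2.5 4

# "min": tied values share the best (smallest) rank (integer).
rank(-min_mp, ties.method = "min")
# 11 8 11 1 6 9 10 2 6 11 5 2 4
```

Since rank() here is base R, passing the same argument in the data.table call, df[, Depth.Chart := rank(-MP, ties.method = "min"), by = Game.ID], should give integer ranks matching the question's expected values.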
