Assigning new values based on the location in the sequence

Posted 2023-03-14

Originally posted: 2013-07-24 04:52:23

Question:

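Working in R. The data track brain activity over time. The "mark" column records where a particular treatment starts and ends. For example, the first condition (mark == 1) starts at row 3 and ends at row 6. The second experimental condition (mark == 2) starts at row 9 and ends at row 12. Another block of condition 1 is repeated between rows 15 and 18.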

ob.id <- 1:20
mark <- c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0)
condition <- c(0,0,1,1,1,1,0,0,2,2,2,2,0,0,1,1,1,1,0,0)
start <- data.frame(ob.id, mark)
result <- data.frame(ob.id, mark, condition)
> print(start)
   ob.id mark
1      1    0
2      2    0
3      3    1
4      4    0
5      5    0
6      6    1
7      7    0
8      8    0
9      9    2
10    10    0
11    11    0
12    12    2
13    13    0
14    14    0
15    15    1
16    16    0
17    17    0
18    18    1
19    19    0
20    20    0

I need to create a column containing a variable that indicates which experimental condition each observation belongs to, like this:

> print(result)
   ob.id mark condition
1      1    0         0
2      2    0         0
3      3    1         1
4      4    0         1
5      5    0         1
6      6    1         1
7      7    0         0
8      8    0         0
9      9    2         2
10    10    0         2
11    11    0         2
12    12    2         2
13    13    0         0
14    14    0         0
15    15    1         1
16    16    0         1
17    17    0         1
18    18    1         1
19    19    0         0
20    20    0         0

Thanks for your help!

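Comments on the question:

Are the experiments always 4 rows long? No, the experimental conditions will have different lengths: each one is the number of milliseconds a particular stimulus is presented.

Answer 1:

Here is one way I can think of: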

#  Find where experiments stop and start
ind <- which( result$mark != 0 )
# [1]  3  6  9 12 15 18

#  Make a matrix of the start and stop indices taking odd and even elements of the vector
idx <- cbind( head(ind , -1)[ 1:length(ind) %% 2 == 1 ] ,tail( ind , -1)[ 1:length(ind) %% 2 == 1 ] )
#      [,1] [,2]
# [1,]    3    6
# [2,]    9   12
# [3,]   15   18

EDIT

I realised it is easier to build the index matrix above from just the odd and even elements:

idx <- cbind( ind[ 1:length(ind) %% 2 == 1 ] , ind[ 1:length(ind) %% 2 != 1 ] )
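One of the comments on this answer suggests an even shorter way to build the same start/stop matrix, `matrix(ind, ncol = 2, byrow = TRUE)`. A minimal check that the two constructions agree, assuming (as in the question) that every condition has exactly one start marker and one stop marker, so the number of non-zero marks is even:

```r
# Rebuild the question's mark vector so this block runs on its own
mark <- c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0)
ind <- which(mark != 0)

# Odd-numbered markers are starts, even-numbered markers are stops
idx_oddeven <- cbind(ind[seq_along(ind) %% 2 == 1], ind[seq_along(ind) %% 2 == 0])

# Filling a 2-column matrix row by row pairs consecutive markers directly
idx_matrix <- matrix(ind, ncol = 2, byrow = TRUE)

all(idx_oddeven == idx_matrix)
# [1] TRUE
```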


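#  Make vector of row indices to turn to 1's
#  (Map() + unlist() also works when conditions have different lengths,
#  where apply() would return a list that cannot be used as an index)
ones <- unlist( Map( seq , idx[ , 1 ] , idx[ , 2 ] ) )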

#  Make your new column and turn appropriate rows to 1
result$condition <- 0
result$condition[ ones ] <- 1
result
#   ob.id mark condition
#1      1    0         0
#2      2    0         0
#3      3    1         1
#4      4    0         1
#5      5    0         1
#6      6    1         1
#7      7    0         0
#8      8    0         0
#9      9    2         1
#10    10    0         1
#11    11    0         1
#12    12    2         1
#13    13    0         0
#14    14    0         0
#15    15    1         1
#16    16    0         1
#17    17    0         1
#18    18    1         1
#19    19    0         0
#20    20    0         0
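The comments on the question say the conditions vary in length. In that case `apply()` can no longer simplify its result to a matrix, so collecting the row indices with `Map()` and `unlist()` is the safer route. A small sketch using a made-up `mark` vector (hypothetical data, not from the question) in which condition 1 covers rows 2 to 3 and condition 2 covers rows 5 to 9:

```r
# Hypothetical marks with conditions of different lengths
mark <- c(0, 1, 1, 0, 2, 0, 0, 0, 2, 0)
idx <- matrix(which(mark != 0), ncol = 2, byrow = TRUE)

# With uneven ranges, apply() returns a list rather than a matrix
is.list(apply(idx, 1, function(x) x[1]:x[2]))
# [1] TRUE

# Map() always returns a list, so unlist() gives one flat index vector
ones <- unlist(Map(seq, idx[, 1], idx[, 2]))

condition <- numeric(length(mark))
condition[ones] <- 1
condition
# [1] 0 1 1 0 1 1 1 1 1 0
```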

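EDIT

@eddi pointed out that I need to fill in the value of the experiment rather than just a 1. So here is another strategy that uses a (gasp!) for loop. This will only really hurt if you have ~~millions~~ thousands of experiments (remember to pre-allocate the result vector):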

ind <- matrix( which( start$mark != 0 ) , ncol = 2 , byrow = TRUE )
ind <- cbind( ind , start$mark[ ind[ , 1 ] ] )
#     [,1] [,2] [,3]
#[1,]    3    6    1
#[2,]    9   12    2
#[3,]   15   18    1

res <- integer( nrow( start ) )

for( i in 1:nrow(ind) )
  res[ ind[i,1]:ind[i,2] ] <- ind[i,3]

res
# [1] 0 0 1 1 1 1 0 0 2 2 2 2 0 0 1 1 1 1 0 0
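The same fill can be written without the explicit loop: expand every start:stop range into row indices and repeat each condition value as many times as its range is long. A sketch under the same assumption as above (markers come in start/stop pairs), using only base R's `Map()` and `rep()`; this variant is not part of the original answer:

```r
mark <- c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0)

# One row per condition: start row, stop row, condition value
ind <- matrix(which(mark != 0), ncol = 2, byrow = TRUE)
ind <- cbind(ind, mark[ind[, 1]])

res <- numeric(length(mark))
# All rows covered by a condition, in order
rows <- unlist(Map(seq, ind[, 1], ind[, 2]))
# Repeat each condition value once per row of its range
res[rows] <- rep(ind[, 3], ind[, 2] - ind[, 1] + 1)
res
# [1] 0 0 1 1 1 1 0 0 2 2 2 2 0 0 1 1 1 1 0 0
```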

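Comments:

I'm still reading your solution, but here is a simpler option for step 2: matrix(ind, ncol = 2, byrow = T)

+1, an interesting approach, but you need to adapt it somehow to fill in values other than just 1, @SimonO101.

I ended up using eddi's solution for my particular problem, but this gave me some nice demonstrations and material for future reference. Thanks for your time!

Answer 2: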

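This is a fun little problem. The trick I use below is to first compute the rle of the mark vector. That makes the problem simpler, because in the resulting values vector each 0 stands for a single run, and that run either needs to be replaced or not, depending on the values on either side of it.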

# example vector with some edge cases
v = c(0,0,1,0,0,0,1,2,0,0,2,0,0,1,0,0,0,0,1,2,0,2)

v.rle = rle(v)
v.rle
#Run Length Encoding
#  lengths: int [1:14] 2 1 3 1 1 2 1 2 1 4 ...
#  values : num [1:14] 0 1 0 1 2 0 2 0 1 0 ...

vals = rle(v)$values

# find the 0's that need to be replaced and replace by the previous value
idx = which(tail(head(vals,-1),-1) == 0 & (head(vals,-2) == tail(vals,-2)))
vals[idx + 1] <- vals[idx]

# finally go back to the original vector
v.rle$values = vals
inverse.rle(v.rle)
# [1] 0 0 1 1 1 1 1 2 2 2 2 0 0 1 1 1 1 1 1 2 2 2

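Probably the least cumbersome option is to put the above in a function and then apply it to the column of your data.frame (rather than manipulating the vector explicitly).

Another approach, based on @SimonO101's observation, builds the correct groups from the starting data (run the by part on its own, piece by piece, to see how it works):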
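A sketch of that wrapper; the name `fill_conditions` is hypothetical and the body is just the rle steps above. One observation that goes beyond what the answer says: because a zero run is filled whenever its two neighbouring runs have the same value, a gap between two back-to-back conditions with the same mark (for example condition 1 followed directly by another condition 1) is filled as well. The last line below shows that case; the question's data does not contain it.

```r
# Hypothetical helper wrapping the rle() trick from this answer
fill_conditions <- function(v) {
  v.rle <- rle(v)
  vals <- v.rle$values
  # Zero runs whose left and right neighbouring runs carry the same value
  idx <- which(tail(head(vals, -1), -1) == 0 & head(vals, -2) == tail(vals, -2))
  vals[idx + 1] <- vals[idx]
  v.rle$values <- vals
  inverse.rle(v.rle)
}

start <- data.frame(ob.id = 1:20,
                    mark = c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0))
start$condition <- fill_conditions(start$mark)
start$condition
# [1] 0 0 1 1 1 1 0 0 2 2 2 2 0 0 1 1 1 1 0 0

# Limitation: rows 4-5 sit between two separate condition-1 blocks but get filled too
fill_conditions(c(1, 0, 1, 0, 0, 1, 0, 1))
# [1] 1 1 1 1 1 1 1 1
```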

library(data.table)
dt = data.table(start)

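# Each start marker opens a group; each stop marker is pulled back into the group it closes
dt[, result := mark[1],
     by = {tmp = rep(0, length(mark));
           tmp[which(mark != 0)[c(FALSE, TRUE)]] = 1;
           cumsum(mark != 0) - tmp}]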
dt
#    ob.id mark result
# 1:     1    0      0
# 2:     2    0      0
# 3:     3    1      1
# 4:     4    0      1
# 5:     5    0      1
# 6:     6    1      1
# 7:     7    0      0
# 8:     8    0      0
# 9:     9    2      2
#10:    10    0      2
#11:    11    0      2
#12:    12    2      2
#13:    13    0      0
#14:    14    0      0
#15:    15    1      1
#16:    16    0      1
#17:    17    0      1
#18:    18    1      1
#19:    19    0      0
#20:    20    0      0

The latter approach is probably more flexible.
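For readers without data.table, the same grouping idea can be sketched in base R. The group ids are computed exactly as in the `by` expression above, and base R's `ave()` (swapped in here; it is not used in the answer) gives every row the mark of the first row in its group:

```r
mark <- c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0)

# Every marker starts a new group, except that each closing
# (even-numbered) marker is pulled back into the group it closes
tmp <- rep(0, length(mark))
tmp[which(mark != 0)[c(FALSE, TRUE)]] <- 1
grp <- cumsum(mark != 0) - tmp

# Each row takes the mark of the first row in its group
result <- ave(mark, grp, FUN = function(x) x[1])
result
# [1] 0 0 1 1 1 1 0 0 2 2 2 2 0 0 1 1 1 1 0 0
```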

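Comments:

+1 for the head and tail trick. That was my thinking too.

@eddi: what is t in the tail/head line?

Thanks @eddi, your rle solution is more transparent and educational for me, but I agree the latter one is more flexible (thanks @SimonO101!). It is exactly what I needed, and it also shows the conditional logic I was hoping to see in examples for this kind of problem.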
