基于索引的非重叠滑动窗口

Posted 2023-03-12

技术标签:

【中文标题】基于索引的非重叠滑动窗口【英文标题】：Non-overlapping sliding window based on index 【发布时间】：2021-12-17 03:32:17 【问题描述】：

对于具有索引列和值列的data.frame，df，我想计算例如非重叠滑动窗口中值的平均值，窗口大小基于索引中的单位列（例如，在索引中覆盖 10 个单位的窗口）。

runner::runner 和 slider::slide_index 允许您根据索引列在窗口中滑动，但我看不到使窗口不重叠的方法。

df = structure(list(V3 = c(17054720L, 17075353L, 17087656L, 17099107L, 
17152611L, 17154984L, 17178213L, 17256231L, 17264565L, 17280822L, 
17281931L, 17285949L, 17289118L, 17294251L, 17301217L, 17301843L, 
17304246L, 17304887L, 17306104L, 17310741L, 17312596L, 17315102L, 
17315503L, 17317233L, 17318150L, 17319156L, 17326181L, 17326432L, 
17394989L, 17395610L, 17396612L, 17397875L, 17398508L, 17398800L, 
17398812L, 17399211L, 17405173L, 17407349L, 17407566L, 17409897L, 
17410373L, 17412216L, 17412806L, 17414103L, 17414640L, 17415572L, 
17426401L, 17427037L, 17429384L, 17429434L, 17433210L, 17434084L, 
17436846L, 17441524L, 17442154L, 17443131L, 17445502L, 17446157L, 
17446914L, 17450515L, 17452966L, 17462185L, 17467411L, 17467684L, 
17470779L, 17475921L, 17488195L, 17489577L, 17489890L, 17490932L, 
17492203L, 17492452L, 17493792L, 17494101L, 17494547L, 17524203L, 
17525584L, 17525970L, 17529814L, 17541673L, 17545859L, 17557144L, 
17567699L, 17575800L, 17580394L, 17580813L, 17585441L, 17586471L, 
17587680L, 17587975L, 17589209L, 17589246L, 17593685L, 17594915L, 
17597462L, 17599844L, 17603801L, 17605824L, 17611515L, 17615213L
), V1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L)), row.names = c(NA, -100L), class = "data.frame")

【问题讨论】：

你能发布几行你的预期结果 【参考方案1】：

这样的事情怎么样：

df <- data.frame(
  index = 0:99, 
  value = 1:100)

df %>% 
  mutate(window = floor(index/10)) %>% 
  group_by(window) %>% 
  summarise(value = mean(value), 
            n = n())
# # A tibble: 10 × 3
#   window value     n
#    <dbl> <dbl> <int>
# 1      0   5.5    10
# 2      1  15.5    10
# 3      2  25.5    10
# 4      3  35.5    10
# 5      4  45.5    10
# 6      5  55.5    10
# 7      6  65.5    10
# 8      7  75.5    10
# 9      8  85.5    10
# 10      9  95.5    10

在上面的答案中，您将索引除以窗口宽度并将其包装在 floor() 函数中，以便它将所有观察值向下舍入到最接近的整数。这假设索引值是连续的整数。如果索引不是顺序的，另一种方法是使其成为顺序，如下所示：

df <- data.frame(
  index = sample(0:1000, 100, replace=FALSE), 
  value = 1:100)

df %>% 
  arrange(index) %>% 
  mutate(obs = seq_along(index)-1, 
         window = floor(obs/10)) %>% 
  group_by(window) %>% 
  summarise(value = mean(value), 
            n = n())
# A tibble: 10 × 3
#   window value     n
#    <dbl> <dbl> <int>
# 1      0  38.2    10
# 2      1  50.1    10
# 3      2  63.6    10
# 4      3  64.9    10
# 5      4  44      10
# 6      5  41.5    10
# 7      6  65.4    10
# 8      7  45.1    10
# 9      8  48.9    10
# 10      9  43.3    10

【讨论】：

非常简洁，比使用其他方法快得多。干杯。

以上是关于基于索引的非重叠滑动窗口的主要内容，如果未能解决你的问题，请参考以下文章