How to count the number of times (frequency) that data passes through a certain threshold in R?

【Posted】: 2021-12-02 15:28:07 【Question】:

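I am trying to count the number of times my data (a line over a time series) crosses a certain threshold in R, preferably using dplyr. For this example, I want to find out how many times my data crosses a threshold of 50 m.

So in this case, it would cross the 50 m threshold 3 times. My data frame is millions of points long, so doing this manually is not an option. Any help would be greatly appreciated!

Sample input data: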

structure(list(Date.time = structure(c(1458626300, 1458626310, 
1458626320, 1458626330, 1458626340, 1458626350, 1458626360, 1458626370, 
1458626380, 1458626390, 1458626400, 1458626410, 1458626420, 1458626430, 
1458626440, 1458626450, 1458626460, 1458626470, 1458626480, 1458626490, 
1458626500, 1458626510, 1458626520, 1458626530, 1458626540, 1458627840, 
1458627850, 1458627860, 1458627870, 1458627880, 1458627890, 1458627900, 
1458627910, 1458627920, 1458627930, 1458627940, 1458627950, 1458627960, 
1458627970, 1458627980, 1458627990, 1458628000, 1458628010, 1458628020, 
1458628030, 1458628040, 1458628050, 1458628060, 1458628070, 1458628080, 
1458628090, 1458628100, 1458628110, 1458628120, 1458628130, 1458628140, 
1458628150, 1458628160, 1458630830, 1458630840, 1458630850, 1458630860, 
1458630870, 1458630880, 1458630890, 1458630900, 1458630910, 1458630920, 
1458630930, 1458630940), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), Depth = c(67.5, 66, 63.5, 62, 59.5, 57.5, 56, 54.5, 
53.5, 52.5, 51, 49.5, 48, 46.5, 45, 44, 42, 40.5, 37.5, 35, 33, 
33, 32.5, 30, 27, 26.5, 28, 29, 31.5, 33, 32.5, 32.5, 32.5, 32, 
32, 32.5, 33, 34.5, 36.5, 38.5, 40.5, 41.5, 43, 44.5, 46.5, 49.5, 
51.5, 53, 54.5, 56, 58, 59, 60.5, 62, 64.5, 66, 67, 69, 69.5, 
67, 64, 62.5, 60, 57, 55.5, 51.5, 48.5, 46, 43, 40.5)), row.names = c(NA, 
-70L), class = c("data.table", "data.frame"))

【Comments】:

Could you simply count using the condition depth >= threshold?

【Answer 1】:

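You can use rleid from data.table to identify the runs of data that are below 50 and those that are above. If you then subtract 1 from the number of runs, you get the number of times the data passes the threshold: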

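library(data.table)
library(dplyr)

# rleid() gives each run of consecutive equal values its own id;
# the number of runs minus 1 is the number of threshold crossings
df %>%
  summarize(crossings = n_distinct(rleid(Depth > 50)) - 1) %>%
  pull(crossings)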
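
To make the run-counting idea concrete, here is a minimal base R sketch that uses rle() in place of rleid(). The depth vector is a small made-up example, not the data from the question, and the variable names are only illustrative.

```r
# Small illustrative depth vector (hypothetical values, not the question's data)
depth <- c(67.5, 55, 49.5, 40, 45, 51.5, 60, 48.5)
threshold <- 50

# rle() collapses consecutive equal values into runs
runs <- rle(depth > threshold)
runs$values   # TRUE FALSE TRUE FALSE
runs$lengths  # 2 3 2 1

# Each boundary between two neighbouring runs is one crossing
n_crossings <- length(runs$lengths) - 1
n_crossings   # 3
```

Note that this sketch assumes the vector is non-empty and has no NA values; an empty input would give -1 rather than 0.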

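【Answer 2】:

dplyr should also do the trick:

You basically check whether the current value of Depth and the next one (obtained with lead) are on opposite sides of the threshold, and then count the number of times this condition is met.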

library(dplyr)

df <- 
  tibble(
    Date.time = 
      structure(
        c(
          1458626300, 1458626310, 1458626320, 1458626330, 1458626340, 1458626350, 
          1458626360, 1458626370, 1458626380, 1458626390, 1458626400, 1458626410,
          1458626420, 1458626430, 1458626440, 1458626450, 1458626460, 1458626470, 
          1458626480, 1458626490, 1458626500, 1458626510, 1458626520, 1458626530, 
          1458626540, 1458627840, 1458627850, 1458627860, 1458627870, 1458627880, 
          1458627890, 1458627900, 1458627910, 1458627920, 1458627930, 1458627940, 
          1458627950, 1458627960, 1458627970, 1458627980, 1458627990, 1458628000, 
          1458628010, 1458628020, 1458628030, 1458628040, 1458628050, 1458628060,
          1458628070, 1458628080, 1458628090, 1458628100, 1458628110, 1458628120, 
          1458628130, 1458628140, 1458628150, 1458628160, 1458630830, 1458630840, 
          1458630850, 1458630860, 1458630870, 1458630880, 1458630890, 1458630900, 
          1458630910, 1458630920, 1458630930, 1458630940
        ), 
        tzone = "UTC", class = c("POSIXct", "POSIXt")
      ), 
    Depth = 
      c(
        67.5, 66, 63.5, 62, 59.5, 57.5, 56, 54.5, 53.5, 52.5, 51, 49.5, 48, 
        46.5, 45, 44, 42, 40.5, 37.5, 35, 33, 33, 32.5, 30, 27, 26.5, 28, 29, 
        31.5, 33, 32.5, 32.5, 32.5, 32, 32, 32.5, 33, 34.5, 36.5, 38.5, 40.5, 
        41.5, 43, 44.5, 46.5, 49.5, 51.5, 53, 54.5, 56, 58, 59, 60.5, 62, 64.5, 
        66, 67, 69, 69.5, 67, 64, 62.5, 60, 57, 55.5, 51.5, 48.5, 46, 43, 40.5
      )
  )

## define the threshold
threshold <- 50

## keep only the rows just before the threshold is crossed
df.cross <- 
  df %>%
  filter(
    ## cross the threshold with a positive trend
    (Depth <= threshold & lead(Depth) > threshold) |
      ## or cross the threshold with a negative trend
      (Depth >= threshold & lead(Depth) < threshold)
  )

df.cross

## count the number of times the threshold is crossed
nrow(df.cross)

This should give you:

> df.cross
# A tibble: 3 × 2
  Date.time           Depth
  <dttm>              <dbl>
1 2016-03-22 06:00:00  51  
2 2016-03-22 06:27:20  49.5
3 2016-03-22 07:15:00  51.5
> ## count the number of times the threshold is crossed
> nrow(df.cross)
[1] 3
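
If the direction of each crossing matters, the same lead() comparison can label crossings as upward or downward before counting. The following is only a sketch on a small made-up tibble; the names df_small, nxt, and direction are assumptions introduced for illustration.

```r
library(dplyr)

# Small illustrative tibble (hypothetical values, not the question's data)
df_small <- tibble(Depth = c(67.5, 55, 49.5, 40, 45, 51.5, 60, 48.5))
threshold <- 50

crossings <- df_small %>%
  mutate(
    nxt = lead(Depth),
    # label the row just before each crossing; non-crossing rows become NA
    direction = case_when(
      Depth <= threshold & nxt > threshold ~ "up",
      Depth >= threshold & nxt < threshold ~ "down"
    )
  ) %>%
  filter(!is.na(direction)) %>%
  count(direction)

crossings  # one row per direction with its count in column n
```

Summing the n column gives the total number of crossings, which should match nrow() of the filtered data frame in the approach above.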

【Answer 3】:

Here is a dplyr way.

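library(dplyr)

# dat1 is your data
thre <- 50
dat2 <- dat1 %>%
  as_tibble() %>%
  mutate(a1 = Depth > thre,          # current value is above the threshold
         a2 = lag(Depth) < thre,     # previous value is below the threshold
         cross = a1 == a2)           # both TRUE or both FALSE
sum(dat2$cross, na.rm = TRUE)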
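
One caveat with comparing a1 == a2: when consecutive values sit exactly on the threshold, both flags are FALSE and the row is still counted. The sample data contains no value equal to 50, so the result there is unaffected. The base R sketch below uses a made-up vector to show the difference and a diff()-based alternative; treating "exactly at the threshold" as "not above" is an assumption you may want to change.

```r
# Hypothetical values that touch the threshold exactly
depth <- c(52, 50, 50, 48, 51)
thre <- 50

# Same logic as a1 == a2, written in base R
a1 <- depth > thre
a2 <- c(NA, head(depth, -1)) < thre
count_equal_flags <- sum(a1 == a2, na.rm = TRUE)
count_equal_flags   # 4

# Alternative: count changes in the "above threshold" state
above <- depth > thre
count_state_changes <- sum(diff(above) != 0)
count_state_changes # 2
```

The second count reflects one move from above to not-above and one move back, which is usually what "crossing" is meant to capture.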

【Answer 4】:

Tidyverse solution:

# Load the tidyverse package:
library(tidyverse)

# Set the threshold: th => numeric scalar
th <- 50

# Calculate the number of times the threshold has been crossed:
# integer scalar => stdout(console)
df %>%
  summarise(
    cnt = sum(
      (Depth >= th & lag(Depth) < th) | (Depth < th & lag(Depth) >= th),
      na.rm = TRUE
    )
  ) %>%
  pull(cnt)

Base R solution:

# Set the threshold: th => numeric scalar
th <- 50

# Calculate the number of times the threshold is crossed:
# integer scalar => stdout(console)
with(
   df, 
   sum(Depth[-nrow(df)] >= th & Depth[-1] < th |
      Depth[-nrow(df)] < th & Depth[-1] >= th)
)
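
For repeated use, the base R comparison can be wrapped in a small helper. This is only a sketch: the function name count_crossings is hypothetical, and dropping NA values before comparing (so that a crossing spanning a gap still counts) is an assumption that may not suit every data set.

```r
# Hypothetical helper: count how often x switches sides of the threshold
count_crossings <- function(x, threshold) {
  x <- x[!is.na(x)]                 # assumption: ignore missing readings
  if (length(x) < 2) return(0L)     # nothing to compare
  above <- x >= threshold           # same convention as the code above
  sum(above[-1] != above[-length(above)])
}

# Illustrative input (made-up values, one missing reading)
count_crossings(c(67.5, 55, NA, 49.5, 40, 45, 51.5, 60, 48.5), 50)  # 3
count_crossings(numeric(0), 50)                                     # 0
```

With the data from the question, this would be called as count_crossings(df$Depth, 50).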
