How to calculate the time difference between consecutive rows
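The raw data looks like this. I want to sort it by visitor and time, calculate the time difference between consecutive rows, and then save the result to a new file.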
visitor v_time payment items
1 Jack 1/2/2018 16:07 35 3
2 Jack 1/2/2018 16:09 160 1
3 David 1/2/2018 16:12 25 2
4 Kate 1/2/2018 16:16 3 3
5 David 1/2/2018 16:21 25 5
6 Jack 1/2/2018 16:32 85 5
7 Kate 1/2/2018 16:33 639 3
8 Jack 1/2/2018 16:55 6 2
The grouping and sorting work, but the time differences are not calculated, and the result is not written to the file.
visitor <- c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack")
v_time <- c("1/2/2018 16:07","1/2/2018 16:09","1/2/2018 16:12","1/2/2018 16:16","1/2/2018 16:21","1/2/2018 16:32","1/2/2018 16:33", "1/2/2018 16:55")
payment <- c(35,160,25,3,25,85,639,6)
items <- c(3,1,2,3,5,5,3,2)
df <- data.frame(visitor, v_time, payment, items)
df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M")),
         diff_secs = as.numeric(diff, units = 'secs'))
write.csv(df,"C:/output.csv", row.names = F)
What am I doing wrong, and what is the correct approach?
# A tibble: 8 x 6
# Groups: visitor [3]
visitor v_time payment items diff diff_secs
<fct> <fct> <dbl> <dbl> <time> <dbl>
1 David 1/2/2018 16:12 25.0 2.00 NA NA
2 David 1/2/2018 16:21 25.0 5.00 NA NA
3 Jack 1/2/2018 16:07 35.0 3.00 NA NA
4 Jack 1/2/2018 16:09 160 1.00 NA NA
5 Jack 1/2/2018 16:32 85.0 5.00 NA NA
6 Jack 1/2/2018 16:55 6.00 2.00 NA NA
7 Kate 1/2/2018 16:16 3.00 3.00 NA NA
8 Kate 1/2/2018 16:33 639 3.00 NA NA
Answer
If you simply add default = strptime(v_time, "%d/%m/%Y %H:%M")[1] to the lag part:
df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]),
         diff_secs = as.numeric(diff, units = 'secs'))
you get the result you expected:
> df
# A tibble: 8 x 6
# Groups: visitor [3]
visitor v_time payment items diff diff_secs
<fct> <fct> <dbl> <dbl> <time> <dbl>
1 David 1/2/2018 16:12 25. 2. 0 0.
2 David 1/2/2018 16:21 25. 5. 540 540.
3 Jack 1/2/2018 16:07 35. 3. 0 0.
4 Jack 1/2/2018 16:09 160. 1. 120 120.
5 Jack 1/2/2018 16:32 85. 5. 1380 1380.
6 Jack 1/2/2018 16:55 6. 2. 1380 1380.
7 Kate 1/2/2018 16:16 3. 3. 0 0.
8 Kate 1/2/2018 16:33 639. 3. 1020 1020.
Another option is to use difftime:
df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = difftime(strptime(v_time, "%d/%m/%Y %H:%M"), lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]), units = 'mins'),
         diff_secs = as.numeric(diff, units = 'secs'))
Now the diff column is in minutes and the diff_secs column is in seconds:
> df
# A tibble: 8 x 6
# Groups: visitor [3]
visitor v_time payment items diff diff_secs
<fct> <fct> <dbl> <dbl> <time> <dbl>
1 David 1/2/2018 16:12 25. 2. 0 0.
2 David 1/2/2018 16:21 25. 5. 9 540.
3 Jack 1/2/2018 16:07 35. 3. 0 0.
4 Jack 1/2/2018 16:09 160. 1. 2 120.
5 Jack 1/2/2018 16:32 85. 5. 23 1380.
6 Jack 1/2/2018 16:55 6. 2. 23 1380.
7 Kate 1/2/2018 16:16 3. 3. 0 0.
8 Kate 1/2/2018 16:33 639. 3. 17 1020.
Because the pipeline result is now assigned back to df, you can save it with write.csv(df,"C:/output.csv", row.names = FALSE).
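One caveat worth illustrating: the pipelines above sort on v_time while it is still text, which happens to give the right order for this sample because every timestamp has the same width. The following is a minimal base R sketch, using made-up timestamps that are not from the question, of how text ordering can differ from chronological ordering, and how parsing to POSIXct first avoids that. The "UTC" time zone is an assumption chosen only to keep the example reproducible.

```r
# Hypothetical timestamps (not from the question), day/month/year format
v_time <- c("1/2/2018 16:07", "1/2/2018 9:05", "10/1/2018 8:00")

# Text ordering compares character by character, so "16:07" sorts before "9:05"
text_sorted <- sort(v_time, method = "radix")

# Parse to POSIXct first, then order chronologically
parsed <- as.POSIXct(v_time, format = "%d/%m/%Y %H:%M", tz = "UTC")
time_sorted <- v_time[order(parsed)]

print(text_sorted)  # "1/2/2018 16:07" "1/2/2018 9:05" "10/1/2018 8:00"
print(time_sorted)  # "10/1/2018 8:00" "1/2/2018 9:05" "1/2/2018 16:07"
```

In a dplyr pipeline, the same idea would mean converting v_time in a mutate step before arrange, so the sort and the lag both work on real date-times.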
Another answer
The error comes from lag(strptime(v_time, "%d/%m/%Y %H:%M")).
Error message:
# Error in format.POSIXlt(x, usetz = TRUE) :
# invalid component [[10]] in "POSIXlt" should be 'zone'
To avoid this, apply lag to the raw values first and parse afterwards, i.e. strptime(lag(v_time), "%d/%m/%Y %H:%M"):
df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - strptime(lag(v_time), "%d/%m/%Y %H:%M"),
         diff_secs = as.numeric(diff, units = 'secs'))
print(df)
Output:
# A tibble: 8 x 6
# Groups: visitor [3]
visitor v_time payment items diff diff_secs
<fctr> <fctr> <dbl> <dbl> <time> <dbl>
1 David 1/2/2018 16:12 25 2 NA mins NA
2 David 1/2/2018 16:21 25 5 9 mins 540
3 Jack 1/2/2018 16:07 35 3 NA mins NA
4 Jack 1/2/2018 16:09 160 1 2 mins 120
5 Jack 1/2/2018 16:32 85 5 23 mins 1380
6 Jack 1/2/2018 16:55 6 2 23 mins 1380
7 Kate 1/2/2018 16:16 3 3 NA mins NA
8 Kate 1/2/2018 16:33 639 3 17 mins 1020
Before exporting, don't forget to save the work by assigning the result back to df with df <-.
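A plausible reading of the error above is that strptime returns a POSIXlt object, which is stored as a list of components (seconds, minutes, hours, and so on) rather than as a plain vector, so a function that shifts a vector element by element may not handle it. The base R sketch below only illustrates that structural difference and shows a hand-written shift on POSIXct; it does not reproduce the dplyr error, and the "UTC" time zone is an assumption made for reproducibility.

```r
fmt <- "%d/%m/%Y %H:%M"
x <- c("1/2/2018 16:07", "1/2/2018 16:09")

lt <- strptime(x, fmt, tz = "UTC")  # class POSIXlt
ct <- as.POSIXct(lt)                # class POSIXct

# POSIXlt is a list of components; POSIXct is a numeric vector of seconds
print(is.list(unclass(lt)))     # TRUE
print(is.numeric(unclass(ct)))  # TRUE

# Shift the POSIXct vector by one position by hand (NA for the first row)
prev <- ct[c(NA, seq_len(length(ct) - 1))]
d <- difftime(ct, prev, units = "secs")
print(as.numeric(d))  # NA 120
```

Converting with as.POSIXct before calling lag is therefore another way to sidestep the problem, alongside lagging the raw values as shown in this answer.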
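Another answer
Here is an approach using the lubridate package
library(dplyr)
library(lubridate)
# Parse day/month/year timestamps, matching the "%d/%m/%Y %H:%M" format used in the question
df$v_time <- dmy_hm(df$v_time)
df <- df %>%
  arrange(visitor, v_time)
# Seconds since the same visitor's previous row; 0 for each visitor's first row
df$diff <- rep(0, nrow(df))
for (i in seq_len(nrow(df) - 1)) {
  # Only compare rows that belong to the same visitor
  if (df$visitor[i + 1] == df$visitor[i]) {
    df$diff[i + 1] <- as.numeric(difftime(df$v_time[i + 1], df$v_time[i], units = "secs"))
  }
}
write.csv(df, "C:/output.csv", row.names = FALSE)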
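The row-by-row loop can also be written without an explicit loop. The following is a minimal base R sketch, assuming the rows are already in time order within each visitor (as in the sorted output shown earlier) and assuming "UTC" as the time zone; it uses ave to apply diff separately to each visitor's timestamps.

```r
# Sample data from the question, already sorted by visitor and time
visitor <- c("David", "David", "Jack", "Jack", "Jack", "Jack", "Kate", "Kate")
v_time <- c("1/2/2018 16:12", "1/2/2018 16:21", "1/2/2018 16:07", "1/2/2018 16:09",
            "1/2/2018 16:32", "1/2/2018 16:55", "1/2/2018 16:16", "1/2/2018 16:33")

# Timestamps as seconds since the epoch
secs <- as.numeric(as.POSIXct(v_time, format = "%d/%m/%Y %H:%M", tz = "UTC"))

# Per visitor: 0 for the first row, then the gap to the previous row in seconds
diff_secs <- ave(secs, visitor, FUN = function(s) c(0, diff(s)))
print(diff_secs)  # 0 540 0 120 1380 1380 0 1020
```

The values match the diff_secs column produced by the dplyr answers, with 0 rather than NA for each visitor's first visit.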
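Another answer
Here is an option with difftime. We use dmy_hm (from lubridate) to convert 'v_time' to a datetime; then, after arranging and grouping by 'visitor', we use difftime to get the output in seconds
library(tidyverse)
# dmy_hm() comes from lubridate; load it explicitly in case the installed tidyverse does not attach it
library(lubridate)
out <- df %>%
  mutate(v_time = dmy_hm(v_time)) %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = difftime(v_time, lag(v_time, default = first(v_time)), units = "secs"))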
# A tibble: 8 x 5
# Groups: visitor [3]
# visitor v_time payment items diff
# <fctr> <dttm> <dbl> <dbl> <time>
#1 David 2018-02-01 16:12:00 25.0 2.00 0
#2 David 2018-02-01 16:21:00 25.0 5.00 540
#3 Jack 2018-02-01 16:07:00 35.0 3.00 0
#4 Jack 2018-02-01 16:09:00 160 1.00 120
#5 Jack 2018-02-01 16:32:00 85.0 5.00 1380
#6 Jack 2018-02-01 16:55:00 6.00 2.00 1380
#7 Kate 2018-02-01 16:16:00 3.00 3.00 0
#8 Kate 2018-02-01 16:33:00 639 3.00 1020
Then we write the CSV with write_csv
write_csv(out, "yourfile.csv")