Calculating the average difference between dates based on a condition in R
I have this dataset:
Date New_Renew
2019-01-10 22:11:16 Renewing
2019-02-23 00:21:48 Renewing
2019-03-05 05:26:17 Renewing
2019-04-18 15:05:10 NEW
2019-04-18 15:07:52 NEW
2019-04-26 11:32:25 Renewing
2019-05-03 14:15:25 Renewing
2019-05-08 21:10:08 NEW
2019-05-16 13:35:57 Renewing
2019-05-24 13:18:23 Renewing
2019-06-01 12:42:21 Renewing
2019-06-17 18:08:09 Renewing
2019-06-26 13:40:29 Renewing
2019-12-13 17:57:43 Renewing
2020-01-03 11:49:14 Renewing
2020-01-11 11:46:51 Renewing
2020-01-14 21:08:08 NEW
2020-01-18 21:14:30 NEW
2020-01-21 16:08:37 NEW
2020-01-28 11:41:44 Renewing
2020-01-30 13:34:21 Renewing
2020-02-03 13:29:37 Renewing
2020-02-18 17:15:52 Renewing
2020-02-20 13:37:52 Renewing
2020-02-24 12:55:25 Renewing
2020-02-26 21:13:38 NEW
2020-03-04 13:23:41 Renewing
2020-03-09 16:48:36 Renewing
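What I want is to take only the rows where New_Renew equals NEW and compute the mean of the differences between their consecutive dates. In short: how often does the user make a new transaction?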
Answer
library(xts)
library(lubridate)

# Read the sample data
DT <- read.table(text = 'Date,New_Renew
2019-01-10 22:11:16,Renewing
2019-02-23 00:21:48,Renewing
2019-03-05 05:26:17,Renewing
2019-04-18 15:05:10,NEW
2019-04-18 15:07:52,NEW
2019-04-26 11:32:25,Renewing
2019-05-03 14:15:25,Renewing
2019-05-08 21:10:08,NEW
2019-05-16 13:35:57,Renewing
2019-05-24 13:18:23,Renewing
2019-06-01 12:42:21,Renewing
2019-06-17 18:08:09,Renewing
2019-06-26 13:40:29,Renewing
2019-12-13 17:57:43,Renewing
2020-01-03 11:49:14,Renewing
2020-01-11 11:46:51,Renewing
2020-01-14 21:08:08,NEW
2020-01-18 21:14:30,NEW
2020-01-21 16:08:37,NEW
2020-01-28 11:41:44,Renewing
2020-01-30 13:34:21,Renewing
2020-02-03 13:29:37,Renewing
2020-02-18 17:15:52,Renewing
2020-02-20 13:37:52,Renewing
2020-02-24 12:55:25,Renewing
2020-02-26 21:13:38,NEW
2020-03-04 13:23:41,Renewing
2020-03-09 16:48:36,Renewing',
                 sep = ',',
                 header = TRUE,
                 stringsAsFactors = FALSE)

# Time-indexed copy of the data, ordered by the parsed timestamps
df <- xts(DT, order.by = ymd_hms(DT$Date))

dif <- DT
dif$difference <- NA_real_

# Row of the most recent NEW seen so far (0 = none yet)
prev_new <- 0
for (i in seq_len(nrow(df))) {
  if (DT$New_Renew[i] != 'NEW') next
  if (prev_new != 0) {
    # Gap to the previous NEW row, in days
    dif[i, 'difference'] <- as.numeric(
      difftime(index(df)[i], index(df)[prev_new], units = 'days'))
  }
  prev_new <- i
}

# Mean gap between consecutive NEW rows, in days
mean_diff <- mean(dif$difference, na.rm = TRUE)
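The quantity asked for in the question, the mean gap between consecutive NEW timestamps, can also be sketched in base R without a loop. This is only a minimal illustration: the seven NEW timestamps are copied by hand from the dataset above, the names new_dates, gaps_days and mean_gap_days are made up for the example, and treating the timestamps as UTC is an assumption (a local time zone with daylight saving can shift the result by a fraction of an hour).

```r
# Timestamps of the NEW rows, copied from the dataset above
new_dates <- as.POSIXct(c(
  "2019-04-18 15:05:10", "2019-04-18 15:07:52", "2019-05-08 21:10:08",
  "2020-01-14 21:08:08", "2020-01-18 21:14:30", "2020-01-21 16:08:37",
  "2020-02-26 21:13:38"
), tz = "UTC")  # assumption: interpret the timestamps as UTC

# Gaps between consecutive NEW transactions, converted explicitly to days
gaps_days <- as.numeric(diff(new_dates), units = "days")

# Average number of days between two NEW transactions
mean_gap_days <- mean(gaps_days)
print(mean_gap_days)
```

Under the UTC assumption this comes out at roughly 52.4 days. Because the gaps telescope, the mean is the same as the span from the first to the last NEW timestamp divided by the number of gaps.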
Another answer
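Use aggregate together with diff. diff() on date-times returns a difftime whose unit (seconds, minutes, hours or days) is chosen automatically for each group, so dividing by a fixed factor such as 60*24, which turns minutes into days, is only correct when the gaps happen to come back in minutes. Requesting days explicitly avoids the problem. Here dat is assumed to be the data above with Date already parsed as a date-time (POSIXct).
aggregate(Date ~ New_Renew, dat, function(x) mean(as.numeric(diff(x), units = "days")))
For the data above this gives a mean gap of about 52.38 days for NEW and about 21.19 days for Renewing.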
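One thing to watch when combining diff() with a fixed conversion factor: on date-times, diff() returns a difftime whose unit is picked automatically from the smallest gap in the vector, so two groups can come back in different units. The sketch below illustrates this with made-up timestamps (t0, short_gaps and long_gaps are hypothetical names, and UTC is assumed) and shows that an explicit conversion with units = "days" gives comparable numbers either way.

```r
t0 <- as.POSIXct("2019-04-18 15:05:10", tz = "UTC")  # arbitrary start time

# Gaps of a few minutes: diff() picks minutes as the unit
short_gaps <- diff(t0 + c(0, 162, 600))

# Gaps of several days: diff() picks days as the unit
long_gaps <- diff(t0 + c(0, 2, 5) * 86400)

print(units(short_gaps))
print(units(long_gaps))

# Explicit conversion yields days regardless of the automatic unit
short_days <- as.numeric(short_gaps, units = "days")
long_days  <- as.numeric(long_gaps, units = "days")
print(short_days)
print(long_days)
```

Converting each group's gaps to one explicit unit before averaging keeps the per-group means on the same scale.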