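R Script to average value over every <x> days
Posted: 2011-02-24 14:25:09

Question:
I'm having trouble figuring out how to average the values over every "x" days. When I plot this CSV file over a whole year, there is too much data to display properly on the plot line (screenshot attached). I'd like to average the data over every few days (maybe 2 days, a week, etc.) so the chart is easier to read. Any suggestions on how to solve this in R?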
results.csv
POSTS,PROVIDER,TYPE,DATE
29337,FTP,BLOG,2010-01-01
26725,FTP,BLOG,2010-01-02
27480,FTP,BLOG,2010-01-03
31187,FTP,BLOG,2010-01-04
31488,FTP,BLOG,2010-01-05
32461,FTP,BLOG,2010-01-06
33675,FTP,BLOG,2010-01-07
38897,FTP,BLOG,2010-01-08
37122,FTP,BLOG,2010-01-09
41365,FTP,BLOG,2010-01-10
51760,FTP,BLOG,2010-01-11
50859,FTP,BLOG,2010-01-12
53765,FTP,BLOG,2010-01-13
56836,FTP,BLOG,2010-01-14
59698,FTP,BLOG,2010-01-15
52095,FTP,BLOG,2010-01-16
57154,FTP,BLOG,2010-01-17
80755,FTP,BLOG,2010-01-18
227464,FTP,BLOG,2010-01-19
394510,FTP,BLOG,2010-01-20
371303,FTP,BLOG,2010-01-21
370450,FTP,BLOG,2010-01-22
268703,FTP,BLOG,2010-01-23
267252,FTP,BLOG,2010-01-24
375712,FTP,BLOG,2010-01-25
381041,FTP,BLOG,2010-01-26
380948,FTP,BLOG,2010-01-27
373140,FTP,BLOG,2010-01-28
361874,FTP,BLOG,2010-01-29
265178,FTP,BLOG,2010-01-30
269929,FTP,BLOG,2010-01-31
R script
library(ggplot2)
data <- read.csv("results.csv", header = TRUE)
# Keep the parsed dates as a column instead of a loose vector plus attach()
data$dts <- as.POSIXct(data$DATE, format = "%Y-%m-%d")
dataframe <- data  # the data frame that gets plotted (raw daily values here)
# Current ggplot2: labs()/theme_update()/element_*() replace opts()/theme_text()/theme_blank()
a <- ggplot(dataframe, aes(dts, POSTS / 1000, fill = TYPE)) +
  labs(title = "Report", x = NULL, y = "Posts (k)", fill = NULL)
b <- a + geom_bar(stat = "identity", position = "stack")
plot_theme <- theme_update(axis.text.x = element_text(angle = 90, hjust = 1),
  panel.grid.major = element_line(colour = "grey90"),
  panel.grid.minor = element_blank(), panel.background = element_blank(),
  axis.ticks = element_blank(), legend.position = "none")
c <- b + facet_grid(TYPE ~ ., scales = "free_y")
d <- c + scale_x_datetime(date_breaks = "1 month", date_labels = "%Y %b")
ggsave(filename = "/root/results.png", plot = d, height = 14, width = 14, dpi = 600)
[Screenshot of the resulting plot]
Comments on the question:
Have you tried using geom_smooth instead of geom_bar?
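A minimal sketch of that suggestion, assuming current ggplot2 and the POSTS, TYPE and DATE columns of results.csv: the raw daily values are drawn as a light line and a loess smoother is overlaid, so no pre-averaging is needed. The helper name smooth_plot and the default span = 0.3 are illustrative choices, not from the thread.

```r
library(ggplot2)

# Hypothetical helper: draw the raw daily POSTS as a light line and overlay a
# loess smoother instead of one bar per day. `span` is the fraction of the data
# used for each local fit (larger values give a smoother curve).
smooth_plot <- function(df, span = 0.3) {
  df$dts <- as.POSIXct(df$DATE, format = "%Y-%m-%d")
  ggplot(df, aes(dts, POSTS / 1000)) +
    geom_line(colour = "grey70") +
    geom_smooth(method = "loess", formula = y ~ x, span = span, se = FALSE) +
    facet_grid(TYPE ~ ., scales = "free_y") +
    labs(title = "Report", x = NULL, y = "Posts (k)")
}

# Usage (assumes results.csv as in the question):
# print(smooth_plot(read.csv("results.csv", header = TRUE)))
```

Unlike block averaging, the smoother keeps every daily point and only changes what is drawn on top of it.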
Answer 1:
Try this:
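Average <- function(Data, n) {
  # Make an index to be used for aggregating: the 0-based position of each
  # row's date among the sorted unique dates
  date_levels <- levels(as.factor(Data$DATE))
  ID <- as.numeric(as.factor(Data$DATE)) - 1
  # integer division puts every block of n consecutive distinct dates in one group
  ID <- ID %/% n
  # aggregate over ID and TYPE for all numeric data
  out <- aggregate(Data[sapply(Data, is.numeric)],
                   by = list(ID, Data$TYPE),
                   FUN = mean)
  # format output
  names(out)[1:2] <- c("dts", "TYPE")
  # add the correct dates as the beginning of every period (looked up in the
  # sorted unique dates, so this also works when several TYPEs share a date)
  out$dts <- as.POSIXct(date_levels[out$dts * n + 1])
  out
}
Data <- read.csv("results.csv", header = TRUE)
dataframe <- Average(Data, 3)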
This works with the plotting script you provided.
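For fixed calendar periods such as the 2-day or weekly windows mentioned in the question, base R's cut() method for Date objects can do the grouping instead of a hand-built index. This is a swapped-in technique, not the answer's code; the helper average_by_period is hypothetical and assumes the POSTS, TYPE and DATE columns of results.csv.

```r
# Hypothetical helper (not from the answer): average POSTS per TYPE over
# calendar blocks produced by cut.Date(), e.g. "2 days", "3 days" or "week"
average_by_period <- function(df, period = "week") {
  dates <- as.Date(df$DATE)
  # cut() on Dates assigns each date to the block that starts on or before it;
  # each level is labelled with its block's start date ("week" blocks start on Monday)
  df$block <- cut(dates, breaks = period)
  # mean of POSTS for every block/TYPE combination that has data
  out <- aggregate(POSTS ~ block + TYPE, data = df, FUN = mean)
  # block start dates as POSIXct, matching the dts column the plot script uses
  out$dts <- as.POSIXct(as.character(out$block), format = "%Y-%m-%d")
  out
}
```

With the question's plotting script, dataframe <- average_by_period(Data, "week") could be plotted in place of the raw data. Unlike Average(), the blocks follow the calendar, so a missing day does not shift later periods.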
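A few remarks:
- Never name a variable after a function (data, c, ...).
- Avoid attach(). If you do use it, add detach() afterwards, otherwise you will run into trouble at some point. Better still, use the functions with() and within().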
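As a small illustration of the last remark, assuming a data frame with the POSTS and DATE columns from results.csv (three sample rows here): within() returns a modified copy of the data frame, and with() evaluates an expression with the columns in scope, so nothing is attached to the search path.

```r
Data <- data.frame(POSTS = c(29337, 26725, 27480),
                   DATE  = c("2010-01-01", "2010-01-02", "2010-01-03"))

# within(): evaluate inside the data frame and return a modified copy,
# here adding the parsed dates as a new column
Data <- within(Data, dts <- as.POSIXct(DATE, format = "%Y-%m-%d"))

# with(): evaluate an expression with the columns in scope, no attach() needed
total_k <- with(Data, sum(POSTS) / 1000)
print(total_k)  # 83.542
```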
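Comments:
Thanks for the quick reply. This is exactly what I needed. I'll follow your advice.
You might want to remove the browser() statement.

Answer 2:
The TTR package also has several moving-average functions, which can do this in a single statement: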
library(TTR)
mavg.3day <- SMA(data$POSTS, n=3) # Simple moving average
Use a different value of 'n' for whatever moving-average length you want.
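Note that SMA() computes a rolling mean: it still returns one value per day (the mean of that day and the n - 1 days before it, with NA for the first n - 1 days), whereas the first answer returns one value per block of n days. If adding the TTR dependency is not desirable, the same trailing mean can be sketched in base R with stats::filter(); sma_base is a hypothetical name, and it is only meant to match SMA() for plain numeric vectors without NAs.

```r
# Hypothetical base-R stand-in for TTR::SMA(x, n): trailing simple moving average
sma_base <- function(x, n) {
  # filter() with sides = 1 averages the current value and the n - 1 before it;
  # the first n - 1 positions have no full window and come back as NA
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

posts <- c(29337, 26725, 27480, 31187, 31488)
sma_base(posts, 3)
```

Calling stats::filter explicitly avoids clashing with dplyr::filter if dplyr is loaded.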