统计web日志里面一个时间段的get请求数量

Posted 大葱拌豆腐

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了统计web日志里面一个时间段的get请求数量相关的知识,希望对你有一定的参考价值。

日志数据:

0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:32 +0800] "GET /CloudDocLib/xng/xngAction!listDeamons.action?page=0&count=10&sort=SYMBOL&order=asc&query=STYPE%3AEQA%3BCINDUSTRY.STYLE%3A009%3BCINDUSTRY.STYLECODE%3AZC7&jobListType=1&host=unknown HTTP/1.1" 200 332
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:40 +0800] "POST /CloudDocLib/xng/xngAction!startDeamon.action HTTP/1.1" 200 132```
**要求:按照时间每个小时统计get产生的次数**
第一种做法是使用sql的做法:
scala代码:
import org.apache.Spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}

/**
* Created by xiaopengpeng on 2016/12/15.
*/
class countget {

}
object countget{
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName(“countget”).setMaster(“local[*]”)
val spark = SparkSession
.builder()
.config(conf)
.getOrCreate()
import spark.implicits._
//0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] “GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1” 200 13821
val logDF = spark.sparkContext.textFile(“D:\Program\apache-tomcat-7.0.72\logs\localhost_access_log.2016-11-11.txt”)
//.foreach(x=>x.split(” “).map())
.map(line =>line.split(” “)).map(list=>( list(3).substring(list(3).lastIndexOf(“/”)+1,list(3).lastIndexOf(“/”)+8),list(5)))
.toDF(“time”,”method”);
logDF.show();
logDF.createOrReplaceTempView(“log”);
spark.sql(“SELECT time,COUNT(method) FROM log WHERE method=’\”GET’ group by time”).show();

}
}
第二种做法是用的纯粹的scala代码实现的
代码:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
* Created by root on 2016/12/15.
*/
class CountGetByScala {

}
object CountGetByScala{
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName(“countget”).setMaster(“local[*]”)
val spark = SparkSession
.builder()
.config(conf)
.getOrCreate()
import spark.implicits._
//0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] “GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1” 200 13821
val logLine = spark.sparkContext.textFile(“D:\Program\apache-tomcat-7.0.72\logs\localhost_access_log.2016-11-11.txt”)
.map(line =>line.split(” “)).map(list=>( list(3).substring(list(3).lastIndexOf(“/”)+1,list(3).lastIndexOf(“/”)+8),list(5)))
val filter = logLine.filter(y=>y._2.equals(“\”GET”))

val group = filter.groupBy(line=>line._1)
val result = group.map(g =>(g._1,g._2.toList.size))
result.foreach(x=>println(x))

}
}

 

 

以上是关于统计web日志里面一个时间段的get请求数量的主要内容,如果未能解决你的问题,请参考以下文章

统计 Heroku-Streamlit Web 应用的请求数

统计web 访问日志的请求数据

Python统计web应用的每个连接使用情况

日志查找排序统计

Apache Web 服务器日志文件中的可疑请求

java web里面的get请求方式有头(head)吗?