Scala in Practice: A Spark Example for Computing User Online Duration and Login Counts

Posted zfszhangyuan


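Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers a Scala-in-practice example that uses Spark to compute user online duration and login counts. We hope it is a useful reference.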

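After I started working with Spark, I began learning Scala. With some Python and Java background it was not too hard. Here I want to walk through an example from my own work: using Scala to analyze user behavior logs. I will focus on how I implemented the statistics for user online duration and login counts.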

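Step 1: the programming environment. You need the Spark distribution package. You do not have to install Spark locally, because importing spark-assembly-1.6.2-hadoop2.6.0.jar into your project is enough to run and debug the program. You also need a Scala runtime. I used scala-sdk-2.10.6.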
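If you manage the project with sbt instead of importing the assembly jar by hand, a build definition along the following lines should give you the same Spark and Scala versions. This sketch rests on my own assumptions: a plain `build.sbt`, with Spark's published `spark-core` artifact pulled from Maven Central. It is not the setup the article used.

```scala
// build.sbt: hypothetical alternative to adding spark-assembly-1.6.2-hadoop2.6.0.jar manually
scalaVersion := "2.10.6"

// %% appends the Scala binary version, so this resolves to spark-core_2.10
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2"
```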

Step 2: the raw material, which is the system log. Here is part of the log I processed:

```
2016-04-18 16:00:00 {"areacode":"浙江省丽水市","countAll":0,"countCorrect":0,"datatime":"4134362","logid":"201604181600001184409476","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966390499\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13989589062\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"13989589062\"}","requestip":"36.16.128.234","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"宁夏银川市","countAll":0,"countCorrect":0,"datatime":"4715990","logid":"201604181600001858043208","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400120\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1210\",\"imei\":\"A0000044ABFD25\",\"subjectNum\":\"15379681917\",\"imsi\":\"460036951451601\",\"queryNum\":\"\"}","requestip":"115.168.93.87","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果","userAgent":"ZTE-Me/Mobile"}
2016-04-18 16:00:00 {"areacode":"黑龙江省哈尔滨市","countAll":0,"countCorrect":0,"datatime":"5369561","logid":"201604181600001068429609","requestinfo":"{\"interfaceUserName\":\"12345678900987654321\",\"queryNum\":\"\",\"timestamp\":\"1460966400139\",\"sign\":\"4\",\"imsi\":\"460030301212545\",\"imei\":\"35460207765269\",\"subjectNum\":\"55588237\",\"subjectPro\":\"123456\",\"remark\":\"4\",\"channelno\":\"2100\"}","requestip":"42.184.41.180","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"浙江省丽水市","countAll":0,"countCorrect":0,"datatime":"4003096","logid":"201604181600001648238807","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966391025\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13989589062\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"13989589062\"}","requestip":"36.16.128.234","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"广西南宁市","countAll":0,"countCorrect":0,"datatime":"4047993","logid":"201604181600001570024205","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966382871\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"A000004853168C\",\"subjectNum\":\"07765232589\",\"imsi\":\"460031210400007\",\"queryNum\":\"13317810717\"}","requestip":"219.159.72.3","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"海南省五指山市","countAll":0,"countCorrect":0,"datatime":"5164117","logid":"201604181600001227842048","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399159\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1017\",\"imei\":\"A000005543AFB7\",\"subjectNum\":\"089836329061\",\"imsi\":\"460036380954376\",\"queryNum\":\"13389875751\"}","requestip":"140.240.171.71","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"山西省","countAll":0,"countCorrect":0,"datatime":"14075772","logid":"201604181600001284030648","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400332\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"A000004FE0218A\",\"subjectNum\":\"03514043633\",\"imsi\":\"460037471517070\",\"queryNum\":\"\"}","requestip":"1.68.5.227","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"四川省","countAll":0,"countCorrect":0,"datatime":"6270982","logid":"201604181600001173504863","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966398896\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13666231300\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"13666231300\"}","requestip":"182.144.66.97","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"浙江省","countAll":0,"countCorrect":0,"datatime":"4198522","logid":"201604181600001390637240","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399464\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"05533876327\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"05533876327\"}","requestip":"36.23.9.49","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"000000","responsedata":"操作成功"}
2016-04-18 16:00:00 {"areacode":"江苏省连云港市","countAll":0,"countCorrect":0,"datatime":"4408097","logid":"201604181600001249944032","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966395908\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"18361451463\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"18361451463\"}","requestip":"58.223.4.210","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"浙江省","countAll":0,"countCorrect":0,"datatime":"5154518","logid":"201604181600001714496463","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399474\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"05533876327\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"05533876327\"}","requestip":"36.23.9.49","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"000000","responsedata":"操作成功"}
2016-04-18 16:00:00 {"areacode":"浙江省","countAll":0,"countCorrect":0,"datatime":"4761269","logid":"201604181600001187577136","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400191\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"057427895481\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"057427895481\"}","requestip":"36.23.153.219","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"河北省廊坊市","countAll":0,"countCorrect":0,"datatime":"75408665","logid":"201604181600001020722122","requestinfo":"{\"subjectNum\":\"13582968216\",\"imsi\":\"460031298611058\",\"queryNum\":\"18033684000\",\"channelno\":\"100\",\"imei\":\"99000586096233\"}","requestip":"110.251.61.62","requesttime":"2016-04-18 16:00:00","requesttype":"28","responsecode":"010005","responsedata":"查询结果为空"}
2016-04-18 16:00:00 {"areacode":"贵州省黔西南州兴义市","countAll":0,"countCorrect":0,"datatime":"4586950","logid":"201604181600001499837763","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966398600\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"865707029710377\",\"subjectNum\":\"509\",\"imsi\":\"460025864693571\",\"queryNum\":\"\"}","requestip":"111.85.45.172","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"云南省昆明市","countAll":0,"countCorrect":0,"datatime":"4441961","logid":"201604181600001794147521","requestinfo":"{\"interfaceUserName\":\"12345678900987654321\",\"queryNum\":\"13618922555\",\"timestamp\":\"1460966401214\",\"sign\":\"4\",\"imsi\":\"12345678900987654321\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13618922555\",\"subjectPro\":\"123456\",\"remark\":\"4\",\"channelno\":\"100\"}","requestip":"113.63.132.128","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"江苏省连云港市","countAll":0,"countCorrect":0,"datatime":"4186305","logid":"201604181600001175993827","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966397309\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"18361451463\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"18361451463\"}","requestip":"58.223.4.210","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"江苏省","countAll":0,"countCorrect":0,"datatime":"4103662","logid":"201604181600001051944754","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399642\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"a0000059788b71\",\"subjectNum\":\"768\",\"imsi\":\"460036660539168\",\"queryNum\":\"\"}","requestip":"180.98.180.95","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"山西省","countAll":0,"countCorrect":0,"datatime":"4247256","logid":"201604181600001013319164","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400334\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"A000004FE0218A\",\"subjectNum\":\"03514043633\",\"imsi\":\"460037471517070\",\"queryNum\":\"\"}","requestip":"1.68.5.227","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"北京市","countAll":0,"countCorrect":0,"datatime":"5401532","logid":"201604181600001469644300","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399603\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"4001004259\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"\"}","requestip":"106.121.0.143","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"北京市","countAll":0,"countCorrect":0,"datatime":"4876709","logid":"201604181600001476349766","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399603\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"4001004259\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"\"}","requestip":"106.121.0.143","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"江苏省连云港市","countAll":0,"countCorrect":0,"datatime":"4498474","logid":"201604181600001508125886","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966397987\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"18361451463\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"18361451463\"}","requestip":"58.223.4.210","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"江苏省连云港市","countAll":0,"countCorrect":0,"datatime":"4318254","logid":"201604181600001766447939","requestinfo":"{\"subjectNum\":\"66699\",\"imsi\":\"460036611592505\",\"queryNum\":\"\",\"channelno\":\"100\",\"imei\":\"A00000457ECC28\"}","requestip":"58.223.4.210","requesttime":"2016-04-18 16:00:00","requesttype":"28","responsecode":"000000","responsedata":"操作成功"}
2016-04-18 16:00:00 {"areacode":"江西省南昌市","countAll":0,"countCorrect":0,"datatime":"244260927","logid":"201604181559591112708085","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400525\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"a000004f883c2e\",\"subjectNum\":\"813161\",\"imsi\":\"460031392055476\",\"queryNum\":\"\"}","requestip":"182.97.149.145","requesttime":"2016-04-18 15:59:59","requesttype":"0","responsecode":"010005","responsedata":"无查询结果","userAgent":"Dalvik/1.6.0 (Linux; U; android 4.4.2; HUAWEI P7-L09 Build/HuaweiP7-L09)"}
2016-04-18 16:00:00 {"areacode":"上海市黄浦区","countAll":0,"countCorrect":0,"datatime":"4657170","logid":"201604181600001303952983","requestinfo":"{\"interfaceUserName\":\"12345678900987654321\",\"queryNum\":\"\",\"timestamp\":\"1460966400444\",\"sign\":\"4\",\"imei\":\"a000005901fef3\",\"subjectNum\":\"4235\",\"subjectPro\":\"123456\",\"remark\":\"4\",\"channelno\":\"9000\"}","requestip":"124.74.160.162","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果","userAgent":"Dalvik/2.1.0 (Linux; U; Android 6.0; HUAWEI CRR-CL00 Build/HUAWEICRR-CL00)"}
2016-04-18 16:00:00 {"areacode":"江西省南昌市","countAll":0,"countCorrect":0,"datatime":"252676235","logid":"201604181559591152287931","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400399\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"a000004f883c2e\",\"subjectNum\":\"813161\",\"imsi\":\"460031392055476\",\"queryNum\":\"\"}","requestip":"182.97.149.145","requesttime":"2016-04-18 15:59:59","requesttype":"0","responsecode":"010005","responsedata":"无查询结果","userAgent":"Dalvik/1.6.0 (Linux; U; Android 4.4.2; HUAWEI P7-L09 Build/HuaweiP7-L09)"}
2016-04-18 16:00:00 {"areacode":"局域网","countAll":0,"countCorrect":0,"datatime":"5160006","logid":"201604181600001026793341","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399352\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1002\",\"imei\":\"A00000457ECC28\",\"subjectNum\":\"66699\",\"imsi\":\"460036611592505\",\"queryNum\":\"\"}","requestip":"10.55.80.187","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"江苏省","countAll":0,"countCorrect":0,"datatime":"245262271","logid":"201604181559591753547387","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399846\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"A000004F661365\",\"subjectNum\":\"2336\",\"imsi\":\"460036580978572\",\"queryNum\":\"\"}","requestip":"180.98.187.27","requesttime":"2016-04-18 15:59:59","requesttype":"0","responsecode":"010005","responsedata":"无查询结果","userAgent":"Dalvik/1.6.0 (Linux; U; Android 4.4.2; HUAWEI C199 Build/HuaweiC199)"}
2016-04-18 16:00:00 {"countAll":0,"countCorrect":0,"logid":"201604181600001605286233","requestip":"36.23.153.219","requesttime":"2016-04-18 16:00:00","requesttype":"0"}
2016-04-18 16:00:00 {"areacode":"浙江省","countAll":0,"countCorrect":0,"datatime":"4203930","logid":"201604181600001873855360","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400191\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"057427895481\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"057427895481\"}","requestip":"36.23.153.219","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"河南省郑州市","countAll":0,"countCorrect":0,"datatime":"338870020","logid":"201604181559591841947051","requestinfo":"{\"subjectNum\":\"621418\",\"imsi\":\"460037561702775\",\"queryNum\":\"\",\"channelno\":\"100\",\"imei\":\"a0000055dc82e3\"}","requestip":"106.33.148.44","requesttime":"2016-04-18 15:59:59","requesttype":"28","responsecode":"000000","responsedata":"操作成功","userAgent":"Dalvik/1.6.0 (Linux; U; Android 4.4.2; PE-CL00 Build/HuaweiPE-CL00)"}
```
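The parsing step can be tried out on a single line without Spark. The sketch below is plain Scala and pulls the imei and the second-precision time out of one line in the format above. The object name `LogLineParser` is hypothetical, and so is the choice to return `None` rather than the string "NULL". The sketch also rejects an `Unknown` or all-zero imei by checking the extracted value instead of the whole line. Both are my own choices, not the article's.

```scala
import scala.util.matching.Regex

// Hypothetical helper (not from the article): extracts (imei, yyyyMMddHHmmss)
// from one log line in the format shown above.
object LogLineParser {
  // logid is a run of digits followed by a comma; its first 14 digits are the time
  val LogIdRegex: Regex = """"logid":"([0-9]+)",""".r
  // imei lives inside the escaped requestinfo string: \"imei\":\"...
  val ImeiRegex: Regex = """\\"imei\\":\\"([A-Za-z0-9]+)""".r

  // Placeholder IMEIs that should not count as a real device
  private def isBadImei(imei: String): Boolean =
    imei == "Unknown" || imei.forall(_ == '0')

  /** Some((imei, time)) for a usable line, None otherwise. */
  def parse(line: String): Option[(String, String)] =
    for {
      logId <- LogIdRegex.findFirstMatchIn(line).map(_.group(1))
      if logId.length >= 14
      imei <- ImeiRegex.findFirstMatchIn(line).map(_.group(1))
      if !isBadImei(imei)
    } yield (imei, logId.substring(0, 14))
}
```

For the second sample line, this would yield `Some(("A0000044ABFD25", "20160418160000"))`. The line that has no requestinfo at all yields `None`.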

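Step 3: how the algorithm works. We start from one rule. If a user sends our app to the background for 10 minutes, or performs no other operation for 10 minutes, we treat the user as having exited. If we see another operation record from that user after those 10 minutes, we count it as a second login.

The implementation builds on that rule:

1. Load the user behavior logs into an RDD. While loading, filter every line:
   - Drop records whose type we do not want.
   - Drop records whose imei is invalid or Unknown.
   - Keep only records that contain both an imei and a logid. The first 14 digits of the logid are the time of the user's operation, to the second (yyyyMMddHHmmss).
2. Map each remaining record to a pair: the imei and the operation time.
3. Call groupByKey() on the imei, then sort each key's list of times.
4. Walk each user's sorted list and apply the rule. A gap of less than 10 minutes between two consecutive operations is added to the online duration. A gap of 10 minutes or more is not counted and instead starts a new login.

This gives each user's login details, plus that user's total login count and online duration for the day.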
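The rule can be checked without Spark. Below is a plain-Scala restatement of the per-user logic. It mirrors what exportSumData does in the article's code, but the object and method names are mine, and it uses java.time, so it assumes Java 8 or later. It sorts the times and looks at the gap between each pair of consecutive operations. Gaps under 10 minutes go into the online time, and each gap of 10 minutes or more counts as a new login.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper (not the article's code): applies the 10-minute rule to
// one user's operation times, given as yyyyMMddHHmmss strings.
object SessionStats {
  // Inactivity threshold: a gap of 10 minutes or more ends a login
  val GapSeconds: Long = 10 * 60

  private val Fmt = DateTimeFormatter.ofPattern("yyyyMMddHHmmss")

  // Only differences between two times are used, so UTC is as good as any zone
  def toSeconds(t: String): Long =
    LocalDateTime.parse(t, Fmt).toEpochSecond(ZoneOffset.UTC)

  /** Returns (login count, online seconds) for one user. */
  def summarize(times: Seq[String]): (Int, Long) = {
    val secs = times.map(toSeconds).sorted
    if (secs.isEmpty) (0, 0L)
    else {
      // Gaps between consecutive operations
      val gaps = secs.zip(secs.tail).map { case (a, b) => b - a }
      val logins = 1 + gaps.count(_ >= GapSeconds) // each long gap starts a new login
      val online = gaps.filter(_ < GapSeconds).sum  // short gaps count as time online
      (logins, online)
    }
  }
}
```

Consider operations at 16:00:00, 16:01:00, 16:15:00 and 16:15:30. They give two logins and 90 seconds online. The 60 s and 30 s gaps are counted, while the 14-minute gap starts the second login.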

Enough talk. Here is the code:

```scala
/**
  * Created by zhoubh on 2016/6/28.
  */
import java.text.SimpleDateFormat

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

import scala.util.matching.Regex

/**
  * Statistics of user online duration and login counts
  */
object UserOnlineAnalysis {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: UserOnlineAnalysis <input> <output>")
      System.exit(1)
    }

    val conf = new SparkConf().setAppName("UserOnlineAnalysis").setMaster("local[4]")
    val sc = new SparkContext(conf)
    // args(0): input file path
    val data = sc.textFile(args(0))
    // Drop records whose type is 3, and records whose imei is Unknown, empty ("") or "000000000000000"
    val notContainsType3 = data
      .filter(!_.contains("\\\"type\\\":\\\"3\\\""))
      .filter(!_.contains("\\\"imei\\\":\\\"\\\""))
      .filter(!_.contains("000000000000000"))
      .filter(!_.contains("Unknown"))
    // Keep only records that contain both a logid and an imei
    val cleanData = notContainsType3.filter(_.contains("logid")).filter(_.contains("imei"))

    // (imei, yyyyMMddHHmmss) pairs
    val cleanMap = cleanData.map { line =>
      val fields = formatLine(line).split(",")
      (fields(0), fields(1))
    }
    // Group the records by IMEI and sort each group's list of times chronologically
    // (yyyyMMddHHmmss strings sort lexicographically in time order)
    val rdd = cleanMap.groupByKey().map(x => (x._1, x._2.toList.sorted))

    rdd.cache()

    // Export the per-record detail
    exportDetailData(rdd, args(1) + "/detail")

    // Export the per-user summary
    exportSumData(rdd, args(1) + "/sum")

    rdd.unpersist()

    sc.stop()
  }

  /**
    * Export each user's login count and online duration.
    * Output format: (IMEI, login count, online duration in seconds)
    */
  def exportSumData(map: RDD[(String, List[String])], output: String): Unit = {
    val result = map.map { x =>
      // Login count; a user with at least one record has logged in once
      var logNum: Int = 1
      // Online duration (seconds)
      var totalTime: Long = 0

      val len = x._2.length

      for (i <- 0 until len) {
        if (i + 1 < len) {
          val nowTime = getTimeByString(x._2(i))
          val nextTime = getTimeByString(x._2(i + 1))
          val intervalTime = nextTime - nowTime
          // A gap under 10 minutes counts as online time; otherwise it starts a new login
          if (intervalTime < 60 * 10) {
            totalTime += intervalTime
          } else {
            logNum += 1
          }
        }
      }
      // Output: imei, login count, total online time (seconds)
      (x._1, logNum, totalTime)
    }

    result.saveAsTextFile(output)
  }

  /**
    * Export one row per log record.
    * Output format: (IMEI, operation time, seconds until the user's next operation;
    * 0 if that gap is 10 minutes or more, or if this is the user's last record)
    */
  def exportDetailData(map: RDD[(String, List[String])], output: String): Unit = {
    val result = map.flatMap { x =>
      val len = x._2.length
      val array = new Array[(String, String, Long)](len)
      for (i <- 0 until len) {
        if (i + 1 < len) {
          val nowTime = getTimeByString(x._2(i))
          val nextTime = getTimeByString(x._2(i + 1))
          val intervalTime = nextTime - nowTime
          if (intervalTime < 60 * 10) {
            array(i) = (x._1, x._2(i), intervalTime)
          } else {
            array(i) = (x._1, x._2(i), 0L)
          }
        } else {
          array(i) = (x._1, x._2(i), 0L)
        }
      }
      array
    }
    result.saveAsTextFile(output)
  }

  /**
    * Parse the imei and logid out of one log line.
    * Returns "imei,yyyyMMddHHmmss"
    */
  def formatLine(line: String): String = {
    val logIdRegex = """"logid":"([0-9]+)",""".r
    val imeiRegex = """\\"imei\\":\\"([A-Za-z0-9]+)\\"""".r
    val logId = getDataByPattern(logIdRegex, line)
    val imei = getDataByPattern(imeiRegex, line)

    // The first 14 digits of logid are the operation time to the second
    imei + "," + logId.substring(0, 14)
  }

  /**
    * Return the first capture group of the regex, or "NULL" if it does not match.
    */
  def getDataByPattern(p: Regex, line: String): String =
    p.findFirstMatchIn(line).map(_.group(1)).getOrElse("NULL")

  /**
    * Convert a yyyyMMddHHmmss string to seconds. Date.getTime returns the number of
    * milliseconds since 1970-01-01 00:00:00 GMT (08:00:00 Beijing time),
    * so the result is divided by 1000.
    */
  def getTimeByString(timeString: String): Long = {
    val sf: SimpleDateFormat = new SimpleDateFormat("yyyyMMddHHmmss")
    sf.parse(timeString).getTime / 1000
  }
}
```

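I run this on a Mac Pro, so my run configuration is a little different:

[Screenshot: run configuration on Mac (not preserved)]

On Windows 7 you may need to change the configuration:

[Screenshot: run configuration on Windows 7 (not preserved)]

Here is the output:

detail, one (IMEI, operation time, seconds) tuple per line: [screenshot not preserved]

sum, one (IMEI, login count, online seconds) tuple per line: [screenshot not preserved]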
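exportDetailData writes one row per log record. You may prefer to see the login details from Step 3 as one row per login. The plain-Scala sketch below splits one user's times into logins using the same 10-minute rule. Inside Spark you would call it from a map over the grouped RDD. This is a hypothetical alternative, not the article's code: the object and method names are mine, and it uses java.time (Java 8 or later).

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical variant of the detail export (not the article's code):
// one tuple per login instead of one per log record.
object SessionSplitter {
  // Inactivity threshold: a gap of 10 minutes or more starts a new login
  val GapSeconds: Long = 10 * 60

  private val Fmt = DateTimeFormatter.ofPattern("yyyyMMddHHmmss")

  private def toSeconds(t: String): Long =
    LocalDateTime.parse(t, Fmt).toEpochSecond(ZoneOffset.UTC)

  /** (login time, last operation time, online seconds) for each login. */
  def sessions(times: List[String]): List[(String, String, Long)] = {
    // Build the list of logins newest-first; each login's times are also newest-first
    val grouped = times.sorted.foldLeft(List.empty[List[String]]) {
      case (current :: done, t) if toSeconds(t) - toSeconds(current.head) < GapSeconds =>
        (t :: current) :: done // short gap: the same login continues
      case (acc, t) =>
        List(t) :: acc         // first record or long gap: a new login
    }
    grouped.reverse.map { newestFirst =>
      val first = newestFirst.last
      val last = newestFirst.head
      (first, last, toSeconds(last) - toSeconds(first))
    }
  }
}
```

For one user, summing the third field over all logins gives the same online time that exportSumData reports. The number of tuples equals that user's login count.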

