Spark-Core: Getting the Top Three Scores in Each Class
Posted by Mr.zhou_Zxy
- Requirement:
Given a data set with four fields (id, name, score, stuclass), return the three highest-scoring students in each class.
- Data
ID name score stuclass
1 z1 90 1
2 z2 100 1
3 z3 70 2
4 z4 30 1
5 z5 200 2
6 z6 120 1
7 z7 90 1
8 z8 100 2
9 z9 80 1
10 z10 90 2
11 z11 10 1
- Expected result
(2,List((200,5,z5), (100,8,z8), (90,10,z10)))
(1,List((120,6,z6), (100,2,z2), (90,1,z1)))
- First attempt
package com.zxy.spark.core.day06

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo5 {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("demo5").setMaster("local[*]"))
    val stuRDD: RDD[String] = sc.textFile("date/student.txt")
    // Key every record by class: (stuclass, (score, id, name))
    val mapRDD: RDD[(String, (String, String, String))] = stuRDD.map(line => {
      // Split on runs of whitespace
      val info: Array[String] = line.split("\\s+")
      val id = info(0)
      val name = info(1)
      val score = info(2) // still a String at this point
      val stuclass = info(3)
      (stuclass, (score, id, name))
    })
    // Collect all records of a class into one group
    val gbkRDD: RDD[(String, Iterable[(String, String, String)])] = mapRDD.groupByKey()
    // Sort each group by score, descending, and keep the first three
    val value: RDD[(String, List[(String, String, String)])] = gbkRDD.mapValues(values => {
      values.toList.sortWith(_._1 > _._1).take(3)
    })
    value.foreach(println)
    sc.stop()
  }
}
- Actual output
(2,List((90,10,z10), (70,3,z3), (200,5,z5)))
(1,List((90,1,z1), (90,7,z7), (80,9,z9)))
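- Analysis
The output clearly does not match the expected result. The cause is that score, the field being compared, is still a String and was never converted to Int, so the sort compares the values lexicographically, character by character. That is why "90" ranks above "200": '9' is greater than '2'.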
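The difference can be reproduced without Spark. The following is a minimal sketch on plain Scala collections; the object name StringVsIntOrdering and the sample values (the four scores of class 2) are chosen here for illustration only.

```scala
object StringVsIntOrdering {
  // The four scores of class 2, as they arrive from line.split: plain Strings
  val scores: List[String] = List("70", "200", "100", "90")

  // Descending sort on Strings: compared character by character
  val asStrings: List[String] = scores.sortWith(_ > _)

  // Descending sort after converting each score to Int: compared numerically
  val asInts: List[Int] = scores.map(_.toInt).sortWith(_ > _)

  def main(args: Array[String]): Unit = {
    println(asStrings) // List(90, 70, 200, 100)
    println(asInts)    // List(200, 100, 90, 70)
  }
}
```

Taking the first three of the String-sorted list gives 90, 70, 200, which is the order seen for class 2 in the actual output above.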
- Revised code
package com.zxy.spark.core.day06

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo5 {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("demo5").setMaster("local[*]"))
    val stuRDD: RDD[String] = sc.textFile("date/student.txt")
    // Key every record by class: (stuclass, (score, id, name))
    val mapRDD: RDD[(String, (Int, String, String))] = stuRDD.map(line => {
      // Split on runs of whitespace
      val info: Array[String] = line.split("\\s+")
      val id = info(0)
      val name = info(1)
      // Convert the score to Int so it is compared numerically
      // (this fails on a non-numeric value such as a header row)
      val score = info(2).toInt
      val stuclass = info(3)
      (stuclass, (score, id, name))
    })
    // Collect all records of a class into one group
    val gbkRDD: RDD[(String, Iterable[(Int, String, String)])] = mapRDD.groupByKey()
    // Sort each group by score, descending, and keep the first three
    val value: RDD[(String, List[(Int, String, String)])] = gbkRDD.mapValues(values => {
      values.toList.sortWith(_._1 > _._1).take(3)
    })
    value.foreach(println)
    sc.stop()
  }
}
- Output
(2,List((200,5,z5), (100,8,z8), (90,10,z10)))
(1,List((120,6,z6), (100,2,z2), (90,1,z1)))
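groupByKey moves every record of a class into one group before anything is discarded. As a possible alternative, and not the approach used in this post, aggregateByKey can keep at most three records per class while the data is being combined. The sketch below makes several assumptions: the names Top3WithAggregate, Student and topN are hypothetical, and the sample rows are inlined with sc.parallelize instead of being read from date/student.txt.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Top3WithAggregate {
  // (score, id, name), the same value layout as in the post
  type Student = (Int, String, String)

  // Hypothetical helper: keep the n highest-scoring entries of a small list
  def topN(students: List[Student], n: Int): List[Student] =
    students.sortWith(_._1 > _._1).take(n)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("top3-aggregate").setMaster("local[*]"))

    // Sample rows from the post, inlined for the sketch (id name score stuclass)
    val rows = Seq(
      "1 z1 90 1", "2 z2 100 1", "3 z3 70 2", "4 z4 30 1",
      "5 z5 200 2", "6 z6 120 1", "7 z7 90 1", "8 z8 100 2",
      "9 z9 80 1", "10 z10 90 2", "11 z11 10 1")

    val pairs: RDD[(String, Student)] = sc.parallelize(rows).map { line =>
      val info = line.split("\\s+")
      (info(3), (info(2).toInt, info(0), info(1)))
    }

    // The accumulator for each class never holds more than three students
    val top3: RDD[(String, List[Student])] =
      pairs.aggregateByKey(List.empty[Student])(
        (acc, student) => topN(student :: acc, 3), // fold one record into a partition-local top 3
        (left, right) => topN(left ::: right, 3)   // merge the top 3 of two partitions
      )

    top3.collect().foreach(println)
    sc.stop()
  }
}
```

Ties are not broken in any particular way in this sketch: class 1 has two students with a score of 90, and either one may end up in third place depending on how the rows are partitioned.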