90. Spark-SparkSQL (SQL Queries)
Posted by 托马斯-酷涛
Reading a file with textFile
Displaying the loaded data
Code
package org.example.SQL

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object Test4 { // Querying with SQL
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR) // Only log errors from org.* loggers
    val spark: SparkSession = SparkSession.builder().appName("test4").master("local").getOrCreate()
    val sc: SparkContext = spark.sparkContext

    // Each line holds three space-separated fields: id, name, age
    val lines = sc.textFile("data/input/person.txt")
    val rdd: RDD[person] = lines.map { line =>
      val arr: Array[String] = line.split(" ")
      person(arr(0).toInt, arr(1), arr(2).toInt)
    }

    import spark.implicits._
    val personDF: DataFrame = rdd.toDF() // Convert the RDD to a DataFrame
    personDF.printSchema()
    personDF.show()

    //--------------------SQL----------------------
    // Register the DataFrame as a temporary view named "student"
    personDF.createOrReplaceTempView("student")
    // Select the name column
    spark.sql("select name from student").show()
    // Select the name and age columns
    spark.sql("select name,age from student").show()
    // Select name and age for every row, plus age + 1
    spark.sql("select name,age,age+1 from student").show()
    // Filter out rows with age >= 25, i.e. keep only age < 25
    spark.sql("select name,age from student where age<25").show()
    // Count the people older than 35
    spark.sql("select count(*) from student where age>35").show()
    // Group by age and count the people of each age
    spark.sql("select age,count(*) from student group by age").show()
    // Select the rows whose name equals zhangsan
    spark.sql("select name from student where name = 'zhangsan' ").show()
  }
}

// Defined outside main so that Spark can derive the schema for toDF()
case class person(id: Int, name: String, age: Int)
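The listing above assumes every line of person.txt has exactly three space-separated fields with integer id and age; a blank or malformed line would make `toInt` or the array access throw inside the job. A minimal sketch of a more tolerant parser follows. `ParseSketch` and `parseLine` are hypothetical names that are not part of the original code, and the sample lines are made up for illustration.

```scala
import scala.util.Try

// Same shape as the case class in the listing above.
case class person(id: Int, name: String, age: Int)

object ParseSketch {
  // Hypothetical helper: returns None when the line does not split into
  // exactly three whitespace-separated fields, or when id/age are not integers.
  def parseLine(line: String): Option[person] =
    line.trim.split("\\s+") match {
      case Array(id, name, age) =>
        Try(person(id.toInt, name, age.toInt)).toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // Illustrative input only: one valid line, one blank line, one bad age.
    val lines = Seq("1 zhangsan 20", "", "2 lisi twenty-nine")
    // flatMap drops the None results and keeps the parsed rows.
    println(lines.flatMap(parseLine))
  }
}
```

In the Spark job, the same idea would be to call `flatMap` with this helper on `lines` in place of the `map` call, so that bad lines are skipped rather than failing the task. Whether silently dropping bad lines is acceptable depends on the data.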
Schema
Data table
Filtered data
+--------+
|    name|
+--------+
|zhangsan|
|    lisi|
|  wangwu|
| zhaoliu|
|  tianqi|
|    kobe|
+--------+
+--------+---+
|    name|age|
+--------+---+
|zhangsan| 20|
|    lisi| 29|
|  wangwu| 25|
| zhaoliu| 30|
|  tianqi| 35|
|    kobe| 40|
+--------+---+
+--------+---+---------+
|    name|age|(age + 1)|
+--------+---+---------+
|zhangsan| 20|       21|
|    lisi| 29|       30|
|  wangwu| 25|       26|
| zhaoliu| 30|       31|
|  tianqi| 35|       36|
|    kobe| 40|       41|
+--------+---+---------+
+--------+---+
|    name|age|
+--------+---+
|zhangsan| 20|
+--------+---+
+--------+
|count(1)|
+--------+
|       1|
+--------+
+---+--------+
|age|count(1)|
+---+--------+
| 20|       1|
| 40|       1|
| 35|       1|
| 25|       1|
| 29|       1|
| 30|       1|
+---+--------+
+--------+
|    name|
+--------+
|zhangsan|
+--------+
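The same seven queries can also be written with the DataFrame API instead of SQL strings. The sketch below is not from the original post: `DslSketch` is a hypothetical name, the rows are built in memory rather than read from person.txt, the names and ages are taken from the outputs above, and the ids are assumed because the outputs never show them.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Same shape as the case class in the original listing.
case class person(id: Int, name: String, age: Int)

object DslSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dsl-sketch").master("local").getOrCreate()
    import spark.implicits._

    // Names and ages come from the outputs above; the ids are assumed.
    val personDF: DataFrame = Seq(
      person(1, "zhangsan", 20),
      person(2, "lisi", 29),
      person(3, "wangwu", 25),
      person(4, "zhaoliu", 30),
      person(5, "tianqi", 35),
      person(6, "kobe", 40)
    ).toDF()

    // select name from student
    personDF.select("name").show()
    // select name,age from student
    personDF.select("name", "age").show()
    // select name,age,age+1 from student
    personDF.select($"name", $"age", $"age" + 1).show()
    // select name,age from student where age<25
    personDF.filter($"age" < 25).select("name", "age").show()
    // select count(*) from student where age>35
    // count() is an action that returns a Long, not a DataFrame
    println(personDF.filter($"age" > 35).count())
    // select age,count(*) from student group by age
    // the aggregate column is named "count" here rather than "count(1)"
    personDF.groupBy("age").count().show()
    // select name from student where name = 'zhangsan'
    personDF.filter($"name" === "zhangsan").select("name").show()

    spark.stop()
  }
}
```

One practical difference is that column expressions such as `$"age" < 25` are checked when the query plan is built, while a typo inside a SQL string only surfaces when `spark.sql` parses it. Which style reads better is largely a matter of taste.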