Can Scala use Spark SQL for queries?
Answer: Take a look at spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala in the Spark source tree. The Scala code is not the easiest to read, but it contains the SQL parsing logic, so you can see which SQL statements are supported; the keywords, at least, are spelled out explicitly:
protected val ALL = Keyword("ALL")
protected val AND = Keyword("AND")
protected val APPROXIMATE = Keyword("APPROXIMATE")
protected val AS = Keyword("AS")
protected val ASC = Keyword("ASC")
protected val BETWEEN = Keyword("BETWEEN")
protected val BY = Keyword("BY")
protected val CASE = Keyword("CASE")
protected val CAST = Keyword("CAST")
protected val DESC = Keyword("DESC")
protected val DISTINCT = Keyword("DISTINCT")
protected val ELSE = Keyword("ELSE")
protected val END = Keyword("END")
protected val EXCEPT = Keyword("EXCEPT")
protected val FALSE = Keyword("FALSE")
protected val FROM = Keyword("FROM")
protected val FULL = Keyword("FULL")
protected val GROUP = Keyword("GROUP")
protected val HAVING = Keyword("HAVING")
protected val IN = Keyword("IN")
protected val INNER = Keyword("INNER")
protected val INSERT = Keyword("INSERT")
protected val INTERSECT = Keyword("INTERSECT")
protected val INTO = Keyword("INTO")
protected val IS = Keyword("IS")
protected val JOIN = Keyword("JOIN")
protected val LEFT = Keyword("LEFT")
protected val LIKE = Keyword("LIKE")
protected val LIMIT = Keyword("LIMIT")
protected val NOT = Keyword("NOT")
protected val NULL = Keyword("NULL")
protected val ON = Keyword("ON")
protected val OR = Keyword("OR")
protected val ORDER = Keyword("ORDER")
protected val SORT = Keyword("SORT")
protected val OUTER = Keyword("OUTER")
protected val OVERWRITE = Keyword("OVERWRITE")
protected val REGEXP = Keyword("REGEXP")
protected val RIGHT = Keyword("RIGHT")
protected val RLIKE = Keyword("RLIKE")
protected val SELECT = Keyword("SELECT")
protected val SEMI = Keyword("SEMI")
protected val TABLE = Keyword("TABLE")
protected val THEN = Keyword("THEN")
protected val TRUE = Keyword("TRUE")
protected val UNION = Keyword("UNION")
protected val WHEN = Keyword("WHEN")
protected val WHERE = Keyword("WHERE")
protected val WITH = Keyword("WITH")
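In other words, the answer to the question is yes: a Scala program can run these SQL statements through a SQLContext. Here is a minimal sketch; the Person case class, the people table, and the query are invented for illustration, and it assumes the same Spark 1.x-style API as the Hive example later in this post.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type, used only for this demo
case class Person(name: String, age: Int)

object SqlKeywordDemo extends App {
  val sc = new SparkContext(new SparkConf().setAppName("SqlKeywordDemo").setMaster("local"))
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._

  // Turn an in-memory collection into a DataFrame and register it as a temp table
  val people = sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 25))).toDF()
  people.registerTempTable("people")

  // SELECT, FROM, WHERE, ORDER BY and DESC all appear in the keyword list above
  sqlContext.sql("SELECT name, age FROM people WHERE age > 26 ORDER BY age DESC").show()
}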
Spark SQL with a Hive data source
1. The pom file dependencies:
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.4</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.specs</groupId>
    <artifactId>specs</artifactId>
    <version>1.2.5</version>
    <scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/oracle/ojdbc6 -->
<dependency>
    <groupId>com.oracle</groupId>
    <artifactId>ojdbc6</artifactId>
    <version>11.2.0.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>${mysql.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.alibaba/druid -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid</artifactId>
    <version>${druid.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
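Note that the dependencies reference ${scala.version}, ${spark.version}, ${mysql.version} and ${druid.version}, so the pom also needs a <properties> block, which the original post omits. A plausible version set follows; the numbers are assumptions and should be aligned with your cluster.

<properties>
    <!-- Illustrative versions only; match them to your environment -->
    <scala.version>2.11.8</scala.version>
    <spark.version>2.1.0</spark.version>
    <mysql.version>5.1.39</mysql.version>
    <druid.version>1.0.31</druid.version>
</properties>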
2. The code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveDataSource extends App {

  val config = new SparkConf().setAppName("HiveDataSource").setMaster("local")
  val sc = new SparkContext(config)
  val sqlContext = new HiveContext(sc)

  // Create the student_infos table and load it from HDFS
  sqlContext.sql("drop table if exists default.student_infos")
  sqlContext.sql("create table if not exists default.student_infos (name string, age int) " +
    "row format delimited fields terminated by ',' stored as textfile")
  sqlContext.sql("load data inpath '/tmp/student_infos.txt' into table default.student_infos")

  // Load data into student_scores the same way
  sqlContext.sql("drop table if exists default.student_scores")
  sqlContext.sql("create table if not exists default.student_scores (name string, score int) " +
    "row format delimited fields terminated by ',' stored as textfile")
  sqlContext.sql("load data inpath '/tmp/student_scores.txt' into table default.student_scores")

  // Join the two tables and keep the students who scored above 80
  val goodStudentDf = sqlContext.sql(
    "select t1.name, t1.age, t2.score " +
      "from default.student_infos t1 " +
      "join default.student_scores t2 on t1.name = t2.name " +
      "where t2.score > 80")
  goodStudentDf.show()
}
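Both load statements expect comma-delimited text files already present on HDFS, matching the fields terminated by ',' clause in the DDL. For example, /tmp/student_infos.txt would hold one name,age pair per line (the values below are invented for illustration):

leo,18
marry,17
tom,19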
3. Copy hive-site.xml from Hive's conf directory into src/main/resources.
4. Compile and package the project.
5. Upload the jar to the server.
6. Add a launch script:
/home/hadoop/app/spark/bin/spark-submit \
--class com.dsj361.HiveDataSource \
--master local[*] \
--num-executors 2 \
--driver-memory 1000m \
--executor-memory 1000m \
--executor-cores 2 \
/home/hadoop/sparksqlapp/jar/sparkSqlStudy.jar
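Note that with --master local[*] everything runs in a single local process, so --num-executors has no effect; that flag only matters on a cluster manager such as YARN. For reference, a sketch of the equivalent cluster submission, assuming a working YARN setup:

/home/hadoop/app/spark/bin/spark-submit \
--class com.dsj361.HiveDataSource \
--master yarn \
--deploy-mode cluster \
--num-executors 2 \
--driver-memory 1000m \
--executor-memory 1000m \
--executor-cores 2 \
/home/hadoop/sparksqlapp/jar/sparkSqlStudy.jar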
7. Run the script. The join completes much faster than running the same query directly in Hive, since Spark executes it in memory rather than as MapReduce jobs.