Spark记录-Spark-Shell客户端操作读取Hive数据
Posted 信方互联网硬汉
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Spark记录-Spark-Shell客户端操作读取Hive数据相关的知识,希望对你有一定的参考价值。
1.拷贝hive-site.xml到spark/conf下,拷贝mysql-connector-java-xxx-bin.jar到hive/lib下
2.开启hive元数据服务:hive --service metastore
3.开启hadoop服务:sh $HADOOP_HOME/sbin/start-all.sh
4.开启spark服务:sh $SPARK_HOME/sbin/start-all.sh
5.进入spark-shell:spark-shell
6.scala操作hive(spark-sql)
scala>val conf=new SparkConf().setAppName("SparkHive").setMaster("local") //可忽略,已经自动创建了
scala>val sc=new SparkContext(conf) //可忽略,已经自动创建了
scala>val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala>sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘\t‘ ")//这里需要注意数据的间隔符
scala>sqlContext.sql("LOAD DATA INPATH ‘/user/spark/src.txt‘ INTO TABLE src ");
scala>sqlContext.sql(" SELECT * FROM src").collect().foreach(println)
scala>sc.stop()
SQL context available as sqlContext. scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) 17/12/05 10:38:51 INFO HiveContext: Initializing execution hive, version 1.2.1 17/12/05 10:38:51 INFO ClientWrapper: Inspected Hadoop version: 2.4.0 17/12/05 10:38:51 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.4.0 17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.metastore.local does not exist 17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist 17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist 17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist 17/12/05 10:38:51 INFO metastore: Mestastore configuration hive.metastore.warehouse.dir changed from file:/tmp/spark-ecfcdcc1-2bb0-4efc-aa00-96ad1dd47840/metastore to file:/tmp/spark-ea48b58b-ef90-43c0-8d5e-f54a4b4cadde/metastore 17/12/05 10:38:51 INFO metastore: Mestastore configuration javax.jdo.option.ConnectionURL changed from jdbc:derby:;databaseName=/tmp/spark-ecfcdcc1-2bb0-4efc-aa00-96ad1dd47840/metastore;create=true to jdbc:derby:;databaseName=/tmp/spark-ea48b58b-ef90-43c0-8d5e-f54a4b4cadde/metastore;create=true 17/12/05 10:38:51 INFO HiveMetaStore: 0: Shutting down the object store... 17/12/05 10:38:51 INFO audit: ugi=root ip=unknown-ip-addr cmd=Shutting down the object store... 17/12/05 10:38:51 INFO HiveMetaStore: 0: Metastore shutdown complete. 17/12/05 10:38:51 INFO audit: ugi=root ip=unknown-ip-addr cmd=Metastore shutdown complete. 17/12/05 10:38:51 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 17/12/05 10:38:51 INFO ObjectStore: ObjectStore, initialize called 17/12/05 10:38:51 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 17/12/05 10:38:51 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored 17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.metastore.local does not exist 17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist 17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist 17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist 17/12/05 10:38:56 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 17/12/05 10:38:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 17/12/05 10:38:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 17/12/05 10:39:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 17/12/05 10:39:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 17/12/05 10:39:01 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY 17/12/05 10:39:01 INFO ObjectStore: Initialized ObjectStore 17/12/05 10:39:01 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 17/12/05 10:39:02 INFO HiveMetaStore: Added admin role in metastore 17/12/05 10:39:02 INFO HiveMetaStore: Added public role in metastore 17/12/05 10:39:02 INFO HiveMetaStore: No user is added in admin role, since config is empty 17/12/05 10:39:02 INFO SessionState: Created local directory: /tmp/d66a519b-e512-4295-b707-0f688aa238ea_resources 17/12/05 10:39:02 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/d66a519b-e512-4295-b707-0f688aa238ea 17/12/05 10:39:02 INFO SessionState: Created local directory: /tmp/root/d66a519b-e512-4295-b707-0f688aa238ea 17/12/05 10:39:02 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/d66a519b-e512-4295-b707-0f688aa238ea/_tmp_space.db 17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.metastore.local does not exist 17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist 17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist 17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist 17/12/05 10:39:02 INFO HiveContext: default warehouse location is /user/hive/warehouse 17/12/05 10:39:02 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes. 17/12/05 10:39:02 INFO ClientWrapper: Inspected Hadoop version: 2.4.0 17/12/05 10:39:03 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.4.0 17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.metastore.local does not exist 17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist 17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist 17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist 17/12/05 10:39:08 INFO metastore: Trying to connect to metastore with URI thrift://192.168.66.66:9083 17/12/05 10:39:08 INFO metastore: Connected to metastore. 17/12/05 10:39:10 INFO SessionState: Created local directory: /tmp/4989df94-ba31-4ef6-ab78-369043e2067e_resources 17/12/05 10:39:10 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e 17/12/05 10:39:10 INFO SessionState: Created local directory: /tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e 17/12/05 10:39:10 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e/_tmp_space.db sqlContext: org.apache.spark.sql.hive.HiveContext = [email protected] scala> sqlContext.sql("use siat") 17/12/05 10:39:36 INFO ParseDriver: Parsing command: use siat 17/12/05 10:39:41 INFO ParseDriver: Parse Completed 17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:45 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:45 INFO ParseDriver: Parsing command: use siat 17/12/05 10:39:49 INFO ParseDriver: Parse Completed 17/12/05 10:39:50 INFO PerfLogger: </PERFLOG method=parse start=1512441585044 end=1512441590042 duration=4998 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:50 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:51 INFO Driver: Semantic Analysis Completed 17/12/05 10:39:51 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441590188 end=1512441591560 duration=1372 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:51 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 17/12/05 10:39:51 INFO PerfLogger: </PERFLOG method=compile start=1512441584491 end=1512441591758 duration=7267 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:51 INFO Driver: Concurrency mode is disabled, not creating a lock manager 17/12/05 10:39:51 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:51 INFO Driver: Starting command(queryId=root_20171205103945_2f994f07-9e52-456b-97ee-d03e722116ff): use siat 17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441584488 end=1512441592212 duration=7724 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:52 INFO Driver: Starting task [Stage-0:DDL] in serial mode 17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=runTasks start=1512441592212 end=1512441592496 duration=284 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441591760 end=1512441592497 duration=737 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:52 INFO Driver: OK 17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441592571 end=1512441592571 duration=0 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441584478 end=1512441592571 duration=8093 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441592612 end=1512441592613 duration=1 from=org.apache.hadoop.hive.ql.Driver> res0: org.apache.spark.sql.DataFrame = [result: string] scala> sqlContext.sql("drop table src") 17/12/05 10:40:13 INFO ParseDriver: Parsing command: drop table src 17/12/05 10:40:13 INFO ParseDriver: Parse Completed 17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:17 INFO ParseDriver: Parsing command: DROP TABLE src 17/12/05 10:40:17 INFO ParseDriver: Parse Completed 17/12/05 10:40:17 INFO PerfLogger: </PERFLOG method=parse start=1512441617979 end=1512441617998 duration=19 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:19 INFO Driver: Semantic Analysis Completed 17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441617999 end=1512441619115 duration=1116 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:19 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=compile start=1512441617977 end=1512441619116 duration=1139 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:19 INFO Hive: Dumping metastore api call timing information for : compilation phase 17/12/05 10:40:19 INFO Hive: Total time spent in this metastore function was greater than 1000ms : getTable_(String, String, )=3999 17/12/05 10:40:19 INFO Driver: Concurrency mode is disabled, not creating a lock manager 17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:19 INFO Driver: Starting command(queryId=root_20171205104017_dd3db388-5058-4af4-9076-90035b4837d9): DROP TABLE src 17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441617977 end=1512441619119 duration=1142 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:40:19 INFO Driver: Starting task [Stage-0:DDL] in serial mode 17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=runTasks start=1512441619119 end=1512441664030 duration=44911 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:04 INFO Hive: Dumping metastore api call timing information for : execution phase 17/12/05 10:41:04 INFO Hive: Total time spent in this metastore function was greater than 1000ms : dropTable_(String, String, boolean, boolean, boolean, )=44266 17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441619118 end=1512441664031 duration=44913 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:04 INFO Driver: OK 17/12/05 10:41:04 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441664032 end=1512441664032 duration=0 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441617976 end=1512441664051 duration=46075 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:04 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441664054 end=1512441664054 duration=0 from=org.apache.hadoop.hive.ql.Driver> res1: org.apache.spark.sql.DataFrame = [] scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘\t‘ ") 17/12/05 10:41:57 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘ ‘ 17/12/05 10:41:57 INFO ParseDriver: Parse Completed 17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:57 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘ ‘ 17/12/05 10:41:57 INFO ParseDriver: Parse Completed 17/12/05 10:41:57 INFO PerfLogger: </PERFLOG method=parse start=1512441717568 end=1512441717619 duration=51 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:58 INFO CalcitePlanner: Starting Semantic Analysis 17/12/05 10:41:58 INFO CalcitePlanner: Creating table siat.src position=27 17/12/05 10:41:58 INFO Driver: Semantic Analysis Completed 17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441717619 end=1512441718637 duration=1018 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:58 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=compile start=1512441717565 end=1512441718637 duration=1072 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:58 INFO Driver: Concurrency mode is disabled, not creating a lock manager 17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:58 INFO Driver: Starting command(queryId=root_20171205104157_e9b5ed54-e7dc-448a-984c-6d5cb37f964f): CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘ ‘ 17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441717565 end=1512441718735 duration=1170 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:41:58 INFO Driver: Starting task [Stage-0:DDL] in serial mode 17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=runTasks start=1512441718735 end=1512441721846 duration=3111 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:42:01 INFO Hive: Dumping metastore api call timing information for : execution phase 17/12/05 10:42:01 INFO Hive: Total time spent in this metastore function was greater than 1000ms : createTable_(Table, )=2431 17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441718638 end=1512441721849 duration=3211 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:42:01 INFO Driver: OK 17/12/05 10:42:01 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441721852 end=1512441721882 duration=30 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441717564 end=1512441721883 duration=4319 from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:42:01 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver> 17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441721883 end=1512441721883 duration=0 from=org.apache.hadoop.hive.ql.Driver> res2: org.apache.spark.sql.DataFrame = [result: string] scala> sqlContext.sql("select * from src").collect().foreach(println) 17/12/05 10:42:54 INFO ParseDriver: Parsing command: select * from src 17/12/05 10:42:54 INFO ParseDriver: Parse Completed 17/12/05 10:42:56 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 17/12/05 10:42:58 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 467.6 KB, free 142.8 MB) 17/12/05 10:43:02 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 40.5 KB, free 142.8 MB) 17/12/05 10:43:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.66.66:36024 (size: 40.5 KB, free: 143.2 MB) 17/12/05 10:43:02 INFO SparkContext: Created broadcast 0 from collect at <console>:30 17/12/05 10:43:04 INFO FileInputFormat: Total input paths to process : 0 17/12/05 10:43:04 INFO SparkContext: Starting job: collect at <console>:30 17/12/05 10:43:04 INFO DAGScheduler: Job 0 finished: collect at <console>:30, took 0.043396 s scala> val res=sqlContext.sql("select * from src").collect().foreach(println) 17/12/05 10:43:25 INFO ParseDriver: Parsing command: select * from src 17/12/05 10:43:25 INFO ParseDriver: Parse Completed 17/12/05 10:43:26 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 467.6 KB, free 142.3 MB) 17/12/05 10:43:27 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 40.5 KB, free 142.3 MB) 17/12/05 10:43:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.66.66:36024 (size: 40.5 KB, free: 143.2 MB) 17/12/05 10:43:27 INFO SparkContext: Created broadcast 1 from collect at <console>:29 17/12/05 10:43:27 INFO FileInputFormat: Total input paths to process : 0 17/12/05 10:43:27 INFO SparkContext: Starting job: collect at <console>:29 17/12/05 10:43:27 INFO DAGScheduler: Job 1 finished: collect at <console>:29, took 0.000062 s scala> res scala> val res=sqlContext.sql("select count(*) from src").collect().foreach(println) 17/12/05 10:43:47 INFO ParseDriver: Parsing command: select count(*) from src 17/12/05 10:43:47 INFO ParseDriver: Parse Completed 17/12/05 10:43:48 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 467.0 KB, free 141.8 MB) 17/12/05 10:43:48 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 40.4 KB, free 141.8 MB) 17/12/05 10:43:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.66.66:36024 (size: 40.4 KB, free: 143.1 MB) 17/12/05 10:43:48 INFO SparkContext: Created broadcast 2 from collect at <console>:29 17/12/05 10:43:49 INFO FileInputFormat: Total input paths to process : 0 17/12/05 10:43:49 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.66.66:36024 in memory (size: 40.5 KB, free: 143.2 MB) 17/12/05 10:43:49 INFO SparkContext: Starting job: collect at <console>:29 17/12/05 10:43:49 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.66.66:36024 in memory (size: 40.5 KB, free: 143.2 MB) 17/12/05 10:43:49 INFO DAGScheduler: Registering RDD 15 (collect at <console>:29) 17/12/05 10:43:49 INFO DAGScheduler: Got job 2 (collect at <console>:29) with 1 output partitions 17/12/05 10:43:49 INFO DAGScheduler: Final stage: ResultStage 1 (collect at <console>:29) 17/12/05 10:43:49 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 17/12/05 10:43:49 INFO DAGScheduler: Missing parents: List() 17/12/05 10:43:49 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[18] at collect at <console>:29), which has no missing parents 17/12/05 10:43:49 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 12.0 KB, free 142.7 MB) 17/12/05 10:43:49 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 6.0 KB, free 142.7 MB) 17/12/05 10:43:49 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.66.66:36024 (size: 6.0 KB, free: 143.2 MB) 17/12/05 10:43:49 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006 17/12/05 10:43:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[18] at collect at <console>:29) 17/12/05 10:43:49 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks 17/12/05 10:44:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:44:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:44:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:44:50 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:45:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:45:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:45:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:45:50 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:45:57 INFO AppClient$ClientEndpoint: Executor added: app-20171205103712-0001/0 on worker-20171204180628-192.168.66.66-7078 (192.168.66.66:7078) with 2 cores 17/12/05 10:45:57 INFO SparkDeploySchedulerBackend: Granted executor ID app-20171205103712-0001/0 on hostPort 192.168.66.66:7078 with 2 cores, 512.0 MB RAM 17/12/05 10:45:59 INFO AppClient$ClientEndpoint: Executor updated: app-20171205103712-0001/0 is now RUNNING 17/12/05 10:46:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:46:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:46:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/12/05 10:46:46 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (xinfang:10363) with ID 0 17/12/05 10:46:47 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, xinfang, partition 0,PROCESS_LOCAL, 1999 bytes) 17/12/05 10:46:48 INFO BlockManagerMasterEndpoint: Registering block manager xinfang:34620 with 143.3 MB RAM, BlockManagerId(0, xinfang, 34620) 17/12/05 10:46:51 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on xinfang:34620 (size: 6.0 KB, free: 143.2 MB) 17/12/05 10:47:07 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to xinfang:10363 17/12/05 10:47:08 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 82 bytes 17/12/05 10:47:14 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 27243 ms on xinfang (1/1) 17/12/05 10:47:14 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 17/12/05 10:47:14 INFO DAGScheduler: ResultStage 1 (collect at <console>:29) finished in 204.228 s 17/12/05 10:47:14 INFO DAGScheduler: Job 2 finished: collect at <console>:29, took 204.785107 s [0] scala> res scala> sc.stop() 17/12/05 10:48:32 INFO SparkUI: Stopped Spark web UI at http://192.168.66.66:4041 17/12/05 10:48:35 INFO SparkDeploySchedulerBackend: Shutting down all executors 17/12/05 10:48:35 INFO SparkDeploySchedulerBackend: Asking each executor to shut down 17/12/05 10:48:35 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 17/12/05 10:48:36 INFO MemoryStore: MemoryStore cleared 17/12/05 10:48:36 INFO BlockManager: BlockManager stopped 17/12/05 10:48:36 INFO BlockManagerMaster: BlockManagerMaster stopped 17/12/05 10:48:36 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 17/12/05 10:48:36 INFO SparkContext: Successfully stopped SparkContext scala> 17/12/05 10:48:36 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 17/12/05 10:48:36 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 17/12/05 10:48:38 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
以上是关于Spark记录-Spark-Shell客户端操作读取Hive数据的主要内容,如果未能解决你的问题,请参考以下文章
使用 Calliope 库的 Spark Cassandra 集成 - 不显示任何记录