Spark CTAS 上的 Hive 使用 Straight SELECT 失败，但使用 SELECT GROUP BY 成功

Posted 2023-04-18

技术标签:

【中文标题】Spark CTAS 上的 Hive 使用 Straight SELECT 失败，但使用 SELECT GROUP BY 成功【英文标题】：Hive on Spark CTAS Fails with Straight SELECT but succeeds with SELECT GROUP BY 【发布时间】：2016-11-12 18:12:23 【问题描述】：

我已经在 Spark 上成功使用 Hive 数周来执行包含聚合数据的 CTAS 语句。最近，我需要使用直接选择语句（没有 GROUP BY）执行 CTAS，但很惊讶它失败了。我发现我可以在 MR 上使用 Hive 执行语句就好了。我创建了一个测试来隔离我的问题。

创建源表：

hive -e "

    CREATE DATABASE IF NOT EXISTS testhiveonspark;
    DROP TABLE IF EXISTS testhiveonspark.temptable1;
    CREATE TABLE testhiveonspark.temptable1 (num1 INT, num2 INT)
    STORED AS TEXTFILE;
    INSERT into table testhiveonspark.temptable1 values (1, 1);
    select * from testhiveonspark.temptable1;

"

执行一个有效的 CTAS 文件（使用 SELECT GROUP BY）...这样就完成了：

cat > testhiveonspark_working.hql << EOF

    set mapred.job.queue.name=root.apps10;
    set spark.master=yarn-client;
    set hive.server2.enable.doAs=false;
    set hive.execution.engine=spark;
    set spark.eventLog.enabled=true;
    set spark.shuffle.blockTransferService=nio;
    set spark.eventLog.dir=hdfs://hdfsnode:8020/user/spark/applicationHistory;
    set spark.shuffle.service.enabled=true;
    set spark.dynamicAllocation.enabled=true;

    DROP TABLE IF EXISTS testhiveonspark.temptable2;
    CREATE TABLE testhiveonspark.temptable2
    STORED AS TEXTFILE
    AS SELECT num1, num2 FROM testhiveonspark.temptable1 GROUP BY num1, num2;
    SELECT * FROM testhiveonspark.temptable2;

EOF
hive -f testhiveonspark_working.hql

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 0.931 seconds
Query ID = jolsen_20161112104040_4e398c2c-4691-47c8-ab79-dfd56195bf2a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 3f263511-afa9-48ba-b224-1496194d5d3e
Running with YARN Application = application_1478903884683_0019
Kill Command = /usr/lib/hadoop/bin/yarn application -kill application_1478903884683_0019

Query Hive on Spark job[0] stages:
0
1

Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2016-11-12 10:41:15,453 Stage-0_0: 0/1  Stage-1_0: 0/1
2016-11-12 10:41:18,482 Stage-0_0: 0/1  Stage-1_0: 0/1
2016-11-12 10:41:21,504 Stage-0_0: 0(+1)/1      Stage-1_0: 0/1
2016-11-12 10:41:24,533 Stage-0_0: 1/1 Finished Stage-1_0: 0(+1)/1
2016-11-12 10:41:25,542 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 18.15 seconds
Moving data to: hdfs://hdfsnode:8020/user/hive/warehouse/testhiveonspark.db/temptable2
Table testhiveonspark.temptable2 stats: [numFiles=1, numRows=1, totalSize=4, rawDataSize=3]
OK
Time taken: 83.03 seconds
OK
1       1
Time taken: 0.203 seconds, Fetched: 1 row(s)

执行一个不工作的 CTAS 文件（仅使用 SELECT，没有 GROUP BY）...这没有完成：

cat > testhiveonspark_not_working.hql << EOF

    set mapred.job.queue.name=root.apps10;
    set spark.master=yarn-client;
    set hive.server2.enable.doAs=false;
    set hive.execution.engine=spark;
    set spark.eventLog.enabled=true;
    set spark.shuffle.blockTransferService=nio;
    set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory;
    set spark.shuffle.service.enabled=true;
    set spark.dynamicAllocation.enabled=true;

    DROP TABLE IF EXISTS testhiveonspark.temptable2;
    CREATE TABLE testhiveonspark.temptable2
    STORED AS TEXTFILE
    AS SELECT num1, num2 FROM testhiveonspark.temptable1;

EOF
hive -f testhiveonspark_not_working.hql

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 1.041 seconds
Query ID = jolsen_20161112111313_018a2914-6a3b-428c-af7e-81e0a99a302d
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = c2affd61-1a39-4765-b3ac-e27bc79562ab
Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

我正在运行版本“Hive 1.1.0-cdh5.8.0”和 Spark“版本 1.6.0”，并认为这可能与我不知道的错误或默认设置有关。 cdh 5.8.2 的发行说明表明一张票似乎与我的问题 (https://issues.apache.org/jira/browse/HIVE-12650) 有关，因此决定将我的集群升级到最新版本 cdh5.9.0。不幸的是，我仍然遇到同样的问题。

我“认为”日志表明 Hive 在收到 Application Master 的响应之前已经超时，但老实说我不确定。我还在调查。我也在使用调试运行 hive，但没有看到任何帮助......日志似乎只是证实了我已经从错误中观察到的内容：“hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_working.hql"

失败的原因可能是什么？两个 CTAS 语句并没有太大的不同。为什么一个工作而另一个失败？

谢谢。

我在调试中运行了我的工作和不工作测试，例如“hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_not_working.hql”并比较了输出。从增量中，我尝试从日志中选择我认为最相关的部分，并将其粘贴在下面。我仍然在努力理解为什么“61 秒后还没有提交作业”，以及这个问题的根源。有什么帮助吗？谢谢。

 ...

6/11/12 11:42:56 [main]: INFO client.SparkClientImpl: Running client driver with argv: /usr/lib/spark/bin/spark-submit --properties-file /tmp/spark-submit.5047982490856668419.properties --class org.apache.hive.spark.client.RemoteDriver /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.9.0.jar --remote-host HDFSNode --remote-port 42944 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256 --conf hive.spark.client.rpc.server.address=null
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Class path contains multiple SLF4J bindings.
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:57 INFO RemoteDriver: Connecting to: HDFSNode:42944
16/11/12 11:42:58 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:58 WARN Rpc: Invalid log level null, reverting to default.
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG util.ResourceLeakDetector: -Dio.netty.leakDetectionLevel: simple
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG util.Recycler: -Dio.netty.recycler.maxCapacity.default: 262144
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (41 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG internal.Cleaner0: java.nio.ByteBuffer.cleaner(): available
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Handling SASL challenge message...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Sending SASL challenge response...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (98 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (275 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Handling SASL challenge message...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Sending SASL challenge response...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (45 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: SASL negotiation finished with QOP auth.
16/11/12 11:42:58 [main]: DEBUG session.SparkSessionManagerImpl: New session (23ef000b-8281-4c2d-a1db-077a3d40366f) is created.
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkSubmitJob from=org.apache.hadoop.hive.ql.exec.spark.SparkTask>
16/11/12 11:42:58 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:58 INFO SparkContext: Running Spark version 1.6.0
16/11/12 11:42:58 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 0 (org.apache.hive.spark.client.BaseProtocol$SyncJobRequest).
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$SyncJobRequest (168 bytes)
16/11/12 11:42:58 [main]: DEBUG hdfs.DFSClient: /tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1: masked=rwx------
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #22
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #22
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 9ms
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #23
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #23
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
16/11/12 11:42:58 [main]: INFO ql.Context: New scratch dir is hdfs://HDFSNode:8020/tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1
16/11/12 11:42:58 [main]: DEBUG hdfs.DFSClient: /tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1/-mr-10003: masked=rwxr-xr-x
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #24
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #24
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 5ms
16/11/12 11:42:58 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 1 (org.apache.hive.spark.client.BaseProtocol$JobRequest).
16/11/12 11:42:58 [main]: DEBUG client.SparkClientImpl: Send JobRequest[3e50626e-3a1e-47d0-a18c-76869e1277ae].
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$JobRequest (174708 bytes)
16/11/12 11:42:58 [main]: INFO log.PerfLogger: </PERFLOG method=SparkSubmitJob start=1478968978795 end=1478968978987 duration=192 from=org.apache.hadoop.hive.ql.exec.spark.SparkTask>
Starting Spark Job = 3e50626e-3a1e-47d0-a18c-76869e1277ae
16/11/12 11:42:58 [main]: INFO exec.Task: Starting Spark Job = 3e50626e-3a1e-47d0-a18c-76869e1277ae
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkRunJob from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkSubmitToRunning from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
state = SENT
16/11/12 11:42:58 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: Changing view acls to: jolsen
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: Changing modify acls to: jolsen
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jolsen); users with modify permissions: Set(jolsen)
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Utils: Successfully started service 'sparkDriver' on port 43254.
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Slf4jLogger: Slf4jLogger started
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Starting remoting
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.99:53727]
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@192.168.1.99:53727]
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 53727.
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SparkEnv: Registering MapOutputTracker
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SparkEnv: Registering BlockManagerMaster
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1353a7a8-14f0-4b9d-b369-1224fbc9b331
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO MemoryStore: MemoryStore started with capacity 530.3 MB
state = SENT
16/11/12 11:42:59 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkEnv: Registering OutputCommitCoordinator
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkUI: Started SparkUI at http://192.168.1.99:4040
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkContext: Added JAR file:/usr/lib/hive/lib/hive-exec-1.1.0-cdh5.9.0.jar at spark://192.168.1.99:43254/jars/hive-exec-1.1.0-cdh5.9.0.jar with timestamp 1478968980504
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO RMProxy: Connecting to ResourceManager at HDFSNode/192.168.1.99:8032
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Requesting a new application from cluster with 10 NodeManagers
state = SENT
16/11/12 11:43:00 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (20480 MB per container)
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Setting up container launch context for our AM
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Setting up the launch environment for our AM container
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Preparing resources for our AM container
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar -> hdfs://HDFSNode:8020/user/jolsen/.sparkStaging/application_1478903884683_0032/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar
state = SENT
16/11/12 11:43:01 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:02 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:02 INFO Client: Uploading resource file:/tmp/spark-695bd22c-574c-41ff-9e38-fa491f09c3f6/__spark_conf__2574250824979235523.zip -> hdfs://HDFSNode:8020/user/jolsen/.sparkStaging/application_1478903884683_0032/__spark_conf__2574250824979235523.zip
state = SENT
16/11/12 11:43:02 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: Changing view acls to: jolsen
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: Changing modify acls to: jolsen
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jolsen); users with modify permissions: Set(jolsen)
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO Client: Submitting application 32 to ResourceManager
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO YarnClientImpl: Submitted application application_1478903884683_0032
state = SENT
16/11/12 11:43:04 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:05 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:06 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:07 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:08 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:08 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
16/11/12 11:43:08 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: closed
16/11/12 11:43:08 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: stopped, remaining connections 0
state = SENT
16/11/12 11:43:09 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:10 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> HDFSNode, PROXY_URI_BASES -> http://HDFSNode:8151/proxy/application_1478903884683_0032), /proxy/application_1478903884683_0032
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
state = SENT
16/11/12 11:43:11 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:12 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:13 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:14 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:15 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:16 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:17 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:18 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:19 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:20 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:21 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:22 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:22 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
state = SENT
16/11/12 11:43:23 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:24 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:25 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:26 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:27 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:28 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:29 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:30 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:31 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:32 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:33 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:34 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:35 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:35 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
state = SENT
16/11/12 11:43:36 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:37 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:38 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:39 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:40 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:41 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:42 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:43 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:44 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:45 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:46 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:47 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:48 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:48 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
state = SENT
16/11/12 11:43:49 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:50 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:51 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:52 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:53 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:54 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:55 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:56 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:57 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:58 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:59 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:44:00 [main]: INFO status.SparkJobMonitor: state = SENT
Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
16/11/12 11:44:00 [main]: ERROR status.SparkJobMonitor: Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
16/11/12 11:44:00 [main]: ERROR status.SparkJobMonitor: Status: SENT
16/11/12 11:44:00 [main]: INFO log.PerfLogger: </PERFLOG method=SparkRunJob start=1478968978989 end=1478969040067 duration=61078 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
16/11/12 11:44:00 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 2 (org.apache.hive.spark.client.BaseProtocol$CancelJob).

【问题讨论】：

我刚刚在更新到 cdh5.8.0 的 Cloudera CDH 快速入门 VM 上尝试了相同的步骤并收到了相同的结果。 【参考方案1】：

我翻阅了 Hive 源代码，找到了日志消息的来源（例如“61 秒后未提交作业”），我看到了控制台的输出。从代码中，我能够找到一个默认为 60 秒的属性“hive.spark.job.monitor.timeout”，这正是我的工作超时的确切时间，因此我认为必须是正确的属性。我再次尝试了我的工作，每次都增加“hive.spark.job.monitor.timeout”，并将其增加到“180s”后，我的工作最终在超时之前执行。问题解决了。

我不知道为什么我的工作需要长达 3 分钟才能实际执行，这似乎是一个极长的延迟，但我将把这项研究留到下一次。

这是我的最终代码，它有效：

set mapred.job.queue.name=root.apps10;
set spark.master=yarn-client;
set hive.server2.enable.doAs=false;
set hive.execution.engine=spark;
set spark.eventLog.enabled=true;
set spark.shuffle.blockTransferService=nio;
set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory;
set spark.shuffle.service.enabled=true;
set spark.dynamicAllocation.enabled=true;
set hive.spark.job.monitor.timeout=180s;

DROP TABLE IF EXISTS testhiveonspark.temptable2;
CREATE TABLE testhiveonspark.temptable2
STORED AS TEXTFILE
AS SELECT num1, num2 FROM testhiveonspark.temptable1;

【讨论】：

以上是关于Spark CTAS 上的 Hive 使用 Straight SELECT 失败，但使用 SELECT GROUP BY 成功的主要内容，如果未能解决你的问题，请参考以下文章