Oozie Spark (2.x) action always gets stuck in the ACCEPTED state
【Posted】: 2018-09-20 18:55:42
【Question】: When I run a Spark job through Oozie, it always gets stuck in the ACCEPTED state. I set up the spark2 libraries by following the Hortonworks documentation.
When I run the same Spark job through an Oozie shell action, it works perfectly. It also works via spark-submit from the edge node with the same Spark options.
Below is my workflow.xml:
<global>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <property>
            <name>mapreduce.job.queuename</name>
            <value>${queueName}</value>
        </property>
    </configuration>
</global>
<credentials>
    <credential name="HiveCreds" type="hive2">
        <property>
            <name>hive2.jdbc.url</name>
            <value>jdbc:hive2://${hive2_server}:${hive2_port}/default</value>
        </property>
        <property>
            <name>hive2.server.principal</name>
            <value>hive/${hive2_server}@DOMAIN</value>
        </property>
    </credential>
</credentials>
<!-- Spark action using the Spark 2 libraries -->
<start to="SPARK2JOB" />
<action name="SPARK2JOB" cred="HiveCreds">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>${master}</master>
        <mode>${mode}</mode>
        <name>${appName}</name>
        <class>${mainClass}</class>
        <jar>${hdfsJarLoc}${uberJar}</jar>
        <spark-opts>--num-executors ${noOfExec}
            --executor-cores ${execCores}
            --executor-memory ${execMem}
            --driver-memory ${drivMem}
            --driver-cores ${drivCores}
            --conf spark.dynamicAllocation.enabled=${dynamicAllocation}</spark-opts>
        <arg>${sourceFilePath}</arg>
        <arg>${sourceFileName}</arg>
        <arg>${outputFilePath}</arg>
        <arg>${outputFileDir}</arg>
    </spark>
    <ok to="end" />
    <error to="errorHandler" />
</action>
My job.properties:
jobTracker=HOST:8050
nameNode=hdfs://HOST:8020
hive2_server=HOSTNAME
hive2_port=10000
queueName=default
# Standard useful properties
oozie.use.system.libpath=true
#oozie.wf.rerun.failnodes=true
ooziePath=/path/
#oozie.coord.application.path=${ooziePath}
## Oozie path & Standard properties
oozie.wf.application.path=${ooziePath}
oozie.libpath=${ooziePath}/Lib
oozie.action.sharelib.for.spark=spark2
master=yarn-cluster
mode=cluster
appName=APP_NAME
mainClass=MAIN_CLASS
uberJar=UBER_JAR
noOfExec=2
execCores=2
execMem=2G
drivMem=2g
drivCores=2
dynamicAllocation=false
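Since the same job reportedly works through spark-submit from the edge node, it can help to write out the command line that these properties correspond to and compare the two paths. The following is only a rough sketch: it assumes the usual correspondence between the Spark action elements (master, mode, name, class) and the spark-submit flags `--master`, `--deploy-mode`, `--name`, and `--class`. The `HDFS_JAR_LOC` value is hypothetical, because `hdfsJarLoc` is referenced by the workflow but not defined in the properties shown, and the application arguments are left out for the same reason.

```python
import shlex

# Values copied from the job.properties shown above.
props = {
    "master": "yarn-cluster",
    "mode": "cluster",
    "appName": "APP_NAME",
    "mainClass": "MAIN_CLASS",
    "uberJar": "UBER_JAR",
    "noOfExec": "2",
    "execCores": "2",
    "execMem": "2G",
    "drivMem": "2g",
    "drivCores": "2",
    "dynamicAllocation": "false",
}

# Hypothetical value: hdfsJarLoc is not defined in the properties shown.
HDFS_JAR_LOC = "hdfs:///hypothetical/jars/"


def build_spark_submit(p, jar_loc, app_args=()):
    """Return an argv list for a spark-submit call mirroring the action."""
    return [
        "spark-submit",
        "--master", p["master"],
        "--deploy-mode", p["mode"],
        "--name", p["appName"],
        "--class", p["mainClass"],
        "--num-executors", p["noOfExec"],
        "--executor-cores", p["execCores"],
        "--executor-memory", p["execMem"],
        "--driver-memory", p["drivMem"],
        "--driver-cores", p["drivCores"],
        "--conf", "spark.dynamicAllocation.enabled=" + p["dynamicAllocation"],
        jar_loc + p["uberJar"],
        *app_args,
    ]


cmd = build_spark_submit(props, HDFS_JAR_LOC)
# Print a shell-safe, copy-pasteable form of the command.
print(" ".join(shlex.quote(part) for part in cmd))
```

Running the printed command by hand as the same user and against the same jar location is one way to check whether the two submission paths really read the same files.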
I checked the Oozie spark2 sharelib; it contains all the jars from /usr/hdp/2.6.3.0-235/spark2/jars/.
My Oozie sharelib:
/user/oozie/share/lib/lib_20180116141700/oozie/aws-java-sdk-core-1.10.6.jar
/user/oozie/share/lib/lib_20180116141700/oozie/aws-java-sdk-kms-1.10.6.jar
/user/oozie/share/lib/lib_20180116141700/oozie/aws-java-sdk-s3-1.10.6.jar
/user/oozie/share/lib/lib_20180116141700/oozie/azure-data-lake-store-sdk-2.1.4.jar
/user/oozie/share/lib/lib_20180116141700/oozie/azure-keyvault-core-0.8.0.jar
/user/oozie/share/lib/lib_20180116141700/oozie/azure-storage-5.4.0.jar
/user/oozie/share/lib/lib_20180116141700/oozie/commons-lang3-3.4.jar
/user/oozie/share/lib/lib_20180116141700/oozie/guava-11.0.2.jar
/user/oozie/share/lib/lib_20180116141700/oozie/hadoop-aws-2.7.3.2.6.3.0-235.jar
/user/oozie/share/lib/lib_20180116141700/oozie/hadoop-azure-2.7.3.2.6.3.0-235.jar
/user/oozie/share/lib/lib_20180116141700/oozie/hadoop-azure-datalake-2.7.3.2.6.3.0-235.jar
/user/oozie/share/lib/lib_20180116141700/oozie/jackson-annotations-2.4.0.jar
/user/oozie/share/lib/lib_20180116141700/oozie/jackson-core-2.4.4.jar
/user/oozie/share/lib/lib_20180116141700/oozie/jackson-databind-2.4.4.jar
/user/oozie/share/lib/lib_20180116141700/oozie/joda-time-2.9.6.jar
/user/oozie/share/lib/lib_20180116141700/oozie/json-simple-1.1.jar
/user/oozie/share/lib/lib_20180116141700/oozie/okhttp-2.4.0.jar
/user/oozie/share/lib/lib_20180116141700/oozie/okio-1.4.0.jar
/user/oozie/share/lib/lib_20180116141700/oozie/oozie-hadoop-utils-hadoop-2-4.2.0.2.6.3.0-235.jar
/user/oozie/share/lib/lib_20180116141700/oozie/oozie-sharelib-oozie-4.2.0.2.6.3.0-235.jar
Below is the error stack.
The application stays in the ACCEPTED state (as shown below) for an hour or so:
[main] INFO org.apache.spark.deploy.yarn.Client - Application report for application_1537404298109_2008 (state: ACCEPTED)
Stdout:
INFO org.apache.spark.deploy.yarn.Client - Application report for application_1537404298109_2008 (state: ACCEPTED)
2018-09-20 14:49:15,158 [main] INFO org.apache.spark.deploy.yarn.Client - Application report for application_1537404298109_2008 (state: FAILED)
2018-09-20 14:49:15,158 [main] INFO org.apache.spark.deploy.yarn.Client -
client token: N/A
diagnostics: Application application_1537404298109_2008 failed 2 times due to AM Container for appattempt_1537404298109_2008_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://hostname:8088/cluster/app/application_1537404298109_2008 Then click on links to logs of each attempt.
Diagnostics: org.apache.hadoop.security.authorize.AuthorizationException: User:yarn not allowed to do 'DECRYPT_EEK' on 'testkey1'
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1537468694601
final status: FAILED
tracking URL: http://hostname:8088/cluster/app/application_1537404298109_2008
user: username
2018-09-20 14:49:16,189 [main] INFO org.apache.spark.deploy.yarn.Client - Deleted staging directory
<<< Invocation of Spark command completed <<<
Hadoop Job IDs executed by Spark: job_1537404298109_2008
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1537404298109_2008 finished with failed status
org.apache.spark.SparkException: Application application_1537404298109_2008 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:314)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:235)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:240)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Oozie Launcher failed, finishing Hadoop job gracefully
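The ACCEPTED-then-FAILED sequence is easy to lose in a long launcher log. As a minimal sketch, the report lines can be collapsed into runs of identical states. The regular expression is inferred only from the three report lines shown in this question, so treat it as an assumption about the log format rather than a stable contract.

```python
import re

# Report lines copied from the launcher log in the question.
LOG = """\
[main] INFO org.apache.spark.deploy.yarn.Client - Application report for application_1537404298109_2008 (state: ACCEPTED)
INFO org.apache.spark.deploy.yarn.Client - Application report for application_1537404298109_2008 (state: ACCEPTED)
2018-09-20 14:49:15,158 [main] INFO org.apache.spark.deploy.yarn.Client - Application report for application_1537404298109_2008 (state: FAILED)
"""

# Pattern inferred from the lines above (an assumption, not a documented format).
REPORT = re.compile(r"Application report for (application_\d+_\d+) \(state: (\w+)\)")


def state_transitions(log_text):
    """Collapse consecutive identical states into (app_id, state, count) runs."""
    runs = []
    for app_id, state in REPORT.findall(log_text):
        if runs and runs[-1][0] == app_id and runs[-1][1] == state:
            runs[-1][2] += 1  # same state as the previous report: extend the run
        else:
            runs.append([app_id, state, 1])  # state changed: start a new run
    return [tuple(run) for run in runs]


transitions = state_transitions(LOG)
for app_id, state, count in transitions:
    print(app_id, state, count)
```

On the full log, a long ACCEPTED run followed directly by FAILED (with no RUNNING in between) indicates that the ApplicationMaster container never started, which matches the exitCode -1000 diagnostics above.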
Stderr:
org.apache.spark.SparkException: Application application_1537404298109_2008 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:314)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:235)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:240)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
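I have tried every solution I could find on the Hortonworks community and Stack Overflow for this type of problem, but nothing has helped. If you need any other information to help me, I will gladly add it to the question.
Thanks in advance!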
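【Answer 1】: org.apache.hadoop.security.authorize.AuthorizationException: User:yarn not allowed to do 'DECRYPT_EEK'
DECRYPT_EEK is a permission in Ranger that must be granted to the user. If you are not the Ranger administrator, ask your administrator to grant it.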
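The diagnostics line in the question names not only the denied operation but also the user and the encryption key, which are the details an administrator would need. A small sketch for pulling them out of the log follows; the pattern is inferred from this single log line, so other Hadoop versions may word the message differently.

```python
import re

# Diagnostics line copied from the launcher log in the question.
DIAGNOSTICS = (
    "Diagnostics: org.apache.hadoop.security.authorize.AuthorizationException: "
    "User:yarn not allowed to do 'DECRYPT_EEK' on 'testkey1'"
)

# Pattern inferred from the line above (an assumption, not a documented format).
DENIED = re.compile(
    r"User:(?P<user>\S+) not allowed to do '(?P<op>[^']+)' on '(?P<key>[^']+)'"
)


def parse_kms_denial(text):
    """Return the denied user, operation, and key name, or None if absent."""
    match = DENIED.search(text)
    return match.groupdict() if match else None


denial = parse_kms_denial(DIAGNOSTICS)
print(denial)  # {'user': 'yarn', 'op': 'DECRYPT_EEK', 'key': 'testkey1'}
```

Here the denied user is `yarn`, not the submitting user, so the policy for key `testkey1` would need to cover `yarn`.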
【Comments】:
I noticed this error, but why does the job fail through the Oozie Spark action while it works through the shell action and spark-submit?