Oozie Spark action workflow cannot start
Posted: 2020-10-06 10:10:44
Question: I have a simple Spark job that I cannot run through Oozie. The same Spark job runs fine through spark-submit. When I submit the workflow, I get the following error:
2020-10-06 11:30:05,677 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error while initializing
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:368)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1760)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1757)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1691)
Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:436)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:143)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.getFileSystem(MRAppMaster.java:605)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:315)
... 7 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:425)
... 19 more
Caused by: java.lang.IllegalAccessError: tried to access class org.apache.hadoop.security.token.Token$PrivateToken from class org.apache.hadoop.hdfs.HAUtil
at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:271)
at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:105)
... 24 more
2020-10-06 11:30:05,689 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error while initializing
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:368)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1760)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1757)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1691)
Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:436)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:143)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.getFileSystem(MRAppMaster.java:605)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:315)
... 7 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:425)
... 19 more
Caused by: java.lang.IllegalAccessError: tried to access class org.apache.hadoop.security.token.Token$PrivateToken from class org.apache.hadoop.hdfs.HAUtil
at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:271)
at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:105)
... 24 more
2020-10-06 11:30:05,694 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error while initializing
Here is job.properties:
nameNodeHost=***
resourceNodeHost=***
nameNode=hdfs://${nameNodeHost}:8020
jobTracker=http://${resourceNodeHost}:8050
queueName=DataWrangling
appHomeDir=/projects/msti
oozie.wf.application.path=${nameNode}${appHomeDir}/Oozie/move-data/workflow-daily-data.xml
resourceManager=hdfs://${resourceNodeHost}:8050
oozie.use.system.libpath = true
Here is the workflow XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="MSTI-Daily-Data">
<global>
<job-tracker>http://***:8050</job-tracker>
<name-node>hdfs://***:8020</name-node>
</global>
<credentials>
<credential name="hive_auth" type="hcat">
<property>
<name>hcat.metastore.principal</name>
<value>hive/_HOST@***</value>
</property>
<property>
<name>hcat.metastore.uri</name>
<value>thrift://***:9083</value>
</property>
</credential>
<credential name="hive_jdbc" type="hive2">
<property>
<name>hive2.jdbc.url</name>
<value>jdbc:hive2://***:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2</value>
</property>
<property>
<name>hive2.server.principal</name>
<value>hive/_HOST@***</value>
</property>
</credential>
</credentials>
<start to="import-sqooped-data"/>
<action name="import-sqooped-data">
<spark xmlns="uri:oozie:spark-action:0.2">
<job-tracker>${resourceManager}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>DataWrangling</value>
</property>
</configuration>
<master>yarn-cluster</master>
<name>Daily-Ingest</name>
<class>ClassName</class>
<jar>/path-to-scala-app.jar</jar>
<spark-opts>--driver-memory 16g --master yarn --queue QueueName --executor-memory 12G --num-executors 12 --executor-cores 2</spark-opts>
<file>/path-to-scala-app.jar</file>
<file>/path-to/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar</file>
</spark>
<ok to="success-e-mail"/>
<error to="failure-e-mail"/>
</action>
<action name="success-e-mail">
<email xmlns="uri:oozie:email-action:0.2">
<to>***</to>
<subject>Daily Data Job Succeeded</subject>
<body>The Oozie job completed successfully. See logs for details.</body>
</email>
<ok to="end"/>
<error to="kill"/>
</action>
<action name="failure-e-mail">
<email xmlns="uri:oozie:email-action:0.2">
<to>***</to>
<subject>Daily Data Job Failed</subject>
<body>The Oozie job failed. See logs for details.</body>
</email>
<ok to="kill"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>${wf:errorMessage(wf:lastErrorNode())}</message>
</kill>
<end name="end"/>
</workflow-app>
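The same cluster runs other actions and workflows without any problems. As soon as a Spark action is part of the workflow, this error causes the launcher application to fail almost immediately. Any help would be appreciated.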
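Comments:
Have you created the sharelib with the Spark libraries?
Yes. I did not include it in the question because the error is always the same whether or not the Spark libraries are present. Here are the entries in my job.properties file:
oozie.use.system.libpath = true
oozie.libpath = ${nameNode}/user/oozie/share/lib/lib_20190806101356
oozie.action.sharelib.for.spark=spark,hive2
Answer 1:
Following a suggestion from another post, I had added hadoop-client-3.1.1.3.1.0.0-78.jar to the Spark sharelib folder.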
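That did not work and produced the error shown in the question. After removing the file, I got a different error.
All other types of Oozie actions work fine; only using the Hive Warehouse Connector causes problems. Here is the new error. It occurs when hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar is included in either of two ways: in the Oozie sharelib folder, or using the <file> tag in the Spark action.
If I do not include hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar, I obviously get a class-not-found exception, but the Scala application does execute.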
ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.hadoop.io.retry.RetryUtils.getDefaultRetryPolicy(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;ZLjava/lang/String;Ljava/lang/String;Ljava/lang/Class;)Lorg/apache/hadoop/io/retry/RetryPolicy;
at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:318)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:235)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:173)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
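Both traces on this page (the IllegalAccessError on Token$PrivateToken raised from HAUtil, and the NoSuchMethodError on RetryUtils.getDefaultRetryPolicy) are linkage errors, which usually mean that classes from two different Hadoop builds ended up on the same classpath. One way to narrow this down is to list which jars bundle the classes named in the traces. The sketch below is only a diagnostic aid under stated assumptions: it assumes the sharelib jars and any extra jars have first been copied to a local directory, and the function name find_class_providers is hypothetical, not part of Oozie or Hadoop.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def find_class_providers(jar_dir, class_names):
    """Map each fully qualified class name to the jars under jar_dir that contain it.

    jar_dir is assumed to be a local copy of the jars to inspect
    (for example the Spark sharelib folder plus any jars added via <file>).
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class;
    # nested classes keep their '$', e.g. a/b/C$Inner.class.
    wanted = {name.replace(".", "/") + ".class": name for name in class_names}
    providers = defaultdict(list)
    for jar_path in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                entries = set(jar.namelist())
        except zipfile.BadZipFile:
            continue  # skip files that are not valid zip/jar archives
        for entry, name in wanted.items():
            if entry in entries:
                providers[name].append(jar_path.name)
    # Classes that no jar provides are simply absent from the result.
    return dict(providers)
```

Calling it with the directory and the class names from the traces (org.apache.hadoop.hdfs.HAUtil, org.apache.hadoop.security.token.Token$PrivateToken, org.apache.hadoop.io.retry.RetryUtils) returns a dictionary from class name to jar file names. A class that appears in more than one jar, for instance in both a hadoop jar and an assembly jar, is a candidate for the conflict; whether that is the actual cause here is not confirmed by the original post.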