Adding multiple jars in an Oozie-Spark action
Posted: 2017-07-20 08:44:00

I am using HDP 2.6, with Oozie 4.2 and Spark2 installed.
I followed the Hortonworks guide at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html for adding the Spark2 libraries to Oozie 4.2.
Then I submitted the job with this property:
oozie.action.sharelib.for.spark=spark2
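(For context, a minimal job.properties sketch showing where this property typically sits; the host names and workflow path below are placeholders, not values from my cluster:)

# job.properties -- illustrative; only the last line is the property in question
nameNode=hdfs://<namenode-host>:8020
jobTracker=<resourcemanager-host>:8050
oozie.wf.application.path=${nameNode}/user/admin/apps/spark2-workflow
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark=spark2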
The error I get is:
2017-07-19 12:36:53,271 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
2017-07-19 12:36:53,275 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher exception: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
java.lang.IllegalArgumentException: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13$$anonfun$apply$8.apply(Client.scala:629)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13$$anonfun$apply$8.apply(Client.scala:620)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:620)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:619)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:619)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:892)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1228)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1287)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:239)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
I have read that the new sharelib will not work with Spark 2.1 (through Oozie, anyway) because of a change in the way Spark handles multiple files in the distributed cache, as described here: see here. In Spark 2.1 the YARN client refuses to add the same path to the distributed cache twice, so any jar that exists in more than one sharelib directory aborts the launch with the IllegalArgumentException shown above.
Bear in mind that I am using Ambari and HDP 2.6. How can I deal with this?
【参考方案1】:您需要将oozie
目录和spark2
目录的内容检查到Oozie 共享库中。如果两者中都有任何罐子,只需将它们从一个地方取出并重试。另外,请执行 oozie admin sharelub update 命令来更新它。
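A shell sketch of that procedure, for illustration only: the lib_20170613110051 directory and the aws-java-sdk jar are taken from the error above, and the Oozie host and port are placeholders for your cluster.

# Find jar names that appear in both sharelib directories
hdfs dfs -ls /user/oozie/share/lib/lib_20170613110051/oozie  | grep -o '[^/]*\.jar$' | sort > oozie_jars.txt
hdfs dfs -ls /user/oozie/share/lib/lib_20170613110051/spark2 | grep -o '[^/]*\.jar$' | sort > spark2_jars.txt
comm -12 oozie_jars.txt spark2_jars.txt

# Remove each duplicate from one of the two directories, e.g. the jar from the stack trace
hdfs dfs -rm /user/oozie/share/lib/lib_20170613110051/spark2/aws-java-sdk-core-1.10.6.jar

# Tell the running Oozie server to rescan the sharelib, then verify what it sees
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist spark2

Which side you delete from should not matter, as long as each jar ends up in only one of the two directories.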
Hope this helps.
Comments:
I tried deleting, from one of the two folders, all the jars that exist in both oozie and spark2, but with no success. The one thing I did not do, though, was update the sharelib after deleting them; I went straight to testing. I will try updating the sharelib as you suggest and let you know how it goes :) Thanks btw :)

After the holidays I finally tried what you said, and yes, it works now. Thank you very much :)