Oozie Sqoop job - cannot restore job
【Posted】: 2016-02-03 03:47:30
【Question】: On HDP 2.3.4, using Oozie 4.2.0 and Sqoop 1.4.2, I am trying to create a coordinator application that executes Sqoop jobs daily. I need the Sqoop action to execute saved jobs (job --exec) because these are incremental imports.
I have configured sqoop-site.xml and started the sqoop-metastore. I can create, list, and delete jobs from the command line, but the workflow fails with the error: Cannot restore job: streamsummary_incremental
stderr:
Sqoop command arguments :
job
--exec
streamsummary_incremental
Fetching child yarn jobs
tag id : oozie-26fcd4dc0afd8f53316fc929ac38eae2
2016-02-03 09:46:47,193 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at <myHost>/<myIP>:8032
Child yarn jobs are found -
=================================================================
>>> Invoking Sqoop command line now >>>
2241 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-02-03 09:46:47,404 WARN [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2263 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6.2.3.4.0-3485
2016-02-03 09:46:47,426 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(97)) - Running Sqoop version: 1.4.6.2.3.4.0-3485
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage - Cannot restore job: streamsummary_incremental
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(254)) - Cannot restore job: streamsummary_incremental
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage - (No such job)
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(255)) - (No such job)
2553 [main] ERROR org.apache.sqoop.tool.JobTool - I/O error performing job operation: java.io.IOException: Cannot restore missing job streamsummary_incremental
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:256)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
sqoop-site.xml
<property>
<name>sqoop.metastore.client.enable.autoconnect</name>
<value>false</value>
<description>If true, Sqoop will connect to a local metastore
for job management when no other metastore arguments are
provided.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.url</name>
<value>jdbc:hsqldb:hsql://<myhost>:12345</value>
<description>The connect string to use when connecting to a
job-management metastore. If unspecified, uses ~/.sqoop/.
You can specify a different path here.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.username</name>
<value>SA</value>
<description>The username to bind to the metastore.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.password</name>
<value></value>
<description>The password to bind to the metastore.
</description>
</property>
<property>
<name>sqoop.metastore.server.location</name>
<value>/tmp/sqoop-metastore/shared.db</value>
<description>Path to the shared metastore database files.
If this is not set, it will be placed in ~/.sqoop/.
</description>
</property>
<property>
<name>sqoop.metastore.server.port</name>
<value>12345</value>
<description>Port that this metastore should listen on.
</description>
</property>
workflow.xml
<action name="sqoop-import-job">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${outputDir}"/>
</prepare>
<arg>job</arg>
<arg>--exec</arg>
<arg>${jobId}</arg>
</sqoop>
<ok to="hive-load"/>
<error to="kill-sqoop"/>
</action>
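Additional information:
We are running a single-node cluster, and only the Sqoop client is installed. I suspect Oozie cannot connect to the metastore because we have no Sqoop server. Can anyone confirm this? If that is not the cause, is there anything else I might be missing?
Thanks!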
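【Comments】:
The first thing Sqoop complains about is: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration. Have you tried setting it somehow?
Also, have you tried passing the metastore URL as a Sqoop argument (on the command line first, then in Oozie) to see whether that works? And have you connected to the metastore with Squirrel SQL or a similar client to check that your tests really ran against that metastore and not the default (local file) one?
@SamsonScharfrichter No, I have not tried setting $SQOOP_CONF_DIR. This seems to be a Hortonworks thing, since none of the usual Hadoop environment variables appear to be set. I am not sure it affects anything beyond that warning, because I can run Sqoop normally from the command line.
@SamsonScharfrichter I have not tried your suggestion of supplying the metastore URL as a command-line argument, although I did try configuring MySQL as the metastore by following a guide, and after creating a job I checked and found the job in MySQL.
Side note: I don't really understand what you mean by "this seems to be a Hortonworks thing". A quick Google search shows a few hits for this error message on Cloudera clusters, but nothing about the variable itself. It may be a rogue message left over from an unreleased patch...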
【Answer 1】:
With help from @SamsonScharfrichter in the comments, I managed to solve this. I passed the metastore URL explicitly in the Oozie workflow, and it worked:
<arg>job</arg>
<arg>--meta-connect</arg>
<arg>jdbc:hsqldb:hsql://<myhost>:12345/sqoop</arg>
<arg>--exec</arg>
<arg>myjob</arg>
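It seems Oozie tried to connect to a local metastore because the launcher does not have a copy of sqoop-site.xml, so it does not know the metastore URL (even though I am running a single-node configuration).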
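For reference, here is a sketch of how the fix might look when merged into the full action from the question. It assumes the metastore URL is supplied through a workflow property named metastoreUrl, which is a hypothetical name not used in the original post; it would be set in job.properties to the same value as above, e.g. metastoreUrl=jdbc:hsqldb:hsql://<myhost>:12345/sqoop. Everything else is taken from the question's workflow.xml.

```xml
<action name="sqoop-import-job">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${outputDir}"/>
        </prepare>
        <arg>job</arg>
        <!-- Tell Sqoop which shared metastore to use, since the
             launcher may not see the client's sqoop-site.xml.
             metastoreUrl is an assumed property name. -->
        <arg>--meta-connect</arg>
        <arg>${metastoreUrl}</arg>
        <arg>--exec</arg>
        <arg>${jobId}</arg>
    </sqoop>
    <ok to="hive-load"/>
    <error to="kill-sqoop"/>
</action>
```

Keeping the URL in a property rather than hard-coding it in the action means the same workflow can be pointed at a different metastore host without editing workflow.xml.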