Oozie workflow hive action stuck in RUNNING
【Posted】: 2015-02-18 18:27:35
【Question】: I am running the Hortonworks distribution with Hadoop 2.4.0, Oozie 4.0.0, and Hive 0.13.0.
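I have multiple Oozie coordinator jobs that may launch workflows at the same time. Each coordinator job watches a different directory and launches a workflow when a _SUCCESS file appears in that directory.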
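The coordinator definitions themselves are not shown in the question. As a rough sketch only, a coordinator that waits for a _SUCCESS flag in a dated directory could look something like the following. The application name, the directory layout, the daily frequency, and the `${start}`, `${end}`, `${nameNode}` and `${workflowAppPath}` parameters are all hypothetical placeholders, not the original configuration:

```xml
<!-- Hypothetical coordinator: one instance per day, triggered by a _SUCCESS flag -->
<coordinator-app name="watch-dir-a" frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="input" frequency="${coord:days(1)}"
             initial-instance="${start}" timezone="UTC">
      <!-- Assumed directory layout; the real paths are not given in the question -->
      <uri-template>${nameNode}/data/dir-a/${YEAR}/${MONTH}/${DAY}</uri-template>
      <!-- The dataset instance counts as available once this file exists -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="ready" dataset="input">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

With several coordinators of this shape, each pointing at its own directory, nothing prevents two or more workflows from being submitted at the same moment.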
The workflow runs a Hive action that reads from the external directory and copies its contents:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
DROP TABLE IF EXISTS $INPUT_TABLE;
CREATE external TABLE IF NOT EXISTS $INPUT_TABLE (
id bigint,
data string,
creationdate timestamp,
datelastupdated timestamp)
LOCATION '$INPUT_LOCATION';
-- Read from external table and insert into a partitioned Hive table
FROM $INPUT_TABLE ent
INSERT OVERWRITE TABLE mytable PARTITION(data)
SELECT ent.id, ent.data, ent.creationdate, ent.datelastupdated;
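When I run a single coordinator that launches a single workflow, the workflow and its Hive action complete without any problems.
When multiple workflows are launched at the same time, the Hive action stays in the RUNNING state for a long time.
If I look at the job syslog, I see this: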
2015-02-18 17:18:26,048 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1423085109915_0223_m_000000 Task Transitioned from SCHEDULED to RUNNING
2015-02-18 17:18:26,586 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1423085109915_0223: ask=3 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:32768, vCores:-3> knownNMs=1
2015-02-18 17:18:27,677 INFO [Socket Reader #1 for port 38704] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1423085109915_0223 (auth:SIMPLE)
2015-02-18 17:18:27,696 INFO [IPC Server handler 0 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1423085109915_0223_m_000002 asked for a task
2015-02-18 17:18:27,697 INFO [IPC Server handler 0 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1423085109915_0223_m_000002 given task: attempt_1423085109915_0223_m_000000_0
2015-02-18 17:18:34,951 INFO [IPC Server handler 2 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:19:05,060 INFO [IPC Server handler 11 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:19:35,161 INFO [IPC Server handler 28 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:20:05,262 INFO [IPC Server handler 2 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:20:35,358 INFO [IPC Server handler 11 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:21:02,452 INFO [IPC Server handler 23 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:21:32,545 INFO [IPC Server handler 1 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:22:02,668 INFO [IPC Server handler 12 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
It just prints "Progress of TaskAttempt" over and over again.
Our yarn-site.xml is configured to use this:
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
Should I be using a different scheduler?
At this point I am not sure whether the problem lies with Oozie or with Hive.
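【Answer 1】: This turned out to be the same heartbeat issue described here:
Error on running multiple Workflow in OOZIE-4.1.0
After changing the scheduler to the FairScheduler, as described in that post, I was able to run multiple workflows.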
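For reference, the scheduler switch amounts to changing the value of the same yarn-site.xml property shown in the question. A minimal sketch, assuming a stock Hadoop 2.x installation where the FairScheduler class ships with YARN; queue and allocation-file settings are left at their defaults here, and the ResourceManager must be restarted for the change to take effect:

```xml
<!-- yarn-site.xml: use the FairScheduler instead of the CapacityScheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```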