MapReduce Job continues to run with map = 0%, reduce = 0% for hours

Posted 2023-03-06

Question (posted 2019-04-19 01:30:57):

I'm running a Hive query that looks like this:

create table table1 as select split(comments,' ') as words from table2;

The comments column holds each comment as a space-separated string.

When I run this query, the MapReduce job starts and then stays at map 0% for hours. It doesn't report any error along the way.

hive> create table jw_1 as select split(comments,' ') from removed_null_values;
Query ID = xxx-190418201314_7781cf59-6afb-4e82-ab75-c7e343c4985e
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555607912038_0013, Tracking URL = http://xxx-VirtualBox:8088/proxy/application_1555607912038_0013/
Kill Command = /usr/local/bin/hadoop-3.2.0/bin/mapred job  -kill job_1555607912038_0013
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-18 20:13:30,568 Stage-1 map = 0%,  reduce = 0%
2019-04-18 20:14:31,140 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 39.6 sec
2019-04-18 20:15:31,311 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 101.64 sec
2019-04-18 20:16:31,451 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 146.5 sec
2019-04-18 20:17:31,684 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 212.08 sec

But when I try

select split(comments,' ') from table2;

I can see the comments as arrays in the shell:

["\"Lauren","was","promptly","responsive","in","advance","of","our","booking.","providing","a","lot","of","helpful","info.","And","she","stayed","in","contact","and","was","readily","available","prior","to","and","during","our","stay.","which","was","awesome.","The","location.","price","and","privacy","were","the","real","benefits."]

I've also run some other queries whose MapReduce jobs completed and produced the expected results.

I'm currently using Hive 3.1.1.

Basically, I want to create a new table with a column that holds the array of words, and then tokenize that column.
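A minimal HiveQL sketch of that goal, assuming the table and column names from the question (`table2`, `comments`); `table1_words` and `word` are hypothetical names chosen for illustration. It keeps the array in an aliased column and then uses `LATERAL VIEW explode()` to produce one row per word. It does not by itself address the stalled job.

```sql
-- Step 1: store the array of words in a named column (names from the question).
CREATE TABLE table1 AS
SELECT split(comments, ' ') AS words
FROM table2;

-- Step 2 (hypothetical table/column names): one row per token.
-- explode() is a table-generating function, so it is used through LATERAL VIEW.
CREATE TABLE table1_words AS
SELECT w.word
FROM table1
LATERAL VIEW explode(words) w AS word
WHERE w.word <> '';  -- drop empty tokens produced by consecutive spaces
```

Note that `split()` treats its second argument as a Java regular expression, so a pattern such as `'\\s+'` could be used instead of `' '` to split on runs of whitespace.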

I'm new to Hive, and I'm doing sentiment analysis on a 35 MB data file.

Answer 1:

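In your first case, you most likely don't have the resources needed to complete the Hive query once it is converted to MapReduce. Check YARN (or MR1) to determine whether you have enough compute resources to run the MapReduce job.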
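As a starting point from inside the Hive CLI, `SET` followed by a property name and no value prints the value the current session will use. This is a sketch that assumes the classic MapReduce execution engine; the printed per-container memory requests can then be compared with the capacity the YARN ResourceManager UI (the Tracking URL in the log above) reports as available.

```sql
-- Which engine runs the query (mr, tez, ...).
SET hive.execution.engine;

-- Memory requested for each map task container and its JVM heap options.
SET mapreduce.map.memory.mb;
SET mapreduce.map.java.opts;

-- Memory requested for the MapReduce ApplicationMaster container.
SET yarn.app.mapreduce.am.resource.mb;
```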

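In the second query, Hive does not launch a MapReduce job at all: some Hive queries are answered with a simple fetch task instead, which is why it returns right away. See How does Hive decide when to use map reduce and when not to? for more information.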
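The setting behind this behavior is `hive.fetch.task.conversion` (values `none`, `minimal`, `more`). A diagnostic sketch, assuming the question's table and column names: `EXPLAIN` shows whether the plain SELECT is planned as a fetch task, and disabling the conversion for the session should force the same SELECT through MapReduce. If that job also stalls at map = 0%, that would suggest the problem lies in the map task itself rather than in CTAS.

```sql
-- Inspect the plan: look for a Fetch Operator only vs. a Map Reduce stage.
EXPLAIN SELECT split(comments, ' ') FROM table2;

-- Disable fetch-task conversion for this session only,
-- so the same SELECT is executed as a MapReduce job.
SET hive.fetch.task.conversion=none;
SELECT split(comments, ' ') FROM table2;
```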

Comments:

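Thanks for your response. I understand that the second query doesn't trigger a MapReduce job. I've run other queries that were converted to MapReduce and completed successfully, and I've double-checked my resources. I only hit this problem when I use CTAS (CREATE TABLE AS SELECT) and SPLIT() together in the same query.

Check your logs for clues: Hive and the Hive Metastore; if you're using YARN, the ResourceManager and NodeManager logs; if you're using MR1, the TaskTracker and JobTracker logs.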
