MapReduce job continues to run with map = 0%, reduce = 0% for hours
【Title】: MapReduce Job continues to run with map = 0%, reduce = 0% for hours
【Posted】: 2019-04-19 01:30:57
【Question】: I am running a Hive query that looks like this:
create table table1 as select split(comments,' ') as words from table2;
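The comments column holds each comment as a single space-separated string.
When I run this query, a MapReduce job starts and keeps running at map 0% for hours. It does not report any error while it runs.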
hive> create table jw_1 as select split(comments,' ') from removed_null_values;
Query ID = xxx-190418201314_7781cf59-6afb-4e82-ab75-c7e343c4985e
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555607912038_0013, Tracking URL = http://xxx-VirtualBox:8088/proxy/application_1555607912038_0013/
Kill Command = /usr/local/bin/hadoop-3.2.0/bin/mapred job -kill job_1555607912038_0013
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-18 20:13:30,568 Stage-1 map = 0%, reduce = 0%
2019-04-18 20:14:31,140 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 39.6 sec
2019-04-18 20:15:31,311 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 101.64 sec
2019-04-18 20:16:31,451 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 146.5 sec
2019-04-18 20:17:31,684 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 212.08 sec
But when I try
select split(comments,' ') from table2;
I can see the comments as arrays in the shell:
["\"Lauren","was","promptly","responsive","in","advance","of","our","booking.","providing","a","lot","of","helpful","info.","And","she","stayed","in","contact","and","was","readily","available","prior","to","and","during","our","stay.","which","was","awesome.","The","location.","price","and","privacy","were","the","real","benefits."]
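I have also run some other queries where the MapReduce job completed and produced the expected results.
I am currently using Hive 3.1.1.
Basically, I want to create a new table containing arrays of words, and then tokenize that column.
I am new to Hive, and I am doing sentiment analysis on a 35 MB data file.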
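【Answer 1】: In your first case, you most likely do not have the resources needed to complete the Hive query once it is converted to MapReduce. Check YARN (or MR1, if that is what you run) to determine whether you have enough compute resources to run the MapReduce job.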
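As a rough starting point for that check, the Hive session itself can print what the job will request from YARN. This is only a sketch, assuming the Hive CLI shown in the question. The property names are standard Hadoop MapReduce settings, not something the answer names. The printed values still have to be compared by hand against the memory the NodeManager offers, which the ResourceManager web UI reports (the host and port of the Tracking URL in the log above).

```sql
-- Print the container size requested for each map task, in MB.
set mapreduce.map.memory.mb;
-- Print the JVM options (including heap size) used inside the map container.
set mapreduce.map.java.opts;
-- Print the container size requested for the MapReduce ApplicationMaster, in MB.
set yarn.app.mapreduce.am.resource.mb;
```

Running `set` with a key and no value only displays the current setting and changes nothing. A key that is absent from the loaded configuration is reported as undefined.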
In your second case, the query returns because some Hive queries do not trigger a MapReduce job at all. See "How does Hive decide when to use map reduce and when not to?" for more information.
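One way to see this difference is through the `hive.fetch.task.conversion` setting, which controls whether simple SELECT statements are served by a fetch task instead of a job. The sketch below assumes the table and column names from the question. The `limit 10` is added here only to keep the experiment small. Turning conversion off should make the plain SELECT launch a job as well, which helps tell whether the stall comes from `split()` under MapReduce or from the CTAS step.

```sql
-- Show the current mode; valid values are none, minimal, and more.
set hive.fetch.task.conversion;

-- Disable fetch-task conversion for this session so that even a simple
-- SELECT is compiled into a job rather than a direct fetch.
set hive.fetch.task.conversion=none;

-- Same expression as in the question, without creating a table.
select split(comments,' ') from table2 limit 10;

-- Restore the usual behaviour afterwards.
set hive.fetch.task.conversion=more;
```

If the SELECT now also sits at map = 0%, the problem is not specific to CTAS.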
【Comments】:
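Thanks for the reply. I understand that the second query does not trigger a MapReduce job. I have run some other queries that completed successfully when converted to MapReduce, and I have double-checked my resources. I only hit this problem when I use CTAS (create table as select) and SPLIT() together in the same query.
Check your logs for clues: the Hive and Hive Metastore logs, plus the ResourceManager and NodeManager logs if you use YARN, or the TaskTracker and JobTracker logs if you use MR1.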