Hive count query never completes; it runs forever
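【Original title】: hive count query not able to complete it run forever
【Posted】: 2015-04-20 05:05:16
【Question】: I am new to Hive. I am using HBase 1.1.0, Hadoop 2.5.1, and Hive 0.13 for my requirement.
The setup works fine, and I can run Hive queries using Beeline.
Query: select count(*) from X_Table
The query completed in 37.848 seconds.
I set up a Maven project against the same environment and tried running some select queries through the Hive client (JDBC), and they work fine. But when I run the same count query, the MapReduce job never completes. It looks like the job keeps getting restarted. How can I solve this?
Code: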
Connection con = DriverManager.getConnection("jdbc:hive2://abc:10000/default","", "");
Statement stmt = con.createStatement();
String query = "select count(*) from X_Table";
ResultSet res = stmt.executeQuery(query);
while (res.next()) {
    // code here
}
Log details:
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1429243611915_0030, Tracking URL = http://master:8088/proxy/application_1429243611915_0030/
Kill Command = /usr/local/pcs/hadoop/bin/hadoop job -kill job_1429243611915_0030
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-04-20 09:28:02,616 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:29:02,728 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:30:03,432 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:31:04,054 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:32:04,675 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:33:05,298 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:34:05,866 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:35:06,419 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:36:06,985 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:37:07,551 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:38:08,289 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:39:09,184 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:40:09,780 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:41:10,367 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:42:10,965 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:43:11,595 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:44:12,181 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:45:12,952 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:46:13,590 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:47:14,218 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:48:14,790 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:49:15,378 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:50:16,014 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:51:16,808 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:52:17,378 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:53:17,928 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:54:18,491 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:55:19,049 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:56:19,797 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:57:20,344 Stage-1 map = 0%, reduce = 0%
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1429243611915_0031, Tracking URL = http://master:8088/proxy/application_1429243611915_0031/
Kill Command = /usr/local/pcs/hadoop/bin/hadoop job -kill job_1429243611915_0031
2015-04-20 09:58:20,858 Stage-1 map = 0%, reduce = 0%
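The log above shows Stage-1 sitting at map = 0%, reduce = 0% for roughly half an hour before a second job (job_1429243611915_0031) is launched. As a rough illustration only, a client-side check for this kind of stall could look like the sketch below. The class name `StallDetector`, its methods, and the threshold are hypothetical and not part of Hive or Hadoop; the regular expression assumes progress lines in the format shown in the log.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: measures how long a job has
// reported "map = 0%, reduce = 0%" in the console progress output.
class StallDetector {
    // Matches lines such as
    // "2015-04-20 09:28:02,616 Stage-1 map = 0%, reduce = 0%".
    private static final Pattern PROGRESS = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}),\\d{3} Stage-\\d+ map = (\\d+)%,\\s+reduce = (\\d+)%");
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns the time between the first and last progress line of the
    // leading run in which both map and reduce are still at 0%.
    static Duration zeroProgressDuration(List<String> lines) {
        LocalDateTime first = null;
        LocalDateTime last = null;
        for (String line : lines) {
            Matcher m = PROGRESS.matcher(line);
            if (!m.find()) {
                continue; // not a progress line
            }
            if (!m.group(2).equals("0") || !m.group(3).equals("0")) {
                break; // the job made progress, stop measuring
            }
            LocalDateTime t = LocalDateTime.parse(m.group(1), TS);
            if (first == null) {
                first = t;
            }
            last = t;
        }
        return first == null ? Duration.ZERO : Duration.between(first, last);
    }

    // True if the job stayed at 0% for at least the given threshold.
    static boolean isStalled(List<String> lines, Duration threshold) {
        return zeroProgressDuration(lines).compareTo(threshold) >= 0;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "Launching Job 1 out of 1",
            "2015-04-20 09:28:02,616 Stage-1 map = 0%, reduce = 0%",
            "2015-04-20 09:57:20,344 Stage-1 map = 0%, reduce = 0%");
        System.out.println(zeroProgressDuration(log).toMinutes() + " minutes without progress");
        System.out.println("stalled: " + isStalled(log, Duration.ofMinutes(10)));
    }
}
```

A job that stays at 0% this long without failing usually has not been given its containers yet, which is why checking the YARN ResourceManager UI at the tracking URL is a reasonable next step.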
【Comments】:
Did you find a solution?
【Answer 1】: If you increase the memory for these two settings in yarn-site.xml, the query will run quickly:
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
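To see why these two limits matter, here is a minimal sketch of the arithmetic, assuming a single-node cluster. The class `YarnMemoryCheck` and its methods are hypothetical and only model the idea; the container sizes (1536 MB for the MapReduce application master, 1024 MB for a map task) and the node sizes are illustrative values, not numbers taken from the question. The sketch ignores rounding to the scheduler's minimum allocation, vcores, and other running applications.

```java
// Hypothetical model, not a YARN API: checks whether the configured
// memory limits leave room for the containers a MapReduce job needs.
class YarnMemoryCheck {
    // A single request can only be granted if it fits under both
    // yarn.scheduler.maximum-allocation-mb and the memory one
    // NodeManager offers (yarn.nodemanager.resource.memory-mb).
    static boolean canAllocate(int requestMb, int schedulerMaxAllocationMb,
                               int nodeManagerMemoryMb) {
        return requestMb <= schedulerMaxAllocationMb
            && requestMb <= nodeManagerMemoryMb;
    }

    // On one node, the application master and at least one map task
    // must fit at the same time, otherwise the map task waits forever.
    static boolean canRunJobOnSingleNode(int amMb, int mapMb,
                                         int schedulerMaxAllocationMb,
                                         int nodeManagerMemoryMb) {
        return canAllocate(amMb, schedulerMaxAllocationMb, nodeManagerMemoryMb)
            && canAllocate(mapMb, schedulerMaxAllocationMb, nodeManagerMemoryMb)
            && amMb + mapMb <= nodeManagerMemoryMb;
    }

    public static void main(String[] args) {
        int amMb = 1536;  // illustrative application master container size
        int mapMb = 1024; // illustrative map task container size

        // 2048 MB node: the AM fits, but no room is left for the map task.
        System.out.println(canRunJobOnSingleNode(amMb, mapMb, 2048, 2048)); // false
        // 4096 MB node: both containers fit together.
        System.out.println(canRunJobOnSingleNode(amMb, mapMb, 4096, 4096)); // true
    }
}
```

In the first case the application master would start and the job would then sit at map = 0%, reduce = 0%, which is consistent with the symptom in the question, though the thread does not confirm the actual values on that cluster.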
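【Answer 2】: The answer above works, and it really helped me a lot. I was trying to run a simple count(*) query in Hive, but it neither failed nor completed. It just hung until I killed the job from the command prompt. It drove me crazy, and I could not find a proper answer on Google. So we need to increase the memory for: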
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
This can be done in yarn-site.xml, or in Cloudera Manager under the YARN service. After increasing the memory, restart all stale services. This resolves the problem.