Hive/Impala: Load Balancing Impala and HiveServer2 with HAProxy
Foreword: This article was compiled by the editors of 小常识网 (cha138.com). It covers how to load balance Impala and HiveServer2 with HAProxy.
1. Choose one node in the cluster and install HAProxy on it with yum.
2. Start and stop the HAProxy service, and add it to the list of services started at boot.
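Back up the existing haproxy.cfg in /etc/haproxy, then create a new haproxy.cfg with the load-balancing configuration.
The configuration mainly sets up the HAProxy HTTP statistics page and load balancing for impala-shell (impalashell) and Impala JDBC (impalajdbc) connections.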
After the configuration is in place, restart HAProxy.
Open http://hostname:1080/stats in a browser to view the statistics page.
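The same statistics can also be read by a script: HAProxy serves the stats page as CSV when `;csv` is appended to the stats URI (here that would be `http://hostname:1080/stats;csv`). A minimal sketch, assuming that URI, a stats page without `stats auth`, and HAProxy's CSV header line beginning with `# pxname,svname,...`, that reports the status of every backend server. The proxy and server names in any output depend on your own configuration.

```python
import csv
import urllib.request


def parse_haproxy_csv(text):
    """Return {(proxy, server): status} for the real servers in HAProxy CSV stats.

    HAProxy prefixes the CSV header line with "# ". FRONTEND and BACKEND rows
    are aggregates rather than individual servers, so they are skipped.
    """
    lines = text.splitlines()
    if lines and lines[0].startswith("# "):
        lines[0] = lines[0][2:]  # strip "# " so DictReader sees clean column names
    result = {}
    for row in csv.DictReader(lines):
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        result[(row["pxname"], row["svname"])] = row["status"]
    return result


def fetch_haproxy_status(stats_url):
    # stats_url is an assumption, e.g. "http://hostname:1080/stats;csv"
    with urllib.request.urlopen(stats_url, timeout=10) as resp:
        return parse_haproxy_csv(resp.read().decode("utf-8"))
```

A server reported as `DOWN` has failed its health check, and HAProxy stops sending it new connections.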
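Open several terminals at the same time, run SQL statements in each, and check whether HAProxy automatically distributes the connections to different Impala Daemon nodes.
Use impala-shell to connect to port 25003 of the HAProxy service.
Open the first terminal, connect, and run a SQL statement.
While the first session is still open, open a second terminal, connect, and run a SQL statement.
The test shows that the SQL statements from the two terminals ran on different Impala Daemons, so HAProxy is distributing connections across the Impala Daemon service.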
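The article does not show which `balance` algorithm its configuration uses. Assuming HAProxy's `balance leastconn` (each new connection goes to the server with the fewest active connections), the result of the two-terminal test can be modelled with the toy sketch below. It is not HAProxy's code: the daemon names are hypothetical, and ties are broken here by list order, which is a simplification.

```python
class LeastConnBalancer:
    """Toy model of least-connections balancing across Impala Daemons."""

    def __init__(self, servers):
        # number of active connections per backend server
        self.active = {name: 0 for name in servers}

    def connect(self):
        # choose the server with the fewest active connections;
        # min() returns the first such server in insertion order on ties
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def disconnect(self, server):
        self.active[server] -= 1


lb = LeastConnBalancer(["impalad-1", "impalad-2", "impalad-3"])
first = lb.connect()   # terminal 1
second = lb.connect()  # terminal 2, while terminal 1 is still connected
print(first, second)   # impalad-1 impalad-2
```

Because balancing happens per connection, statements run inside one impala-shell session all go to the same daemon; only new sessions are spread out.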
For Impala JDBC clients, change the connection URL to use the HAProxy host and the port configured for Impala JDBC load balancing.
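To load balance HiveServer2 as well, edit /etc/haproxy/haproxy.cfg and append the HiveServer2 configuration to the end of the file.
Restart the HAProxy service.
Use Beeline to connect to port 25005 of the HAProxy service.
For Hive JDBC clients, change the URL to use the HAProxy host and the port configured for Hive JDBC load balancing (25005 above).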
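HiveServer2 JDBC URLs have the form `jdbc:hive2://<host>:<port>/<database>`, so pointing clients at HAProxy only changes the host and port. A small sketch that builds the URL and a Beeline command line; the host name `haproxy-host`, the user `hive` and the database `default` are hypothetical, and 25005 is the port used above.

```python
import shlex


def hive_jdbc_url(host, port=25005, database="default"):
    """JDBC URL aimed at the HAProxy listener instead of a single HiveServer2."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(host, user, port=25005, database="default"):
    # Beeline: -u takes the JDBC URL, -n the user name
    return ["beeline", "-u", hive_jdbc_url(host, port, database), "-n", user]


cmd = beeline_command("haproxy-host", "hive")
print(shlex.join(cmd))
# beeline -u jdbc:hive2://haproxy-host:25005/default -n hive
```

Assuming the HiveServer2 listener balances TCP connections, each Beeline or JDBC session stays on one HiveServer2 instance for its whole lifetime.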
Hive: Simple Hands-On Exercises
Preface
These are hands-on HiveQL exercises from part of a Hive data warehouse course, intended for learning only.
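I. Loading data
load data [local] inpath 'xx' into table xx;  -- with LOCAL the path is on the local file system; without it the HDFS file is moved into the table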
II. Exporting with INSERT
insert overwrite local directory  -- drop LOCAL to write to an HDFS directory instead
'/opt/module/hive/data/export/student1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from student;
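The export above writes plain text files into the target directory, one row per line with tab-separated fields. A sketch for reading them back, assuming the default text output (no header, NULL written as `\N`) and skipping hidden checksum files such as `.crc`. The function name is illustrative, not part of Hive.

```python
import os


def read_hive_export(directory, sep="\t", null_marker="\\N"):
    """Read the rows Hive wrote with INSERT OVERWRITE [LOCAL] DIRECTORY.

    Assumes text output: one row per line, fields separated by `sep`,
    no header, NULL written as backslash-N. Hidden files are skipped.
    """
    rows = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.startswith(".") or not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            for line in f:
                fields = line.rstrip("\n").split(sep)
                # map Hive's NULL marker to Python None
                rows.append([None if v == null_marker else v for v in fields])
    return rows
```

For the statement above, `read_hive_export("/opt/module/hive/data/export/student1")` would return one list of strings per row of `student`.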
III. Exercises
1. Top 10 videos by view count
SELECT
    videoId,
    views
FROM
    gulivideo_orc
ORDER BY
    views DESC
LIMIT 10;
2. Top 10 categories by number of videos (category popularity)
SELECT
    t1.category_name,
    COUNT(t1.videoId) hot
FROM
(
    SELECT
        videoId,
        category_name
    FROM
        gulivideo_orc
    LATERAL VIEW explode(category) gulivideo_orc_tmp AS category_name
) t1
GROUP BY
    t1.category_name
ORDER BY
    hot DESC
LIMIT 10;
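In query 2, `LATERAL VIEW explode(category)` turns a video with several categories into one row per category, so each video is counted once in every category it belongs to. A small Python model of the same pipeline on made-up rows (the sample data is hypothetical):

```python
from collections import Counter


def category_heat(videos, top_n=10):
    """Model of query 2: explode each video's category array, count videos
    per category, and return the top_n categories by count (descending)."""
    exploded = [
        (video_id, category_name)
        for video_id, categories in videos
        for category_name in categories  # one row per array element, like explode()
    ]
    counts = Counter(category_name for _, category_name in exploded)
    return counts.most_common(top_n)


videos = [
    ("v1", ["Music", "Comedy"]),
    ("v2", ["Music"]),
    ("v3", ["Sports"]),
]
print(category_heat(videos))  # [('Music', 2), ('Comedy', 1), ('Sports', 1)]
```

As with a plain `LATERAL VIEW` (without `OUTER`), a video with an empty category array produces no rows at all. Python's `most_common` keeps first-seen order for ties, whereas Hive's `ORDER BY hot DESC` gives no guaranteed order among tied categories.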
3. Categories of the 20 most-viewed videos, with the number of those videos in each category
SELECT
    t2.category_name,
    COUNT(t2.videoId) video_sum
FROM
(
    SELECT
        t1.videoId,
        category_name
    FROM
    (
        SELECT
            videoId,
            views,
            category
        FROM
            gulivideo_orc
        ORDER BY
            views DESC
        LIMIT 20
    ) t1
    LATERAL VIEW explode(t1.category) t1_tmp AS category_name
) t2
GROUP BY
    t2.category_name;
4. Categories of the videos related to the 50 most-viewed videos, ranked by how many related videos each contains
SELECT
    t6.category_name,
    t6.video_sum,
    rank() OVER (ORDER BY t6.video_sum DESC) rk
FROM
(
    SELECT
        t5.category_name,
        COUNT(t5.relatedid_id) video_sum
    FROM
    (
        SELECT
            t4.relatedid_id,
            category_name
        FROM
        (
            SELECT
                t2.relatedid_id,
                t3.category
            FROM
            (
                SELECT
                    relatedid_id
                FROM
                (
                    SELECT
                        videoId,
                        views,
                        relatedid
                    FROM
                        gulivideo_orc
                    ORDER BY
                        views DESC
                    LIMIT 50
                ) t1
                LATERAL VIEW explode(t1.relatedid) t1_tmp AS relatedid_id
            ) t2
            JOIN
                gulivideo_orc t3
            ON
                t2.relatedid_id = t3.videoId
        ) t4
        LATERAL VIEW explode(t4.category) t4_tmp AS category_name
    ) t5
    GROUP BY
        t5.category_name
    ORDER BY
        video_sum DESC
) t6;
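Query 4 ranks categories with `rank()`, which gives tied `video_sum` values the same rank and then skips the following ranks. `dense_rank()` would not skip, and `row_number()` would break ties arbitrarily. A minimal Python model of the two ranking rules on hypothetical counts:

```python
def sql_rank(values):
    """rank() OVER (ORDER BY v DESC): ties share a rank, the next rank skips."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for i, v in enumerate(ordered):
        if i > 0 and v == ordered[i - 1]:
            ranks.append(ranks[-1])  # tie: same rank as the previous row
        else:
            ranks.append(i + 1)      # otherwise: 1-based row position
    return list(zip(ordered, ranks))


def sql_dense_rank(values):
    """dense_rank() OVER (ORDER BY v DESC): ties share a rank, no gaps."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for i, v in enumerate(ordered):
        if i > 0 and v == ordered[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1 if ranks else 1)
    return list(zip(ordered, ranks))


print(sql_rank([10, 20, 30, 20]))        # [(30, 1), (20, 2), (20, 2), (10, 4)]
print(sql_dense_rank([10, 20, 30, 20]))  # [(30, 1), (20, 2), (20, 2), (10, 3)]
```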
5. Top 10 videos by view count within one category (Music)
SELECT
    t1.videoId,
    t1.views,
    t1.category_name
FROM
(
    SELECT
        videoId,
        views,
        category_name
    FROM gulivideo_orc
    LATERAL VIEW explode(category) gulivideo_orc_tmp AS category_name
) t1
WHERE
    t1.category_name = "Music"
ORDER BY
    t1.views DESC
LIMIT 10;
6. Top 10 videos by view count in every category
SELECT
    t2.videoId,
    t2.views,
    t2.category_name,
    t2.rk
FROM
(
    SELECT
        t1.videoId,
        t1.views,
        t1.category_name,
        rank() OVER (PARTITION BY t1.category_name ORDER BY t1.views DESC) rk
    FROM
    (
        SELECT
            videoId,
            views,
            category_name
        FROM gulivideo_orc
        LATERAL VIEW explode(category) gulivideo_orc_tmp AS category_name
    ) t1
) t2
WHERE t2.rk <= 10;
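Query 6 is the usual top-N-per-group pattern: explode the categories, rank videos by views within each category (`PARTITION BY category_name`), then keep `rk <= 10` in an outer query, because a window function cannot be filtered in the same `SELECT` that computes it. A Python sketch of the same logic on hypothetical rows:

```python
from collections import defaultdict


def top_n_per_category(rows, n=10):
    """Model of query 6. rows: iterable of (video_id, views, categories).

    Ties in views share a rank, as with rank(), so a category can
    return more than n rows.
    """
    by_category = defaultdict(list)
    for video_id, views, categories in rows:
        for category_name in categories:                   # LATERAL VIEW explode(category)
            by_category[category_name].append((video_id, views))

    result = []
    for category_name, videos in by_category.items():     # PARTITION BY category_name
        videos.sort(key=lambda x: x[1], reverse=True)      # ORDER BY views DESC
        rk = 0
        for pos, (video_id, views) in enumerate(videos, start=1):
            if pos == 1 or views != videos[pos - 2][1]:
                rk = pos                                   # rank() leaves gaps after ties
            if rk > n:                                     # WHERE rk <= n
                break
            result.append((video_id, views, category_name, rk))
    return result
```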
7. Videos uploaded by the 10 users with the most uploads, ordered by view count
SELECT
    t2.videoId,
    t2.views,
    t2.uploader
FROM
(
    SELECT
        uploader,
        videos
    FROM gulivideo_user_orc
    ORDER BY
        videos DESC
    LIMIT 10
) t1
JOIN gulivideo_orc t2
ON t1.uploader = t2.uploader
ORDER BY
    t2.views DESC;