Hive/Impala: Load Balancing Impala and HiveServer2 with HAProxy


1. Pick one node in the cluster and install the HAProxy service with yum.
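A minimal sketch of the installation step, assuming a RHEL/CentOS-style node where yum and sudo are available. The function name install_haproxy and the RUN_INSTALL switch are hypothetical, introduced only so the script does nothing until installation is requested explicitly.

```shell
# Assumption: a RHEL/CentOS-style node where yum and sudo are available.
# install_haproxy and RUN_INSTALL are hypothetical names used for illustration.
install_haproxy() {
  # Install the package, then print the installed version as a sanity check.
  sudo yum -y install haproxy && haproxy -v
}

# Installation changes the system, so it only runs when explicitly requested:
#   RUN_INSTALL=1 sh install_haproxy.sh
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  install_haproxy
else
  echo "Dry run: set RUN_INSTALL=1 to install HAProxy on this node."
fi
```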

2. Start and stop the HAProxy service, and add it to the list of services started at boot.
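The service commands could be wrapped as follows. This sketch assumes a systemd-based distribution such as CentOS 7; on older SysV-init systems, `service haproxy start|stop` and `chkconfig haproxy on` play the same role. The wrapper name haproxy_ctl is hypothetical.

```shell
# Assumption: systemd manages the haproxy unit installed by the package.
# haproxy_ctl is a hypothetical wrapper, not part of HAProxy itself.
haproxy_ctl() {
  case "$1" in
    start|stop|restart|status)
      sudo systemctl "$1" haproxy ;;
    autostart)
      # Register the service so it starts at boot.
      sudo systemctl enable haproxy ;;
    *)
      echo "usage: haproxy_ctl {start|stop|restart|status|autostart}" >&2
      return 2 ;;
  esac
}
```

Typical use would be `haproxy_ctl start` followed by `haproxy_ctl autostart`.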

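Back up the haproxy.cfg file in the /etc/haproxy directory, then create a new haproxy.cfg containing the load-balancing configuration.

The configuration mainly covers HAProxy's HTTP status page and load balancing for impala-shell and Impala JDBC.
After finishing the configuration, restart HAProxy.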
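The configuration described above might look like the following sketch, which writes a sample file to a scratch location rather than to /etc/haproxy directly. The stats port 1080, the /stats URI, and front-end port 25003 come from this article. Everything else is an assumption: the back-end host names are placeholders, 25004 is an assumed front-end port for Impala JDBC, the timeouts and balance algorithms are illustrative choices, and 21000/21050 are the default impalad ports for impala-shell and JDBC.

```shell
# Write the sample to a scratch file; review it before copying it to
# /etc/haproxy/haproxy.cfg. CFG is a hypothetical variable name.
CFG="${CFG:-./haproxy.cfg.example}"

cat > "$CFG" <<'EOF'
global
    log         127.0.0.1 local2
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

defaults
    log             global
    retries         3
    timeout connect 10s
    # Assumption: long timeouts so long-running queries are not cut off.
    timeout client  1h
    timeout server  1h

# HTTP status page (port and URI from the article).
listen stats
    bind 0.0.0.0:1080
    mode http
    stats enable
    stats uri /stats

# impala-shell traffic: port 25003 from the article, 21000 is the impalad default.
listen impalashell
    bind 0.0.0.0:25003
    mode tcp
    balance leastconn
    server impalad1 impalad-host-1:21000 check
    server impalad2 impalad-host-2:21000 check

# Impala JDBC traffic: 25004 is an assumed port, 21050 is the impalad default.
listen impalajdbc
    bind 0.0.0.0:25004
    mode tcp
    balance source
    server impalad1 impalad-host-1:21050 check
    server impalad2 impalad-host-2:21050 check
EOF

echo "Sample configuration written to $CFG"
```

Once the placeholder host names are replaced with real Impala Daemon hosts, `haproxy -c -f <file>` can check the file before it is copied into place and the service is restarted.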

Open http://hostname:1080/stats in a browser to view the status page.

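Connect from several terminals at the same time and run SQL statements to check whether HAProxy distributes the sessions across different Impala Daemon nodes.
Use impala-shell to connect to port 25003 of the HAProxy service.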
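A sketch of that connection, assuming impala-shell is installed on the client. The host name is a placeholder; `-i` is the impala-shell option that names the host and port to connect to, which here is HAProxy rather than an individual Impala Daemon.

```shell
# Placeholder host name; replace with the node running HAProxy.
HAPROXY_HOST="${HAPROXY_HOST:-haproxy-host.example.com}"
IMPALA_SHELL_PORT=25003   # front-end port from the article

TARGET="${HAPROXY_HOST}:${IMPALA_SHELL_PORT}"
echo "impala-shell -i ${TARGET}"

# Connect only where impala-shell is actually installed.
if command -v impala-shell >/dev/null 2>&1; then
  impala-shell -i "$TARGET"
fi
```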

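In the first terminal, connect and run a SQL statement.

At the same time, in a second terminal, connect and run a SQL statement.

The test shows that the SQL statements from the two terminals did not run on the same Impala Daemon, which means load balancing across the Impala Daemon service is working.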

For Impala JDBC clients, change the URL to the HAProxy host and the port configured for Impala JDBC load balancing.
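The article does not show the URL itself, so the following is only a sketch of two common forms. The host name is a placeholder and port 25004 is an assumption (use whatever front-end port the Impala JDBC section of haproxy.cfg binds). The first form assumes the Hive JDBC driver against an unsecured cluster, the second assumes the Cloudera Impala JDBC driver.

```shell
# Placeholder host and assumed port; neither value comes from the article.
HAPROXY_HOST="${HAPROXY_HOST:-haproxy-host.example.com}"
IMPALA_JDBC_PORT="${IMPALA_JDBC_PORT:-25004}"

# Assumption: Hive JDBC driver, no Kerberos/LDAP authentication.
IMPALA_URL_HIVE2="jdbc:hive2://${HAPROXY_HOST}:${IMPALA_JDBC_PORT}/;auth=noSasl"

# Assumption: Cloudera Impala JDBC driver, default schema.
IMPALA_URL_NATIVE="jdbc:impala://${HAPROXY_HOST}:${IMPALA_JDBC_PORT}/default"

printf '%s\n%s\n' "$IMPALA_URL_HIVE2" "$IMPALA_URL_NATIVE"
```

Only the host and port change compared with a direct connection; the rest of the URL stays as the application already has it.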

Edit /etc/haproxy/haproxy.cfg and append the HiveServer2 load-balancing configuration to the end of the file.
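A sketch of the section to append, again written to a scratch file first. Front-end port 25005 comes from this article and 10000 is the default HiveServer2 port. The back-end host names are placeholders, and `balance source` is an assumed choice that keeps a given client on the same HiveServer2 instance.

```shell
# Append to the scratch copy; CFG is a hypothetical variable name.
CFG="${CFG:-./haproxy.cfg.example}"

cat >> "$CFG" <<'EOF'

# HiveServer2 traffic: port 25005 from the article, 10000 is the HiveServer2 default.
listen hiveserver2
    bind 0.0.0.0:25005
    mode tcp
    balance source
    server hs2_1 hs2-host-1:10000 check
    server hs2_2 hs2-host-2:10000 check
EOF

echo "HiveServer2 section appended to $CFG"
```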

Restart the HAProxy service.

Use Beeline to connect to port 25005 of the HAProxy service.
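A sketch of the Beeline call, assuming Beeline is installed on the client. The host name is a placeholder; `-u` is Beeline's option for the JDBC URL. Depending on how the cluster is secured, a user name (`-n`) or further URL parameters may also be needed.

```shell
# Placeholder host name; replace with the node running HAProxy.
HAPROXY_HOST="${HAPROXY_HOST:-haproxy-host.example.com}"
HIVE_JDBC_PORT=25005   # front-end port from the article

HIVE_URL="jdbc:hive2://${HAPROXY_HOST}:${HIVE_JDBC_PORT}"
echo "beeline -u \"${HIVE_URL}\""

# Connect only where beeline is actually installed.
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$HIVE_URL"
fi
```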

For Hive JDBC clients, change the URL to the HAProxy host and the port configured for Hive JDBC load balancing (25005 in this setup).

Hive Hands-On Basics


Preface

These are hands-on Hive QL exercises for the Hive data warehouse, intended for study only.


I. Importing data

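-- LOCAL is optional: with it the path is on the local file system, without it the path is on HDFS.
LOAD DATA [LOCAL] INPATH 'xx' INTO TABLE xx;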
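As a concrete illustration of the two variants, the sketch below builds both statements and runs the local one through `hive -e` when the Hive CLI is present. The file paths are hypothetical, and the student table is borrowed from the export example in this article. Note that the HDFS form moves the source file into the table's directory rather than copying it.

```shell
# Hypothetical paths, for illustration only.
LOCAL_SQL="LOAD DATA LOCAL INPATH '/tmp/student.txt' INTO TABLE student;"
HDFS_SQL="LOAD DATA INPATH '/user/hive/staging/student.txt' INTO TABLE student;"

printf '%s\n%s\n' "$LOCAL_SQL" "$HDFS_SQL"

# Run the local-file variant only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -e "$LOCAL_SQL"
fi
```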

II. Exporting with INSERT

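-- LOCAL is optional: with it the directory is on the local file system, without it the directory is on HDFS.
INSERT OVERWRITE [LOCAL] DIRECTORY '/opt/module/hive/data/export/student1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT * FROM student;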

III. Exercises

1

-- Top 10 videos by view count.
SELECT
    videoId,
    views
FROM
    gulivideo_orc
ORDER BY
    views DESC
LIMIT 10;

2

-- Top 10 categories by number of videos (one row per video/category pair after explode).
SELECT
    t1.category_name,
    COUNT(t1.videoId) hot
FROM
    (
        SELECT
            videoId,
            category_name
        FROM
            gulivideo_orc
            LATERAL VIEW explode(category) gulivideo_orc_tmp AS category_name
    ) t1
GROUP BY
    t1.category_name
ORDER BY
    hot DESC
LIMIT 10;

3

-- For the 20 most-viewed videos, count how many fall into each category.
SELECT
    t2.category_name,
    COUNT(t2.videoId) video_sum
FROM
    (
        SELECT
            t1.videoId,
            category_name
        FROM
            (
                SELECT
                    videoId,
                    views,
                    category
                FROM
                    gulivideo_orc
                ORDER BY
                    views DESC
                LIMIT 20
            ) t1
            LATERAL VIEW explode(t1.category) t1_tmp AS category_name
    ) t2
GROUP BY
    t2.category_name;

4

-- Take the 50 most-viewed videos, expand their related videos, look up each
-- related video's categories, then rank categories by related-video count.
SELECT
    t6.category_name,
    t6.video_sum,
    rank() OVER (ORDER BY t6.video_sum DESC) rk
FROM
    (
        SELECT
            t5.category_name,
            COUNT(t5.relatedid_id) video_sum
        FROM
            (
                SELECT
                    t4.relatedid_id,
                    category_name
                FROM
                    (
                        SELECT
                            t2.relatedid_id,
                            t3.category
                        FROM
                            (
                                SELECT
                                    relatedid_id
                                FROM
                                    (
                                        SELECT
                                            videoId,
                                            views,
                                            relatedid
                                        FROM
                                            gulivideo_orc
                                        ORDER BY
                                            views DESC
                                        LIMIT 50
                                    ) t1
                                    LATERAL VIEW explode(t1.relatedid) t1_tmp AS relatedid_id
                            ) t2
                        JOIN
                            gulivideo_orc t3
                        ON
                            t2.relatedid_id = t3.videoId
                    ) t4
                    LATERAL VIEW explode(t4.category) t4_tmp AS category_name
            ) t5
        GROUP BY
            t5.category_name
        ORDER BY
            video_sum DESC
    ) t6;

5

-- Top 10 videos by view count within the "Music" category.
SELECT
    t1.videoId,
    t1.views,
    t1.category_name
FROM
    (
        SELECT
            videoId,
            views,
            category_name
        FROM
            gulivideo_orc
            LATERAL VIEW explode(category) gulivideo_orc_tmp AS category_name
    ) t1
WHERE
    t1.category_name = "Music"
ORDER BY
    t1.views DESC
LIMIT 10;

6

-- Top 10 videos by view count within every category, using a per-category rank.
SELECT
    t2.videoId,
    t2.views,
    t2.category_name,
    t2.rk
FROM
    (
        SELECT
            t1.videoId,
            t1.views,
            t1.category_name,
            rank() OVER (PARTITION BY t1.category_name ORDER BY t1.views DESC) rk
        FROM
            (
                SELECT
                    videoId,
                    views,
                    category_name
                FROM
                    gulivideo_orc
                    LATERAL VIEW explode(category) gulivideo_orc_tmp AS category_name
            ) t1
    ) t2
WHERE
    t2.rk <= 10;

7

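-- Videos from the 10 uploaders with the highest videos value in gulivideo_user_orc,
-- ordered by view count.
SELECT
    t2.videoId,
    t2.views,
    t2.uploader
FROM
    (
        SELECT
            uploader,
            videos
        FROM
            gulivideo_user_orc
        ORDER BY
            videos DESC
        LIMIT 10
    ) t1
JOIN
    gulivideo_orc t2
ON
    t1.uploader = t2.uploader
ORDER BY
    t2.views DESC;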
