Hive Study 04: Employee and Department Tables Case Study
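Key points:
Type conversion: cast(xxx as int)
Partition the rows by one column, sort them by another, and number the rows in sorted order; for example, to find the highest-paid employee in each region: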
ROW_NUMBER() OVER(PARTITION BY COLUMN ORDER BY COLUMN)
row_number() over(distribute by t1.loc sort by cast(t1.sal as int) desc) as index
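As a minimal sketch of the standard form, the query below numbers the employees inside each department by salary. It assumes the ump table used throughout this post and assumes that sal is stored as a string, which is why it is cast before sorting; the alias rn is only an illustrative name.

```sql
-- Number employees within each department, highest salary first.
-- The cast makes the ordering numeric rather than lexicographic
-- (assumption: sal is a string column).
select
    deptno,
    ename,
    sal,
    row_number() over (partition by deptno order by cast(sal as double) desc) as rn
from ump;
```

Inside over(), Hive accepts distribute by / sort by in place of partition by / order by, so the two forms listed above describe the same window.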
dept table
hive> select * from dept;
# deptno (department number)  dname (department name)  loc (department location)
10  ACCOUNTING  NEW YORK
20  RESEARCH    DALLAS
30  SALES       CHICAGO
40  OPERATIONS  BOSTON
ump table
hive> select * from ump;
# employee number, employee name, job, manager number, hire date, salary, commission, department number
# empno  ename   job        mgr   hiredate    sal     comm    deptno
7369     SMITH   CLERK      7902  1980-12-17  800.0   0.0     20
7499     ALLEN   SALESMAN   7698  1981-02-20  1600.0  300.0   30
7521     WARD    SALESMAN   7698  1981-02-22  1250.0  500.0   30
7566     JONES   MANAGER    7839  1981-04-02  2975.0  0.0     20
7654     MARTIN  SALESMAN   7698  1981-09-28  1250.0  1400.0  30
7698     BLAKE   MANAGER    7839  1981-05-01  2850.0  0.0     30
7782     CLARK   MANAGER    7839  1981-06-09  2450.0  0.0     10
7788     SCOTT   ANALYST    7566  1987-07-13  3000.0  0.0     20
7839     KING    PRESIDENT  NULL  1981-11-07  5000.0  0.0     10
7844     TURNER  SALESMAN   7698  1981-09-08  1500.0  0.0     30
7876     ADAMS   CLERK      7788  1987-07-13  1100.0  0.0     20
7900     JAMES   CLERK      7698  1981-12-03  950.0   0.0     30
7902     FORD    ANALYST    7566  1981-12-03  3000.0  0.0     20
7934     MILLER  CLERK      7782  1982-01-23  1300.0  0.0     10
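The post does not show how the two tables were created. The DDL below is only a hypothetical sketch for readers who want to reproduce the examples: the column types and the tab delimiter are assumptions, and sal is declared as a string because the queries in this post cast it before sorting.

```sql
-- Hypothetical DDL: column types and the '\t' delimiter are assumptions,
-- not taken from the original post.
create table if not exists dept (
    deptno int,
    dname  string,
    loc    string
)
row format delimited fields terminated by '\t';

create table if not exists ump (
    empno    int,
    ename    string,
    job      string,
    mgr      int,
    hiredate string,   -- kept as 'yyyy-MM-dd' text
    sal      string,   -- assumed string, hence the casts in later queries
    comm     string,
    deptno   int
)
row format delimited fields terminated by '\t';
```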
(1) Count the total number of employees
select count(empno) from ump;
#Total MapReduce CPU Time Spent: 5 seconds 170 msec
#OK
#14
(2) Count how many distinct jobs there are
select count(distinct job) from ump;
#Total MapReduce CPU Time Spent: 4 seconds 930 msec
#OK
#5
(3) Count the employees in each job, sorted by count in descending order
select job, count(*) as emp_cnt
from ump
group by job
order by emp_cnt desc;
SALESMAN 4
CLERK 4
MANAGER 3
ANALYST 2
PRESIDENT 1
(4) Find the employee who was hired earliest
select ump.ename, ump.hiredate
from ump
join (select min(hiredate) as hiredate from ump) t1
on ump.hiredate = t1.hiredate;
#SMITH 1980-12-17
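A shorter alternative, offered as a sketch rather than as the author's method, is to sort by hire date and keep the first row. It relies on hiredate being in yyyy-MM-dd form, so that string order matches chronological order.

```sql
-- Sort by hire date ascending and keep only the first row.
select ename, hiredate
from ump
order by hiredate
limit 1;
```

The two approaches differ when several employees share the earliest date: the join against min(hiredate) returns all of them, while limit 1 returns only one.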
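(5) Compute the highest and the average salary for each job
select job, max(sal) as max_sal, avg(sal) as avg_sal
from ump
group by job;
ANALYST 3000.0 3000.0
CLERK 950.0 1037.5
MANAGER 2975.0 2758.3333333333335
PRESIDENT 5000.0 5000.0
SALESMAN 1600.0 1400.0
Note: the CLERK maximum is reported as 950.0 even though MILLER earns 1300.0. This indicates that sal is compared as a string here, so max() picks the lexicographically largest value; casting sal to a numeric type before calling max() avoids this.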
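The CLERK row above shows a maximum of 950.0 although the table lists a clerk earning 1300.0, which is what a string comparison would produce. One way to compare numerically, sketched under the assumption that sal is a string column, is to cast it inside the aggregates:

```sql
-- Cast sal so that max() compares numbers instead of strings.
select
    job,
    max(cast(sal as double)) as max_sal,
    avg(cast(sal as double)) as avg_sal
from ump
group by job;
```

With the sample rows shown earlier, the CLERK maximum should then be 1300.0; the averages are unaffected.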
(6) Find the highest-paid employee in each region
select t2.loc, t2.ename, t2.sal
from (
    select t1.loc, t1.ename, t1.sal,
           row_number() over(distribute by t1.loc sort by cast(t1.sal as int) desc) as index
    from (
        select dept.loc, ump.ename, ump.sal
        from dept join ump on dept.deptno = ump.deptno
    ) t1
) t2
where t2.index = 1;
#CHICAGO BLAKE 2850.0
#DALLAS FORD 3000.0
#NEW YORK KING 5000.0
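row_number() always assigns exactly one row the number 1, so when two employees in a region share the top salary (SCOTT and FORD both earn 3000.0 in DALLAS), only one of them is returned. If ties should be kept, a possible variant is to use rank() instead; this is a sketch, and the alias rk is only an illustrative name.

```sql
-- rank() gives tied salaries the same rank, so every top earner
-- in a region is returned, not just one of them.
select t2.loc, t2.ename, t2.sal
from (
    select
        dept.loc,
        ump.ename,
        ump.sal,
        rank() over (partition by dept.loc order by cast(ump.sal as int) desc) as rk
    from dept
    join ump on dept.deptno = ump.deptno
) t2
where t2.rk = 1;
```

Whether one row or all tied rows is the right answer depends on how "the highest-paid employee" is meant to be read.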
(7) Find the region with the most employees hired in the first half of the year
select t1.loc, count(*) as cnt
from (
    select dept.loc, ump.ename,
           cast(substr(ump.hiredate, 6, 2) as int) as hire_month
    from dept join ump on dept.deptno = ump.deptno
) t1
where t1.hire_month <= 6
group by t1.loc
order by cnt desc
limit 1;
CHICAGO 3
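The month can also be extracted with Hive's built-in month() function instead of substr plus cast. The sketch below assumes hiredate holds yyyy-MM-dd strings (or dates), which month() accepts, and folds the subquery into a single query.

```sql
-- month() returns the month number as an int for a 'yyyy-MM-dd' value.
select dept.loc, count(*) as cnt
from dept
join ump on dept.deptno = ump.deptno
where month(ump.hiredate) <= 6
group by dept.loc
order by cnt desc
limit 1;
```

This avoids depending on fixed character positions within the date string.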