Big Data Warehouse Technology Training: Task 1
Posted by 陈希瑞
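I. Partitioned tables (6 structured data files)
1. Create a table t_all_hero_part and load all 6 files into it in one pass (using dynamic partition loading)
To enable Hive dynamic partitioning, set two parameters in the Hive session: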
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table t_all_hero_part partition(role) select tmp.*,tmp.role_main from t_all_hero tmp;
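The insert above presupposes that both t_all_hero (holding the rows of all 6 files) and the partitioned target t_all_hero_part already exist; their DDL is not shown in this write-up. A minimal sketch of what they might look like follows. Apart from role_main, hp_max, and the partition column role, every column name, every type, and the tab delimiter are assumptions made for illustration and should be adjusted to the actual files.

```sql
-- Staging table that the 6 files are loaded into (column list is assumed)
create table t_all_hero(
  id int,            -- assumed
  name string,       -- assumed
  hp_max int,        -- used by the queries in this section
  role_main string   -- used by the queries in this section
)
row format delimited
fields terminated by '\t';   -- assumed delimiter

-- Partitioned target: same columns, plus the partition column role
create table t_all_hero_part(
  id int,
  name string,
  hp_max int,
  role_main string
)
partitioned by (role string)
row format delimited
fields terminated by '\t';
```

With dynamic partitioning, the last column of the select (tmp.role_main) supplies the value of role, so one partition is created per distinct role_main value.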
2. Query t_all_hero_part to verify that the data was loaded
select * from t_all_hero_part;
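A full select shows the rows but not how they were split up. As a quick additional check, sketched here with the table and partition column names used above, the partitions created by the dynamic insert can be listed and counted:

```sql
-- List the partitions that the dynamic-partition insert created
show partitions t_all_hero_part;

-- Count the rows that landed in each partition
select role, count(*) as cnt
from t_all_hero_part
group by role;
```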
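3. Count the heroes whose main role (role_main) is marksman and whose maximum HP (hp_max) is greater than 6000
-- assumes marksman heroes are stored as role_main='archer'
select count(*) from t_all_hero_part where role_main='archer' and hp_max>6000;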
II. Advanced queries (shixun2.txt)
1. Sort the student scores in the default (ascending) order
select * from shixun2 order by score;
2. Sort the student scores in descending order
select * from shixun2 order by score desc;
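3. Use cluster by to group rows by id, with the number of reduce tasks set to 3
-- show the current value
set mapreduce.job.reduces;
-- use 3 reducers
set mapreduce.job.reduces=3;
-- confirm the new value
set mapreduce.job.reduces;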
select * from student cluster by id;
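cluster by is shorthand for distributing and sorting on the same column. The query above can therefore also be written in the longer form below, which makes the two steps explicit (same student table, same reducer setting):

```sql
-- Equivalent to: select * from student cluster by id;
-- distribute by picks the reducer for each row, sort by orders rows within each reducer
select * from student distribute by id sort by id;
```

The longer form is the one to reach for when the distribution column and the sort column differ, or when a descending sort is needed, since cluster by only sorts ascending.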
4. Use distribute by + sort by to split the table into two groups by sex, then sort each group by age in descending order
select * from student distribute by sex sort by age desc;
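distribute by only guarantees that rows with the same sex reach the same reducer; two different values may still hash to the same reducer, in which case sorting on age alone would interleave them. A slightly more defensive sketch, assuming the same student table, sorts on sex first so each group stays contiguous within its reducer:

```sql
-- Rows with the same sex go to the same reducer;
-- within a reducer, rows are ordered by sex, then by age from oldest to youngest
select * from student distribute by sex sort by sex, age desc;
```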
III. insert exercises (students.txt)
1. Create a table and map it to students.txt
create table student(
id int,
name string,
sex string,
age int,
dept string
)
row format delimited
fields terminated by ',';
load data local inpath "/root/hivedata/实训3/students.txt" into table student;
2. Create a student table that contains only the student id and name
create table student_insert_01(id int,name string);
3. Use insert + select to copy the id and name from the students.txt data into the new table
insert into student_insert_01 select id,name from student;
4. Query the new table
select * from student_insert_01;
5. Create two tables with a single column each: sex in one, age in the other
create table student_insert_02(sex string);
create table student_insert_03(age int);
6. Insert the contents of student into both tables with a single scan
from student
insert into table student_insert_02 select sex
insert into table student_insert_03 select age;
7. View the data in both tables
select * from student_insert_02;
select * from student_insert_03;
IV. union queries (students.txt, students_union.txt)
1. Create the two tables and map them to their files
-- the student table was already created above
create table students_union(
id int,
name string,
sex string,
age int,
dept string
)
row format delimited
fields terminated by ',';
load data local inpath "/root/hivedata/实训3/students_union.txt" into table darcy.students_union;
2. Use union to query id, name from both tables
select id,name from student union select id,name from students_union;
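A bare union removes duplicate rows from the combined result. For comparison, a sketch of the same query with union all, which keeps every row from both tables, including any (id, name) pair that appears in both:

```sql
-- union all keeps duplicates; a plain union (union distinct) removes them
select id,name from student
union all
select id,name from students_union;
```

Comparing the row counts of the two variants is a quick way to see how many rows the tables share.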
3. Query id, name from the union result (subquery in the from clause)
select id,name from (select id,name from student union select id,name from students_union) temp;
4. Use in to find the IDs in students_union.txt that also appear in students.txt
select id from students_union where id in (select id from student);
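Hive also offers left semi join for this kind of membership test. A sketch of the same lookup in that form, using aliases a and b that are introduced here only for illustration:

```sql
-- Returns each id from students_union that has at least one match in student;
-- only columns from the left table (a) may appear in the select list
select a.id
from students_union a
left semi join student b on a.id = b.id;
```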
5. Use not in to find the IDs in students_union.txt that do not appear in students.txt
select a.id from students_union a where a.id not in (select id from student);
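One caveat with not in: under standard SQL semantics, if the subquery returns even one null id, the predicate never evaluates to true and the query returns no rows. A sketch of an alternative with not exists, which does not have that problem (alias b is introduced here for illustration):

```sql
-- Keep the rows of students_union for which no row in student has the same id
select a.id
from students_union a
where not exists (select b.id from student b where b.id = a.id);
```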
V. join (join_1.txt, join_2.txt)
1. Inner join: query id, name, and address from join_1.txt and join_2.txt
select j1.id,j1.name,j2.address from join_1 j1,join_2 j2 where j1.id=j2.id;
select j1.id,j1.name,j2.address from join_1 j1 join join_2 j2 on j1.id=j2.id;
2. Left join: query id, name, and address from join_1.txt and join_2.txt
select j_1.id,j_1.name,j_2.address from join_1 j_1 left join join_2 j_2 on j_1.id=j_2.id;
3. Right join: query id, name, and address from join_1.txt and join_2.txt
select j_1.id,j_1.name,j_2.address from join_1 j_1 right join join_2 j_2 on j_1.id=j_2.id;
4. Full join: query id, name, and address from join_1.txt and join_2.txt
select j_1.id,j_1.name,j_2.address from join_1 j_1 full join join_2 j_2 on j_1.id=j_2.id;
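In the right and full joins above, rows that exist only in join_2 come back with a null j_1.id, because the id is taken from the join_1 side. One way to always show an id, sketched here with the built-in coalesce function and the same table aliases, is to take whichever side is non-null:

```sql
-- coalesce returns its first non-null argument,
-- so the id is filled in from join_2 when join_1 has no matching row
select coalesce(j_1.id, j_2.id) as id, j_1.name, j_2.address
from join_1 j_1 full join join_2 j_2 on j_1.id = j_2.id;
```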
5. Use a join to find which rows in students_union.txt have no match in students.txt
select * from students_union A left join student B on A.ID=B.ID where B.ID is null;