Big Data Warehouse Technology Practical Training: Task 1

Posted 2022-11-25 陈希瑞


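Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers Big Data Warehouse Technology Practical Training Task 1, and we hope it is a useful reference.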

Big Data Warehouse Technology Practical Training: Task 1

I. Partitioned tables (6 structured data files)

1. Create a table t_all_hero_part and load all 6 files into it in one pass (using dynamic partitioning)

To enable Hive dynamic partitioning, set two parameters in the Hive session:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table t_all_hero_part partition(role) select tmp.*,tmp.role_main from t_all_hero tmp;
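In the dynamic insert above, Hive takes the value of the partition column role from the last column of the select list (tmp.role_main). Each hero therefore lands in a role=... directory without the partition being named by hand. The Python sketch below models only that routing step. The three hero rows and their (name, hp_max, role_main) layout are hypothetical and do not match the real t_all_hero schema.

```python
from collections import defaultdict

# Hypothetical t_all_hero rows: (name, hp_max, role_main).
t_all_hero = [
    ("hero_a", 5989, "archer"),
    ("hero_b", 7350, "tank"),
    ("hero_c", 6012, "archer"),
]

def dynamic_partition(rows):
    """Model `select tmp.*, tmp.role_main ... partition(role)`:
    the extra last column decides the target partition directory."""
    parts = defaultdict(list)
    for row in rows:
        selected = row + (row[2],)      # tmp.* plus tmp.role_main appended
        *data, role = selected          # last column becomes the partition value
        parts[f"role={role}"].append(tuple(data))
    return dict(parts)

parts = dynamic_partition(t_all_hero)
for directory in sorted(parts):
    print(directory, parts[directory])
```

The stored rows still contain role_main. Only the appended copy is consumed as the partition value.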

2. Query t_all_hero_part to verify that the data was mapped successfully

select * from t_all_hero_part;

3. Count the heroes whose main role (role_main) is archer and whose maximum HP (hp_max) is greater than 6000

select count(*) from t_all_hero_part where role_main='archer' and hp_max>6000;
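t_all_hero_part is partitioned by role. When the filter is on the partition column, Hive can skip every directory except role=archer. A filter on the ordinary column role_main, by contrast, forces Hive to read all partitions. Because role was filled from role_main, the condition role='archer' and hp_max>6000 should return the same count while allowing partition pruning. The toy Python model below illustrates the difference, using hypothetical rows and a hypothetical three-partition layout.

```python
# Hypothetical partition layout: directory -> rows of (name, hp_max, role_main).
partitions = {
    "role=archer": [("hero_a", 5989, "archer"), ("hero_c", 6012, "archer")],
    "role=tank": [("hero_b", 7350, "tank")],
    "role=mage": [("hero_d", 5865, "mage")],
}

def count_full_scan(parts, role, min_hp):
    """where role_main=... : an ordinary column, so every partition is read."""
    count = 0
    scanned = 0
    for rows in parts.values():
        scanned += 1
        count += sum(1 for _name, hp, role_main in rows if role_main == role and hp > min_hp)
    return count, scanned

def count_pruned(parts, role, min_hp):
    """where role=... : the partition column, so only one directory is read."""
    key = f"role={role}"
    rows = parts.get(key, [])
    count = sum(1 for _name, hp, _role_main in rows if hp > min_hp)
    return count, (1 if key in parts else 0)

print(count_full_scan(partitions, "archer", 6000))  # (count, partitions read)
print(count_pruned(partitions, "archer", 6000))
```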

II. Advanced queries (shixun2.txt)

1. Sort the student scores in the default (ascending) order

select * from shixun2 order by score;

2. Sort the student scores in descending order

select * from shixun2 order by score desc;

3. Use cluster by to group the rows by id, with the number of reduce tasks set to 3

-- Show the current number of reduce tasks
set mapreduce.job.reduces;
-- Set it to 3
set mapreduce.job.reduces=3;
-- Confirm the new value
set mapreduce.job.reduces;

select * from student cluster by id;
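cluster by id behaves like distribute by id sort by id. Each row is sent to one of the 3 reducers according to a hash of id, and each reducer sorts only its own rows. The output files are therefore sorted individually but are not in global order. The sketch below assumes a simple id % 3 partitioner and made-up student IDs. Hive's real hash function varies by version, so actual reducer assignments may differ.

```python
def cluster_by(ids, num_reducers=3):
    """Model `cluster by id` as `distribute by id sort by id`.
    Assumption: reducer index = id % num_reducers (Hive's real hash may differ)."""
    reducers = [[] for _ in range(num_reducers)]
    for i in ids:
        reducers[i % num_reducers].append(i)      # distribute by id
    return [sorted(r) for r in reducers]          # sort by id inside each reducer

# Hypothetical student IDs in arbitrary input order.
ids = [95007, 95002, 95004, 95001, 95006, 95003, 95005]
for n, rows in enumerate(cluster_by(ids)):
    print(f"reducer {n}: {rows}")
```

Concatenating the three reducer outputs does not give a fully sorted list. A global sort needs order by, which runs on a single reducer.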

4. Use distribute by + sort by to split the table into two groups by sex, then sort each group by age in descending order

select * from student distribute by sex sort by age desc;
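distribute by sex chooses the reducer for each row, and sort by age desc orders rows only inside that reducer. If mapreduce.job.reduces is still 3 from the previous step, at least one reducer receives no rows. Nothing guarantees that the two sex values land on different reducers either. The Python sketch below uses zlib.crc32 as a stand-in for Hive's hash function (an assumption) and runs on hypothetical rows.

```python
import zlib

def distribute_sort(rows, num_reducers):
    """Model `distribute by sex sort by age desc` on (name, sex, age) rows.
    Assumption: reducer index = crc32(sex) % num_reducers, a stand-in for Hive's hash."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        sex = row[1]
        reducers[zlib.crc32(sex.encode("utf-8")) % num_reducers].append(row)
    # sort by orders rows within each reducer only, never across reducers
    return [sorted(bucket, key=lambda r: r[2], reverse=True) for bucket in reducers]

# Hypothetical students: (name, sex, age).
students = [("a", "M", 20), ("b", "F", 19), ("c", "M", 22), ("d", "F", 21), ("e", "M", 18)]
for n, rows in enumerate(distribute_sort(students, 3)):
    print(f"reducer {n}: {rows}")
```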

III. Insert exercises (students.txt)

1. Create a table and map students.txt onto it

create table student(
id int,
name string,
sex string,
age int,
dept string
)
row format delimited 
fields terminated by ',';

load data local inpath "/root/hivedata/实训3/students.txt" into table darcy.student;

2. Create a student table that contains only the student ID and name

create table student_insert_01(id int,name string);

3. Use insert + select to copy the student IDs and names from students.txt into the new table

insert into student_insert_01 select id,name from student;

4. Query the new table

select * from student_insert_01;

5. Create two tables with a single column each: sex and age, respectively

create table student_insert_02(sex string);
create table student_insert_03(age int);

6. Insert the contents of student into both tables with a single scan

from student
insert into table student_insert_02 select sex
insert into table student_insert_03 select age;
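The from ... insert ... insert form is Hive's multi-insert. Hive scans student once and feeds every row to both insert branches. Two separate insert ... select statements would each read the whole table. The minimal Python model below shows that single pass over hypothetical student rows of (id, name, sex, age, dept).

```python
def multi_insert(source):
    """One pass over `source` feeds both target tables, like Hive's multi-insert."""
    sex_table, age_table = [], []
    rows_read = 0
    for _id, _name, sex, age, _dept in source:   # the single scan of student
        rows_read += 1
        sex_table.append(sex)    # insert into table student_insert_02 select sex
        age_table.append(age)    # insert into table student_insert_03 select age
    return sex_table, age_table, rows_read

# Hypothetical rows of student: (id, name, sex, age, dept).
student = [
    (1, "a", "M", 20, "CS"),
    (2, "b", "F", 19, "MA"),
    (3, "c", "M", 22, "IS"),
]
sexes, ages, rows_read = multi_insert(student)
print(sexes, ages, rows_read)   # rows_read equals len(student): one scan, two outputs
```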

7. View the contents of both tables

select * from student_insert_02;
select * from student_insert_03;

IV. Union queries (students.txt, students_union.txt)

1. Create the two tables and map the files onto them

-- The student table was already created above
create table students_union(
id int,
name string,
sex string,
age int,
dept string
)
row format delimited 
fields terminated by ',';

load data local inpath "/root/hivedata/实训3/students_union.txt" into table darcy.students_union;

2. Use union to query id and name from both tables

select id,name from student union select id,name from students_union;
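Since Hive 1.2.0, a bare union means union distinct, so an (id, name) pair that appears in both files is returned only once. union all keeps every row. The sketch below uses Python's built-in sqlite3 as a stand-in for Hive, because the deduplication rule is standard SQL. The rows are hypothetical, and this is not a Hive session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (id integer, name text)")
conn.execute("create table students_union (id integer, name text)")
# Hypothetical data: (2, 'b') exists in both tables.
conn.executemany("insert into student values (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("insert into students_union values (?, ?)", [(2, "b"), (3, "c")])

distinct_rows = conn.execute(
    "select id, name from student union select id, name from students_union order by id"
).fetchall()
all_rows = conn.execute(
    "select id, name from student union all select id, name from students_union"
).fetchall()
print(distinct_rows)   # the shared row appears once
print(len(all_rows))   # union all keeps the duplicate
```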

3. Query id and name from the union result (a subquery in the from clause)

select id,name from (select id,name from student union select id,name from students_union) temp;

4. Use an in subquery to find the IDs in students_union.txt that also appear in students.txt

select id from students_union where id in (select id from student);

5. Use a not in subquery to find the IDs in students_union.txt that do not appear in students.txt

select a.id from students_union a where a.id not in (select id from student);
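not in has one caveat. Suppose the subquery returns a NULL id, for example a value in students.txt that failed to parse as int and was read as NULL. Then a.id not in (...) evaluates to NULL instead of true, and standard SQL returns no rows at all. The sqlite3 sketch below is a stand-in for Hive with hypothetical data. It shows the effect and the usual fix, which is to exclude NULLs inside the subquery. Check how your Hive version handles this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (id integer)")
conn.execute("create table students_union (id integer)")
conn.executemany("insert into students_union values (?)", [(1,), (2,), (3,)])
conn.executemany("insert into student values (?)", [(1,), (2,)])

query = "select a.id from students_union a where a.id not in (select id from student)"
before = conn.execute(query).fetchall()          # no NULL ids yet

conn.execute("insert into student values (NULL)")  # e.g. an id that failed to parse
after = conn.execute(query).fetchall()           # the NULL empties the result

# NULL-safe variant: drop NULLs inside the subquery
safe_query = ("select a.id from students_union a where a.id not in "
              "(select id from student where id is not null)")
safe = conn.execute(safe_query).fetchall()
print(before, after, safe)
```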

V. join (join_1.txt, join_2.txt)

1. Inner join: query id, name and address from join_1.txt and join_2.txt

select j1.id,j1.name,j2.address from join_1 j1,join_2 j2 where j1.id=j2.id;
select j1.id,j1.name,j2.address from join_1 j1 join join_2 j2 on j1.id=j2.id;
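The two statements are equivalent inner joins. The first uses the comma syntax with the join condition in where, and the second uses the explicit join ... on form. Both keep only the ids present in both tables. The quick check below uses Python's sqlite3 as a stand-in for Hive, with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table join_1 (id integer, name text)")
conn.execute("create table join_2 (id integer, address text)")
# Hypothetical rows: ids 2 and 3 exist in both tables.
conn.executemany("insert into join_1 values (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("insert into join_2 values (?, ?)", [(2, "x"), (3, "y"), (4, "z")])

implicit = conn.execute(
    "select j1.id, j1.name, j2.address from join_1 j1, join_2 j2 "
    "where j1.id = j2.id order by j1.id"
).fetchall()
explicit = conn.execute(
    "select j1.id, j1.name, j2.address from join_1 j1 join join_2 j2 "
    "on j1.id = j2.id order by j1.id"
).fetchall()
print(implicit == explicit, implicit)
```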

2. Left join: query id, name and address from join_1.txt and join_2.txt

select j_1.id,j_1.name,j_2.address from join_1 j_1 left join join_2 j_2 on j_1.id=j_2.id;

3. Right join: query id, name and address from join_1.txt and join_2.txt

select j_1.id,j_1.name,j_2.address from join_1 j_1 right join join_2 j_2 on j_1.id=j_2.id;

4. Full join: query id, name and address from join_1.txt and join_2.txt

select j_1.id,j_1.name,j_2.address from join_1 j_1 full join join_2 j_2 on j_1.id=j_2.id;
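The four join types differ only in which unmatched rows survive. The pure Python sketch below assumes ids are unique in each file, so each table fits in a dict, and it uses hypothetical rows. The queries select j_1.id, so rows that exist only in join_2 come back with a NULL (None) id in the right and full joins. The sketch uses pure Python rather than sqlite3 because RIGHT and FULL joins require SQLite 3.39 or newer.

```python
def join(left, right, how):
    """Join id->name and id->address dicts; returns (j_1.id, j_1.name, j_2.address).
    Assumes ids are unique in each table; None stands in for NULL."""
    rows = []
    for key in sorted(set(left) | set(right)):
        in_left, in_right = key in left, key in right
        keep = {
            "inner": in_left and in_right,
            "left": in_left,
            "right": in_right,
            "full": True,                     # every id from either side survives
        }[how]
        if keep:
            # j_1.id is NULL when the row exists only in join_2
            rows.append((key if in_left else None, left.get(key), right.get(key)))
    return rows

join_1 = {1: "a", 2: "b", 3: "c"}   # hypothetical id -> name
join_2 = {2: "x", 3: "y", 4: "z"}   # hypothetical id -> address
results = {how: join(join_1, join_2, how) for how in ("inner", "left", "right", "full")}
for how, rows in results.items():
    print(how, rows)
```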

5. Use a join to find the rows in students_union.txt that do not exist in students.txt

select * from students_union A left join student B on A.ID=B.ID where B.ID is null;
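left join ... where B.ID is null is an anti-join. Rows of students_union with no matching id in student have NULL in every B column, and the where clause keeps only those rows. NULL never equals anything, so a NULL id in student cannot match a row and does not empty the result the way it can with not in. Selecting A.* instead of * also drops the all-NULL B columns. The sqlite3 sketch below is a stand-in for Hive, with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (id integer, name text)")
conn.execute("create table students_union (id integer, name text)")
# Hypothetical data; student also holds a row whose id is NULL.
conn.executemany("insert into student values (?, ?)", [(1, "a"), (2, "b"), (None, "bad")])
conn.executemany("insert into students_union values (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

missing = conn.execute(
    "select A.* from students_union A left join student B on A.id = B.id "
    "where B.id is null"
).fetchall()
print(missing)   # only the students_union row without a match in student
```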
