Hive作业

Posted cauwt

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Hive作业相关的知识,希望对你有一定的参考价值。

  最近学习Hive。实现了其中几个作业,如下。Hql语句没怎么调整格式,见谅。

 

Hive练习二中的题目

(1). 2017 年4 月1 日各个商品品牌的交易笔数,按照销售交易从多到少排序

1 select 
2 brand,
3 count(*) as totalCount 
4 from 
5 record 
6 join brand_dimension on record.bid = brand_dimension.bid 
7 where record.trancation_date= 2017-04-01 
8 group by brand_dimension.brand
9 order by totalCount desc;

输出

+------------+-------------+--+
|   brand    | totalcount  |
+------------+-------------+--+
| SAMSUNG    | 2           |
| WULIANGYE  | 1           |
| PUMA       | 1           |
| OPPO       | 1           |
| DELL       | 1           |
+------------+-------------+--+

 

(2). 不同性别消费的商品类别情况(不同性别消费不同商品类别的总价)

 1 select 
 2 gender,
 3 category, 
 4 sum(price) as totalPrice
 5 from record 
 6 join user_dimension on record.uid = user_dimension.uid
 7 join brand_dimension on record.bid = brand_dimension.bid
 8 group by gender, 
 9 category
10 order by 
11 gender,
12 category,
13 totalPrice;

输出

+---------+------------+-------------+--+
| gender  |  category  | totalprice  |
+---------+------------+-------------+--+
| M       | computer   | 252         |
| M       | food       | 429         |
| M       | sports     | 120         |
| M       | telephone  | 1669        |
+---------+------------+-------------+--+

 

Hive练习三中的题目

(1). 谁不是经理

1 select 
2 name 
3 from 
4 employees 
5 where 
6 size(subordinates)<=0;

输出

+-------------------+---------------+--+
|       name        | subordinates  |
+-------------------+---------------+--+
| Todd Jones        | []            |
| Bill King         | []            |
| Stacy Accountant  | []            |
+-------------------+---------------+--+
3 rows selected (0.092 seconds)

 

(2). 谁住在邮编比60500 大的地区

1 select name,address.zip from employees where address.zip> 60500;

输出

+-------------------+--------+--+
|       name        |  zip   |
+-------------------+--------+--+
| John Doe          | 60600  |
| Mary Smith        | 60601  |
| Todd Jones        | 60700  |
| Stacy Accountant  | 60563  |
+-------------------+--------+--+
4 rows selected (0.123 seconds)

 

(3). 联邦税超过0.15 的雇员

1 select 
2 name, 
3 deductions[‘Federal Taxes‘] 
4 from 
5 employees 
6 where deductions[‘Federal Taxes‘] > 0.15+1e-5;

输出

+---------------+---------------+--+
|     name      | federaltaxes  |
+---------------+---------------+--+
| John Doe      | 0.2           |
| Mary Smith    | 0.2           |
| Boss Man      | 0.3           |
| Fred Finance  | 0.3           |
+---------------+---------------+--+
4 rows selected (0.126 seconds)

 

(4). 谁住在Drive 或Par 街道

1 select 
2 name,
3 address 
4 from 
5 employees 
6 where address.street rlike ^.*(Drive|Par).*$;

输出

---------------+------------------------------------------------------------------------------+--+
|     name      |                                   address                                    |
+---------------+------------------------------------------------------------------------------+--+
| Boss Man      | {"street":"1 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500}  |
| Fred Finance  | {"street":"2 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500}  |
+---------------+------------------------------------------------------------------------------+--+
2 rows selected (0.218 seconds)

Hive练习十中的题目

1. 建表

创建并 使用 database tmall

1 create database tmall;
2 use tmall;

建表 product,格式 text

 1 create table if not exists product(
 2 item_id string,
 3 pic_url string,
 4 category string,
 5 brand_id string,
 6 seller_id string
 7 )
 8 row format delimited
 9 fields terminated by ,
10 lines terminated by \n
11 stored as textfile;

 

建表 review,格式 text

 1 create table if not exists review(
 2 item_id string,
 3 user_id string,
 4 feedback int,
 5 feedback_time timestamp,
 6 feedback_pic_url string
 7 )
 8 row format delimited
 9 fields terminated by ,
10 lines terminated by \n
11 stored as textfile;

 

建表 log_orc,格式 orc

 1 # log as textfile
 2 create table if not exists log(
 3 item_id string,
 4 user_id string,
 5 action string,
 6 action_time timestamp
 7 )
 8 row format delimited
 9 fields terminated by ,
10 lines terminated by \n
11 stored as textfile;
12 
13 # log as orc
14 create table if not exists log_orc
15 like log
16 stored as orc;

2. 载入数据

 1 #product
 2 load data local inpath /home/zkpk/test/tmall/tmall_product.csv overwrite into table product;
 3 #验证:select * from product limit 100;
 4 #review
 5 load data local inpath /home/zkpk/test/tmall/tmall_review.csv overwrite into table review;
 6 #验证:select * from review limit 100;
 7 #log
 8 load data local inpath /home/zkpk/test/tmall/tmall_log.csv overwrite into table log;
 9 #验证:select * from log limit 100;
10 insert into table log_orc select * from log;

3. 载入数据

  • 热度前十名的商家(商品被浏览的最多)
    1 select 
    2 p.seller_id, 
    3 count(*) as click_count 
    4 from log_orc l 
    5 inner join product p on l.item_id = p.item_id 
    6 where l.action=click 
    7 group by p.seller_id 
    8 order by click_count desc 
    9 limit 10;

    输出

    +--------------+--------------+--+
    | p.seller_id | click_count |
    +--------------+--------------+--+
    | s403 | 143 |
    | s190 | 116 |
    | s284 | 86 |
    | s161 | 78 |
    | s227 | 59 |
    | s61 | 59 |
    | s29 | 57 |
    | s82 | 45 |
    | s464 | 42 |
    | s261 | 42 |
    +--------------+--------------+--+

     

  • 好评率(feedback=5为好评)低于60%的商品
    1 select 
    2 item_id, 
    3 count(case when feedback =5 then 1 else null end) as best, 
    4 count(1) as total, 
    5 count(case when feedback =5 then 1 else null end)/ count(1) as best_rate 
    6 from review 
    7 group by item_id 
    8 having count(case when feedback =5 then 1 else null end) < count(1) *0.6 ;

    输出

    +----------+-------+--------+----------------------+--+
    | item_id  | best  | total  |      best_rate       |
    +----------+-------+--------+----------------------+--+
    | 221      | 0     | 1      | 0.0                  |
    | 256      | 0     | 4      | 0.0                  |
    | 287      | 1     | 3      | 0.3333333333333333   |
    | 288      | 2     | 6      | 0.3333333333333333   |
    | 378      | 0     | 1      | 0.0                  |
    | 379      | 1     | 6      | 0.16666666666666666  |
    | 397      | 0     | 2      | 0.0                  |
    | 398      | 0     | 1      | 0.0                  |
    | 423      | 2     | 7      | 0.2857142857142857   |
    | 449      | 1     | 5      | 0.2                  |
    | 45       | 0     | 6      | 0.0                  |
    | 450      | 0     | 1      | 0.0                  |
    | 451      | 0     | 3      | 0.0                  |
    | 453      | 1     | 2      | 0.5                  |
    | 505      | 1     | 2      | 0.5                  |
    +----------+-------+--------+----------------------+--+

     

  • 找出潜在购买用户(收藏了商品,但是没有购买)
     1 select distinct 
     2 m.item_id, 
     3 m.user_id 
     4 from log_orc m 
     5 left outer join log_orc s 
     6 on (m.item_id = s.item_id 
     7 and m.user_id = s.user_id 
     8 and s.action=alipay) 
     9 where m.action=collect 
    10 and s.item_id is null;

    输出

    +------------+------------+--+
    | m.item_id  | m.user_id  |
    +------------+------------+--+
    | 152        | 3286       |
    | 24         | 3389       |
    | 242        | 39         |
    | 422        | 3423       |
    | 468        | 2727       |
    +------------+------------+--+

     

以上是关于Hive作业的主要内容,如果未能解决你的问题,请参考以下文章

HTML5期末大作业:餐饮美食网站设计——咖啡(10页) HTML+CSS+JavaScript 学生DW网页设计作业成品 web课程设计网页规划与设计 咖啡网页设计 美食餐饮网页设计...(代码片段

Hive作业

如何获取Hive正在执行或者已结束的的MapReduce作业的SQL语句

在 HIVE 中运行查询时如何更改 Tez 作业名称

.csv 不是 Select Hive Query 上的 SequenceFile 错误

在 google dataproc 上执行 hive 作业时如何使用参数/属性标志值