MySQL/HiveSQL Written Test Questions: HiveSQL
Posted by 秋华
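4 Handwritten HQL: Question 4
Given a table STG.ORDER with the fields Date, Order_id, User_id, and amount, write SQL for the statistics below. Sample row: 2017-01-01,10029028,1000003251,33.57.
1) For each month of 2017, report the number of orders, the number of users, and the total transaction amount.
2) Report the number of new customers in November 2017 (users whose first order was placed in November).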
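Create the table (the answers below use the table name order_tab and the column dt in place of STG.ORDER and Date)
create table order_tab(dt string, order_id string, user_id string, amount decimal(10,2)) row format delimited fields terminated by '\t';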
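1) For each month of 2017, report the number of orders, the number of users, and the total transaction amount.
select date_format(dt,'yyyy-MM') as order_month, count(order_id) as order_cnt, count(distinct user_id) as user_cnt, sum(amount) as total_amount from order_tab where date_format(dt,'yyyy')='2017' group by date_format(dt,'yyyy-MM');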
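The monthly statistics can also be sketched without date_format, assuming dt is always stored as a yyyy-MM-dd string as in the sample row. The version below filters with a plain string range and takes the month from the first seven characters. The aliases (order_month, order_cnt, user_cnt, total_amount) are illustrative names, not part of the original answer, and count(distinct order_id) is a defensive choice in case an order appears on more than one row.

```sql
-- Sketch: same monthly statistics, assuming dt is a 'yyyy-MM-dd' string.
select substr(dt, 1, 7)         as order_month,   -- e.g. '2017-01'
       count(distinct order_id) as order_cnt,
       count(distinct user_id)  as user_cnt,
       sum(amount)              as total_amount
from order_tab
where dt >= '2017-01-01'
  and dt <  '2018-01-01'
group by substr(dt, 1, 7);
```

A range predicate on the raw column is usually easier for the engine to push down than a function call on every row, which matters if the table is partitioned by dt.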
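2) Report the number of new customers in November 2017 (users whose first order was placed in November). Grouping by user_id yields one row per new customer, so an outer count is needed to return a single number.
select count(*) as new_user_cnt from (select user_id from order_tab group by user_id having date_format(min(dt),'yyyy-MM')='2017-11') t;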
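As a cross-check on the new-customer count, here is a sketch that first computes each user's earliest order date and then counts the users whose earliest date falls in November 2017. It assumes dt is a yyyy-MM-dd string, so string comparison orders dates correctly. The names f, first_dt, and new_user_cnt are illustrative.

```sql
-- Sketch: count users whose first-ever order date is in November 2017.
select count(*) as new_user_cnt
from (
    select user_id, min(dt) as first_dt   -- earliest order per user
    from order_tab
    group by user_id
) f
where f.first_dt >= '2017-11-01'
  and f.first_dt <  '2017-12-01';
```

The inner query must scan all history rather than only November rows. Otherwise a customer who also ordered before November would be miscounted as new.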
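5 Handwritten HQL: Question 5
Given the log below, write code to compute the total number and the average age of all users and of active users. (An active user is one with access records on two consecutive days.) Columns: date, user, age.
Dataset
2019-02-11,test_1,23
2019-02-11,test_2,19
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-12,test_2,19
2019-02-13,test_1,23
2019-02-15,test_2,19
2019-02-16,test_2,19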
1) Create the table
create table user_age(dt string, user_id string, age int) row format delimited fields terminated by ',';
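2) Group by date and user so that each user has one row per day, then rank each user's days by date (this result is referred to as t1)
select dt, user_id, min(age) age, rank() over(partition by user_id order by dt) rk from user_age group by dt,user_id;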
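3) Subtract the rank from the date; days that belong to the same consecutive run produce the same value, flag (this result is referred to as t2)
select user_id, age, date_sub(dt,rk) flag from t1;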
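To see why subtracting the rank from the date groups consecutive days together, the following sketch runs the same arithmetic on the four dates that test_2 has in the dataset. The dates are supplied inline, so the query does not depend on the user_age table. The aliases d and r are illustrative.

```sql
-- Sketch: date-minus-rank on test_2's four distinct access dates.
select dt, rk, date_sub(dt, rk) as flag
from (
    select dt, rank() over (order by dt) as rk
    from (
        select '2019-02-11' as dt union all
        select '2019-02-12' as dt union all
        select '2019-02-15' as dt union all
        select '2019-02-16' as dt
    ) d
) r
order by dt;
-- dt          rk  flag
-- 2019-02-11  1   2019-02-10
-- 2019-02-12  2   2019-02-10
-- 2019-02-15  3   2019-02-12
-- 2019-02-16  4   2019-02-12
```

Within a run of consecutive days, the date and the rank both advance by one, so their difference stays constant. A gap in the dates shifts the difference, which starts a new flag value.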
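4) Keep the (user_id, flag) groups that contain at least 2 rows, i.e. runs of two or more consecutive days; these users are active (this result is referred to as t3)
select user_id, min(age) age from t2 group by user_id,flag having count(*)>=2;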
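5) Deduplicate by user, because one user can have several separate consecutive runs (this result is referred to as t4). For example, user a logs in on four days: January 10 and 11, and January 20 and 21.
select user_id, min(age) age from t3 group by user_id;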
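The deduplicated set of active users can also be sketched with lead and datediff instead of the date-minus-rank trick. This is an alternative formulation, not the method used in the steps above. After reducing the log to one row per user per day, a user is active if any day is followed by the next calendar day. The aliases d, n, and next_dt are illustrative.

```sql
-- Sketch: active users via lead(); plays the same role as t4.
select user_id, min(age) as age
from (
    select user_id, age, dt,
           -- next distinct access date of the same user, NULL on the last day
           lead(dt, 1) over (partition by user_id order by dt) as next_dt
    from (
        select dt, user_id, min(age) as age
        from user_age
        group by dt, user_id
    ) d
) n
where datediff(next_dt, dt) = 1   -- the next visit is exactly one day later
group by user_id;
```

The outer group by removes duplicates when a user has more than one pair of consecutive days. The date-minus-rank version generalizes more easily to "at least N consecutive days", while this one is shorter for the two-day case.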
6) Compute the number and the average age of active users (those with visits on two consecutive days)
select count(*) ct, cast(sum(age)/count(*) as decimal(10,2)) avg_age from t4;
7) Deduplicate the full dataset by user (this result is referred to as t5)
select user_id, min(age) age from user_age group by user_id;
8) Compute the number and the average age of all users
select count(*) user_count, cast((sum(age)/count(*)) as decimal(10,1)) avg_age from t5;
9) Combine the two aggregate results (active users from step 6, all users from step 8) with union all, padding the columns that do not apply with 0 (this result is referred to as t6)
select 0 user_total_count, 0 user_total_avg_age, count(*) twice_count, cast(sum(age)/count(*) as decimal(10,2)) twice_count_avg_age
from (
  select user_id, min(age) age
  from (
    select user_id, min(age) age
    from (
      select user_id, age, date_sub(dt,rk) flag
      from (
        select dt, user_id, min(age) age, rank() over(partition by user_id order by dt) rk
        from user_age
        group by dt,user_id
      )t1
    )t2
    group by user_id,flag
    having count(*)>=2
  )t3
  group by user_id
)t4
union all
select count(*) user_total_count, cast((sum(age)/count(*)) as decimal(10,1)) user_total_avg_age, 0 twice_count, 0 twice_count_avg_age
from (
  select user_id, min(age) age
  from user_age
  group by user_id
)t5;
10) Sum each column to merge the two rows into one, giving the final SQL
select sum(user_total_count), sum(user_total_avg_age), sum(twice_count), sum(twice_count_avg_age)
from (
  select 0 user_total_count, 0 user_total_avg_age, count(*) twice_count, cast(sum(age)/count(*) as decimal(10,2)) twice_count_avg_age
  from (
    select user_id, min(age) age
    from (
      select user_id, min(age) age
      from (
        select user_id, age, date_sub(dt,rk) flag
        from (
          select dt, user_id, min(age) age, rank() over(partition by user_id order by dt) rk
          from user_age
          group by dt,user_id
        )t1
      )t2
      group by user_id,flag
      having count(*)>=2
    )t3
    group by user_id
  )t4
  union all
  select count(*) user_total_count, cast((sum(age)/count(*)) as decimal(10,1)) user_total_avg_age, 0 twice_count, 0 twice_count_avg_age
  from (
    select user_id, min(age) age
    from user_age
    group by user_id
  )t5
)t6;
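The final query is deeply nested, so it may be easier to read as a chain of common table expressions. The sketch below follows the same steps but replaces the "union all plus sum" merge with a cross join of two single-row aggregates. It assumes a Hive version that supports with clauses and allows Cartesian products (some strict-mode settings reject cross joins, in which case the union all form still applies). The CTE names daily, ranked, flagged, active, and all_users are illustrative.

```sql
-- Sketch: the same computation written with CTEs.
with daily as (
    -- one row per user per day
    select dt, user_id, min(age) as age
    from user_age
    group by dt, user_id
),
ranked as (
    select dt, user_id, age,
           row_number() over (partition by user_id order by dt) as rk
    from daily
),
flagged as (
    -- consecutive days of one user share the same flag
    select user_id, age, date_sub(dt, rk) as flag
    from ranked
),
active as (
    -- users with at least one run of two or more consecutive days
    select user_id, min(age) as age
    from (
        select user_id, flag, min(age) as age
        from flagged
        group by user_id, flag
        having count(*) >= 2
    ) runs
    group by user_id
),
all_users as (
    select user_id, min(age) as age
    from user_age
    group by user_id
)
select a.user_total_count, a.user_total_avg_age,
       b.twice_count, b.twice_count_avg_age
from (
    select count(*) as user_total_count,
           cast(avg(age) as decimal(10,1)) as user_total_avg_age
    from all_users
) a
cross join (
    select count(*) as twice_count,
           cast(avg(age) as decimal(10,2)) as twice_count_avg_age
    from active
) b;
```

On the dataset above this should return 3 users with an average age of 27.0, and 1 active user (test_2) with an average age of 19.00. One difference from the union all form: if there are no active users, the average here is NULL rather than 0.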
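6 Handwritten HQL: Question 6
Write SQL that returns, for every user, the amount of their first purchase in October of this year. The table ordertable has the fields userid (purchasing user), money (amount), paymenttime (purchase time, format: 2017-10-01), and orderid (order ID).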
1) Create the table
create table ordertable(userid string, money int, paymenttime string, orderid string) row format delimited fields terminated by '\t';
2) Query. First find each user's earliest payment time in October 2017 (this result is referred to as t1)
select userid, min(paymenttime) paymenttime from ordertable where date_format(paymenttime,'yyyy-MM')='2017-10' group by userid;
Then join t1 back to ordertable to fetch the amount paid at that time
select t1.userid, t1.paymenttime, od.money from t1 join ordertable od on t1.userid=od.userid and t1.paymenttime=od.paymenttime;
Substituting the first query for t1 gives the final SQL
select t1.userid, t1.paymenttime, od.money from (select userid, min(paymenttime) paymenttime from ordertable where date_format(paymenttime,'yyyy-MM')='2017-10' group by userid)t1 join ordertable od on t1.userid=od.userid and t1.paymenttime=od.paymenttime;
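Because paymenttime has only day granularity, the join above returns several rows for a user who placed more than one order on their first purchase day in October. A sketch that always returns exactly one row per user uses row_number instead of a self-join. Breaking ties by orderid is an assumption made here for determinism; the question does not say which same-day order counts as first. The names ranked and rn are illustrative.

```sql
-- Sketch: first October 2017 purchase per user, one row per user.
select userid, paymenttime, money
from (
    select userid, paymenttime, money,
           -- tie-break on orderid is an assumption, not part of the question
           row_number() over (partition by userid
                              order by paymenttime, orderid) as rn
    from ordertable
    where paymenttime >= '2017-10-01'
      and paymenttime <  '2017-11-01'
) ranked
where rn = 1;
```

This reads ordertable once rather than twice, which can also be cheaper on a large table.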