SQL-HIVE-PIG-Mapreduce
[Posted]: 2014-12-18 14:45:44
[Question]: Each row has 5 columns, separated by commas:
1st column is name
2nd column is date_of_purchase
3rd column is product
4th column is mode of payment
5th column is total_amount
Hopefully that gives you an idea of what the data contains:
surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000
I want to see how much each person spent through online mode and through offline mode.
Basically, the reducer output should look like this:
surender OFFLINE 20000
surender ONLINE 58000
Raja OFFLINE 8000
Raja ONLINE 32000
Kumar OFFLINE 4000
Kumar ONLINE 50000
The final output should look like this (name, offline total, online total):
surender 20000 58000
Raja 8000 32000
Kumar 4000 50000
Could you give me a Hive or Pig query, or a MapReduce program?
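[Answer 1]:
-- Load the comma-separated rows with an explicit schema
A = LOAD 'file_name' USING PigStorage(',') AS (name:chararray, date:chararray, product:chararray, mode:chararray, total:long);
-- Sum the amounts per (name, mode) pair
B = GROUP A BY (name, mode);
C = FOREACH B GENERATE group.name AS name, group.mode AS mode, SUM(A.total) AS total;
-- Collect each person's per-mode totals into one row (C.total is an unordered bag)
D = GROUP C BY name;
E = FOREACH D GENERATE group, C.total;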
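If your data spells the same name with different capitalization, as in the sample you provided (surender vs. Surender), you need to convert the name to uppercase before grouping.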
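As a cross-check of the grouping logic, here is a minimal plain-Python sketch of the same aggregation. It is not Hadoop, Hive, or Pig code. It assumes names should be compared case-insensitively, which is what the expected totals in the question imply. The names `SAMPLE` and `spend_by_mode` are made up for illustration.

```python
from collections import defaultdict

# Sample rows copied from the question: name,date,product,mode,total_amount
SAMPLE = """surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000"""


def spend_by_mode(lines):
    """Sum total_amount per person and payment mode.

    Names are upper-cased first so that 'surender' and 'Surender'
    land in the same group (an assumption based on the expected output).
    """
    totals = defaultdict(lambda: {"OFFLINE": 0, "ONLINE": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _date, _product, mode, amount = line.split(",")
        totals[name.upper()][mode.upper()] += int(amount)
    return totals


# Print one row per person: name, offline total, online total
for name, by_mode in spend_by_mode(SAMPLE.splitlines()).items():
    print(name, by_mode["OFFLINE"], by_mode["ONLINE"])
# SURENDER 20000 58000
# RAJA 8000 32000
# KUMAR 4000 50000
```

The "map" step here is the split of each line into a (name, mode) key and an amount, and the "reduce" step is the running sum per key. Keeping both modes in one dictionary per person produces the pivoted final layout directly.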
[Comments]:
@AnuragSingh, please accept the answer that solves your question!