SQL-HIVE-PIG-Mapreduce

Posted: 2014-12-18 14:45:44

Question:

Each row has 5 columns, separated by commas.

1st column is name
2nd column is date_of_purchase
3rd column is product
4th column is mode of payment
5th column is total_amount

Hopefully that makes clear what the data contains. Here is a sample:

surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000

I want to see how much each person spent in ONLINE mode and how much in OFFLINE mode.

Basically, the reducer output should look like this:

surender   OFFLINE   20000
surender   ONLINE    58000
Raja       OFFLINE   8000
Raja       ONLINE    32000
Kumar      OFFLINE    4000
Kumar      ONLINE    50000

The final output should look like this (name, OFFLINE total, ONLINE total):

surender   20000   58000
Raja       8000    32000
Kumar      4000    50000

Can you give me a Hive or Pig query, or a MapReduce program, that does this?


Answer 1:
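-- Load the comma-separated purchase records.
A = LOAD 'file_name' USING PigStorage(',') AS (name:chararray, date:chararray, product:chararray, mode:chararray, total:long);
-- Sum the amounts for each (name, mode) pair.
B = GROUP A BY (name, mode);
C = FOREACH B GENERATE group.name AS name, group.mode AS mode, SUM(A.total) AS total;
-- Collect the per-mode totals for each name.
D = GROUP C BY name;
E = FOREACH D GENERATE group, C.total;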

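If your data has inconsistent spellings, as the sample you provided does (for example "surender" and "Surender"), you need to convert the names to upper case before grouping.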
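
For anyone who wants to check the expected numbers without a cluster, here is a minimal Python sketch of the same logic written as a map step and a reduce step. It is not Hadoop code: the function names `map_record`, `reduce_totals`, and `pivot` are made up for illustration, and it assumes every row has exactly five comma-separated fields with an integer amount. Names are upper-cased before grouping, as suggested above, and a missing mode is reported as 0.

```python
from collections import defaultdict

# The sample rows from the question, unchanged.
SAMPLE = """\
surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000
"""


def map_record(line):
    """Map step (hypothetical helper): emit ((name, mode), amount) for one row."""
    name, _date, _product, mode, amount = line.strip().split(",")
    # Upper-case the key so "surender" and "Surender" fall into one group.
    return (name.upper(), mode.upper()), int(amount)


def reduce_totals(pairs):
    """Reduce step (hypothetical helper): sum the amounts for each (name, mode) key."""
    totals = defaultdict(int)
    for key, amount in pairs:
        totals[key] += amount
    return dict(totals)


def pivot(totals):
    """Reshape per-(name, mode) totals into name -> (offline, online)."""
    names = sorted({name for name, _mode in totals})
    return {
        n: (totals.get((n, "OFFLINE"), 0), totals.get((n, "ONLINE"), 0))
        for n in names
    }


if __name__ == "__main__":
    totals = reduce_totals(map_record(line) for line in SAMPLE.splitlines())
    for name, (offline, online) in pivot(totals).items():
        print(name, offline, online)
```

On the sample data this gives 20000 / 58000 for SURENDER, 8000 / 32000 for RAJA, and 4000 / 50000 for KUMAR, which matches the final output asked for in the question apart from the upper-cased names.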

Comments:

@AnuragSingh, please accept the answer that solves your question!
