猪中按查询的内部组中的前 3 条记录
Posted
技术标签:
【中文标题】猪中按查询的内部组中的前 3 条记录【英文标题】:Top 3 records in inside group by query in pig 【发布时间】:2014-10-01 07:59:31 【问题描述】:我的问题的示例数据:
1 12 1234
2 12 1233
1 13 5555
1 15 4444
2 34 2222
7 89 1111
Field Description :
col1 cust_id ,col2 zip_code , col 3 transaction_id.
Using pig scripting i need to find the below question :
for each cust_id i need to find the zip code mostly used for last 3 transactions .
Approach I used so far :
1) Group records with cust_id :
(1,(1,12,1234),(1,13,5555),(1,15,4444),(1,12,3333),(1,13,2323),(1,13,3434),(1,13,5755),(1,18,4424),(1,12,3383),(1,13,2823))
(2,(2,34,2222),(2,12,1233),(2,34,6666),(2,34,6666),(2,34,2422))
(6,(6,14,2312),(6,15,8888),(6,14,4634),(6,14,2712),(6,15,8288))
(7,(7,45,4244),(7,89,1111),(7,45,4544),(7,89,1121))
2) 对它们进行排序并将它们限制在最近的 3 笔交易中。
Using nested foreach i have sorted by transaction id and limit that to 3
nested = foreach group_by sor = order zip by $2 desc ; limi = limit sor 3 ; generate limi; ;
After grouping data is :
((1,12,1234),(1,13,2323),(1,13,2823))
((2,12,1233),(2,34,2222),(2,34,2422))
((6,14,2312),(6,14,2712),(6,14,4634))
((7,89,1111),(7,89,1121),(7,45,4244))
为什么我上面的数据没有按降序排序?
即使是升序,现在我如何找到最近 3 笔交易中最常用的邮政编码。
Result should be
1) 13
2) 34
3) 14
4) 89
【问题讨论】:
【参考方案1】:你可以试试这个吗?
PigScript:
A = LOAD 'input.txt' USING PigStorage(',') AS(CustomerId:int,ZipCode:int,TransactionId:int);
B = GROUP A BY CustomerId;
C = FOREACH B
SortTxnId = ORDER A BY $2 DESC;
TxnIdLimit = LIMIT SortTxnId 3;
GENERATE group,TxnIdLimit;
D = FOREACH C GENERATE FLATTEN($1);
E = GROUP D BY ($0,$1);
F = FOREACH E GENERATE group,COUNT(D);
G = GROUP F BY group.$0;
I = FOREACH G
SortZipCode = ORDER F BY $1 DESC;
ZipCodeLimit = LIMIT SortZipCode 1;
GENERATE FLATTEN(ZipCodeLimit.group);
J = FOREACH I GENERATE FLATTEN($0.TxnIdLimit::ZipCode);
DUMP J;
Output:
(13)
(34)
(14)
(89)
input.txt
1,12,1234
1,13,5555
1,15,4444
1,12,3333
1,13,5755
1,18,4424
2,34,2222
2,12,1233
2,33,6666
2,34,6666
2,34,2422
6,14,2312
6,15,8888
6,14,4634
6,14,2712
7,45,4244
7,89,1111
7,89,3111
7,89,1121
【讨论】:
以上是关于猪中按查询的内部组中的前 3 条记录的主要内容,如果未能解决你的问题,请参考以下文章