Apache PIG - How to get the Flop 10 data records?

Posted 2023-04-18

Asked: 2015-05-01 10:50:25

Question:

I have data records like this:

Name          customerID revenue(Mio) premium          
Michael James 078932832  2.7          y
Susan Miller  024383490  3.9          n
John Cooper   021023023  2.1          y

How can I get, separately for each value of the premium flag, the 10 records with the lowest revenue (= Flop 10)?

The result should be:

Nr Name          customerID revenue(Mio) premium          
1  John Cooper   021023023  2.1          y
2  Michael James 078932832  2.7          y
3  Andrew Murs   044834399  3.0          y
.  ...           .....      ...          .
10 th entry      with       flag         y

1  Susan Miller  024383490  3.9          n
.  ...           .....      ...          .
10 th entry      with       flag         n

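As you can see, the list is sorted in ascending order (starting with the lowest revenue).

Comments:

What have you tried so far?

Answer 1:

I guess you should use split. Assume A is your load statement: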

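-- Assumes tab-separated input. The Nr column is not in the source data, so it is not loaded.
A = LOAD 'data' AS (Name:chararray, customerID:chararray, revenue:double, premium:chararray);
-- SPLIT is a standalone statement; its result is not assigned to an alias.
SPLIT A INTO PRE IF premium == 'y', NONPRE IF premium == 'n';
C = ORDER PRE BY revenue ASC;
D = ORDER NONPRE BY revenue ASC;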

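Disclaimer: be careful when using split, because null records are dropped (a record whose premium is null matches neither condition). I have not compiled this code.

Comments: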

Thank you very much! I will test it. But how can I get the two different results in one file?

E = UNION C, D; DUMP E;
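Another way to get both groups into a single output, with each group cut off at its 10 lowest revenues, might be to group by the premium flag and do the sort and limit inside a nested FOREACH. This is an untested sketch, not the answerer's method. It assumes tab-separated input with the four columns from the question, and the output path 'flop10' is a made-up name. It does not produce the Nr column, and the row order inside the stored output should be checked rather than taken for granted.

```pig
-- Assumption: tab-separated input with the columns shown in the question.
A = LOAD 'data' AS (Name:chararray, customerID:chararray, revenue:double, premium:chararray);

-- Keep only rows with a valid flag; this also drops a header row and null flags.
V = FILTER A BY premium == 'y' OR premium == 'n';

-- One group per flag value.
G = GROUP V BY premium;

-- Within each group, sort by revenue ascending and keep the first 10 rows.
F = FOREACH G {
    sorted = ORDER V BY revenue ASC;
    flop = LIMIT sorted 10;
    GENERATE FLATTEN(flop);
};

-- 'flop10' is a hypothetical output directory.
STORE F INTO 'flop10';
```

Compared with UNION C, D, this keeps the per-group limit in one place. UNION by itself does not guarantee that the order of C and D is preserved.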
