Get emp record in Pig

[Title]: Get emp record in Pig [Posted]: 2015-04-06 10:14:19 [Question]:
EMP_ID  PRD_NO  PRD_DATE                PRD_TOTAL  PRD_NORM
IND235  00020   28/Mar/2015 02:00:50    11         60.00
IND235  00018   27/Mar/2015 03:10:40    7          60.00
IND235  00019   28/Mar/2015 04:00:54    3          60.00
IND235  00020   27/Mar/2015 05:00:51    11         60.00
PUR266  00044   28/Mar/2015 01:20:50    85         100.00
PUR266  00024   28/Mar/2015 06:30:60    33         100.00
PUR266  00017   27/Mar/2015 05:30:05    11         100.00
PUR266  00038   27/Mar/2015 02:30:15    60         100.00

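I would expect to get this output:

IND235,27/Mar/2015,60,18,42
IND235,28/Mar/2015,60,14,46
PUR266,27/Mar/2015,100,71,29
PUR266,28/Mar/2015,100,118,-18

The columns are EMP_ID, the date part of PRD_DATE, PRD_NORM, the sum of PRD_TOTAL, and PRD_NORM minus that sum. PRD_TOTAL is summed per EMP_ID and per day, so for IND235 on 27/Mar/2015 the sum is 7 + 11 = 18 and the last column is 60 - 18 = 42.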

I have only just started learning the ins and outs of Pig Latin. Is there a built-in way to do this in Pig or in some library, or should I look at writing a UDF?

[Answer 1]:

Try this:

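-- Space-delimited fields: EMP_ID, PRD_NO, date, time, PRD_TOTAL, PRD_NORM
A = load 'pigdeduct' using PigStorage(' ') as (a1:chararray,b1:int,c1:chararray,d1:chararray,e1:int,f1:int);

-- Keep only EMP_ID, date, PRD_TOTAL and PRD_NORM
B = foreach A GENERATE a1,c1,e1,f1;

-- One group per (EMP_ID, date)
C = group B by (a1,c1);

-- PRD_NORM is the same on every row of an employee, so MAX picks it without assuming a group size
D = foreach C generate FLATTEN(group),MAX(B.f1),SUM(B.e1),MAX(B.f1) - SUM(B.e1);

dump D;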

Input file:

IND235 00020 28/Mar/2015 02:00:50 11 60.00
IND235 00018 27/Mar/2015 03:10:40 7 60.00
IND235 00019 28/Mar/2015 04:00:54 3 60.00
IND235 00020 27/Mar/2015 05:00:51 11 60.00
PUR266 00044 28/Mar/2015 01:20:50 85 100.00
PUR266 00024 28/Mar/2015 06:30:60 33 100.00
PUR266 00017 27/Mar/2015 05:30:05 11 100.00
PUR266 00038 27/Mar/2015 02:30:15 60 100.00

Output:

(IND235,27/Mar/2015,60,18,42)
(IND235,28/Mar/2015,60,14,46)
(PUR266,27/Mar/2015,100,71,29)
(PUR266,28/Mar/2015,100,118,-18)
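
For the comma-separated form shown in the question, the same grouping could be written with named fields and stored rather than dumped. This is only a sketch under a few assumptions: the relation names, field names, and the output path 'pigdeduct_summary' are made up for illustration, PRD_NORM is read as a double, and the norm is assumed to be constant for each employee so that MAX returns it.

```pig
-- Hypothetical relation and field names; the layout follows the space-delimited input file above.
records = LOAD 'pigdeduct' USING PigStorage(' ')
    AS (emp_id:chararray, prd_no:chararray, prd_date:chararray,
        prd_time:chararray, prd_total:int, prd_norm:double);

-- One group per employee and calendar day.
by_emp_day = GROUP records BY (emp_id, prd_date);

-- Assumption: prd_norm does not vary within an employee, so MAX simply selects it.
summary = FOREACH by_emp_day GENERATE
    FLATTEN(group) AS (emp_id, prd_date),
    MAX(records.prd_norm) AS prd_norm,
    SUM(records.prd_total) AS prd_total,
    MAX(records.prd_norm) - SUM(records.prd_total) AS remaining;

-- 'pigdeduct_summary' is a placeholder output directory.
STORE summary INTO 'pigdeduct_summary' USING PigStorage(',');
```

Because prd_norm is declared as a double here, the norm and the difference are written as doubles; casting them with (int) in the FOREACH is one way to get integer formatting. GROUP does not guarantee row order, so an ORDER BY before the STORE may be needed if the order matters.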
