PIG Error 1066 after iterating through a joined set
[Posted]: 2011-07-29 18:15:17
[Question]:
I am trying to join a set that holds the number of days in each month to a data set, using a year-month key. After the join, when I try to run a FOREACH over the joined set, I get ERROR: 1066 ... Backend error : Scalar has more than one row in the output.
Here is an abbreviated set that shows the same problem:
$ hadoop fs -cat DIM/\*
2011,01,31
2011,02,28
2011,03,31
2011,04,30
2011,05,31
2011,06,30
2011,07,31
2011,08,31
2011,09,30
2011,10,31
2011,11,30
2011,12,31
$ hadoop fs -cat ACCT/\*
2011,7,26,key1,23.25,2470.0
2011,7,26,key2,10.416666666666668,232274.08333333334
2011,7,26,key3,82.83333333333333,541377.25
2011,7,26,key4,78.5,492823.33333333326
2011,7,26,key5,110.83333333333334,729811.9166666667
2011,7,26,key6,102.16666666666666,675941.25
2011,7,26,key7,118.91666666666666,770896.75
Then in grunt:
grunt> DIM = LOAD 'DIM' USING PigStorage(',') AS (year:int, month:int, days:int);
grunt> ACCT = LOAD 'ACCT' USING PigStorage(',') AS (year:int, month:int, day: int, account:chararray, metric1:double, metric2:double);
grunt> AjD = JOIN ACCT BY (year,month), DIM BY (year,month) USING 'replicated';
grunt> dump AjD;
...
(2011,7,26,key1,23.25,2470.0,2011,7,31)
(2011,7,26,key2,10.416666666666668,232274.08333333334,2011,7,31)
(2011,7,26,key3,82.83333333333333,541377.25,2011,7,31)
(2011,7,26,key4,78.5,492823.33333333326,2011,7,31)
(2011,7,26,key5,110.83333333333334,729811.9166666667,2011,7,31)
(2011,7,26,key6,102.16666666666666,675941.25,2011,7,31)
(2011,7,26,key7,118.91666666666666,770896.75,2011,7,31)
grunt> describe AjD;
AjD: ACCT::year: int,ACCT::month: int,ACCT::day: int,ACCT::account: chararray,ACCT::metric1: double,ACCT::metric2: double,DIM::year: int,DIM::month: int,DIM::days: int
grunt> FINAL = FOREACH AjD
>> GENERATE ACCT.year, ACCT.month, ACCT.account, (ACCT.metric2 / DIM.days);
grunt> dump FINAL;
...
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias FINAL. Backend error : Scalar has more than one row in the output. 1st : (2011,7,26,key1,23.25,2470.0), 2nd :(2011,7,26,key2,10.416666666666668,232274.08333333334)
However, if I store it and reload it to get rid of its "joined" schema:
grunt> STORE AjD INTO 'AjD' using PigStorage(',');
grunt> AjD2 = LOAD 'AjD' USING PigStorage(',') AS (year:int, month:int, day:int, account:chararray, metric1:double, metric2:double, year2:int, month2:int, days:int);
grunt> FINAL = FOREACH AjD2
>> GENERATE year, month, account, (metric2 /days);
grunt> dump FINAL;
...
(2011,7,key1,79.6774193548387)
(2011,7,key2,7492.712365591398)
(2011,7,key3,17463.782258064515)
(2011,7,key4,15897.526881720427)
(2011,7,key5,23542.319892473122)
(2011,7,key6,21804.5564516129)
(2011,7,key7,24867.637096774193)
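Is there a way to iterate (FOREACH) over the joined set without storing and reloading it?
[Comments]:
For anyone who found this post while searching for ERROR 1066: Unable to open iterator for alias, here is a generic solution.
[Answer 1]:
Have you tried using the :: operator to specify which column to get?
Replace (ACCT.metric2 / DIM.days) with (ACCT::metric2 / DIM::days).
For example: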
...
FINAL = FOREACH AjD
GENERATE
ACCT::year, ACCT::month, ACCT::account, (ACCT::metric2 / DIM::days);
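Putting the pieces together, the session from the question can be sketched end to end without the STORE and reload step. This is only a sketch under the question's own schema: the LOAD and JOIN statements are copied from the question, while the output field names after AS (year, month, account, metric2_per_day) are assumptions added for readability.

```pig
-- Load both inputs with the schemas given in the question.
DIM  = LOAD 'DIM'  USING PigStorage(',') AS (year:int, month:int, days:int);
ACCT = LOAD 'ACCT' USING PigStorage(',') AS (year:int, month:int, day:int, account:chararray, metric1:double, metric2:double);

-- Replicated join on the (year, month) key, as in the question.
AjD = JOIN ACCT BY (year, month), DIM BY (year, month) USING 'replicated';

-- Reference every joined column with '::' so Pig resolves it as a field
-- of the current AjD row rather than as a lookup into another relation.
-- The AS names are hypothetical and only rename the output fields.
FINAL = FOREACH AjD GENERATE
    ACCT::year    AS year,
    ACCT::month   AS month,
    ACCT::account AS account,
    (ACCT::metric2 / DIM::days) AS metric2_per_day;

DUMP FINAL;
```

With the sample data above, this should yield the same rows as the store-and-reload version shown in the question.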
[Comments]:
All the column qualifiers need to be '::'. Thanks for the answer.
Adding a link to this related question.
@shoover You just created an infinite loop by linking and back-linking these two questions. :)
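Some background on the wording of the error may help here. When the name before a dot is a relation alias rather than a field of the current row, Pig reads an expression such as ACCT.metric2 as a scalar projection, which is only valid when that relation produces exactly one row. The sketch below shows a case where the dot form does fit, reusing the ACCT relation from the question. The aliases ACCT_ALL, TOTAL and SHARE and the field names total and metric2_share are invented for illustration and do not appear in the original post.

```pig
-- Hypothetical illustration of a valid scalar projection.
-- GROUP ... ALL collapses ACCT into a single group, so TOTAL has exactly one row.
ACCT_ALL = GROUP ACCT ALL;
TOTAL    = FOREACH ACCT_ALL GENERATE SUM(ACCT.metric2) AS total;

-- TOTAL.total is a scalar here because TOTAL is a one-row relation.
SHARE = FOREACH ACCT GENERATE account, (metric2 / TOTAL.total) AS metric2_share;
```

In the question, ACCT holds seven rows, so ACCT.year and the other dotted references cannot be reduced to a single value, which matches the "Scalar has more than one row in the output" message.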