What does "unable to open iterator for an alias" mean in Pig?


【Posted】: 2016-10-26 19:21:40 【Question】:

I was trying to use the UNION operator, as shown below:

uni_b = UNION A, B, C, D, E, F, G, H;

Here all the relations A, B, C, ... H have the same schema.

When I use the DUMP operator, the job runs fine up to 85%, and then it shows the following error:

ERROR 1066: Unable to open iterator for alias uni_b

What does this mean? Where is the problem, and how should I debug it?

Here is my Pig script:

ip = load '/jee/jee_data.txt' USING PigStorage(',') as (id:Biginteger, fname:chararray , lname:chararray , board:chararray , eid:chararray , gender:chararray , math:double , phy:double , chem:double , jeem:double , jeep:double , jeec:double ,cat:chararray , dob:chararray);

todate_ip = foreach ip generate id, fname , lname , board , eid , gender   , math , phy , chem , jeem , jeep , jeec , cat , ToDate(dob,'dd/MM/yyyy') as dob;

jnbresult1 = foreach todate_ip generate id, fname , lname , board , eid , gender , math , phy , chem , jeem , jeep , jeec, ROUND_TO(AVG(TOBAG( math , phy , chem )),3) as bresult, ROUND_TO(SUM(TOBAG(jeem , jeep , jeec )),3) as jresult , cat , dob;

rankjnbres = rank jnbresult1 by jresult DESC , bresult DESC , jeem DESC, math DESC, jeep DESC, phy DESC, jeec DESC, chem DESC, gender ASC, dob ASC, fname ASC, lname ASC DENSE;

rankjnbres1 = rank jnbresult1 by bresult DESC , jeem DESC, math DESC, jeep DESC, phy DESC, jeec DESC, chem DESC, gender ASC, dob ASC, fname ASC, lname ASC DENSE;

allper = foreach rankjnbres generate id, rank_jnbresult1 , fname , lname , board , eid , gender , math , phy , chem , jeem , jeep , jeec, bresult, jresult , cat , dob , ROUND_TO(((double)((10000-rank_jnbresult1)/100.000)),3) as aper;

allper1 = foreach rankjnbres1 generate id, rank_jnbresult1 , fname , lname , board , eid , gender , math , phy , chem , jeem , jeep , jeec, bresult, jresult , cat , dob , ROUND_TO(((double)((10000-rank_jnbresult1)/100.000)),3) as a1per;

SPLIT allper into cbseB if board=='CBSE', anbB if board=='Andhra Pradesh', apB if board=='Arunachal Pradesh', bhB if board=='Bihar', gjB if board=='Gujarat' , jnkB if board=='Jammu and Kashmir', mpB if board=='Madhya Pradesh', mhB if board=='Maharashtra',  rjB if board=='Rajasthan' ,  ngB if board=='Nagaland' ,  tnB if board=='Tamil Nadu' , wbB if board=='West Bengal' ,  upB if board=='Uttar Pradesh';

rankcbseB = rank cbseB by jresult DESC , bresult DESC , jeem DESC, math DESC, jeep DESC, phy DESC, jeec DESC, chem DESC, gender ASC, dob ASC, fname ASC, lname ASC DENSE;

grp = group rankcbseB all;

maxno = foreach grp generate MAX(rankcbseB.rank_cbseB) as max1;

cbseper = foreach rankcbseB generate id, rank_cbseB , fname , lname , board , eid , gender , math , phy , chem , jeem , jeep , jeec, bresult, jresult , cat , dob , ROUND_TO(((double)((maxno.max1-rank_cbseB)*100.000/maxno.max1)),3) as per , aper;

rankBcbseB = rank cbseB by bresult DESC , jeem DESC, math DESC, jeep DESC, phy DESC, jeec DESC, chem DESC, gender ASC, dob ASC, fname ASC, lname ASC DENSE;

grp = group rankBcbseB all;

maxno = foreach grp generate MAX(rankBcbseB.rank_cbseB) as max1;

Bcbseper = foreach rankBcbseB generate id, rank_cbseB , fname , lname , board , eid , gender , math , phy , chem , jeem , jeep , jeec, bresult, jresult , cat , dob , ROUND_TO(((double)((maxno.max1-rank_cbseB)*100.000/maxno.max1)),3) as bper , aper;

rankanbB = rank anbB by jresult DESC , bresult DESC , jeem DESC, math DESC, jeep DESC, phy DESC, jeec DESC, chem DESC, gender ASC, dob ASC, fname ASC, lname ASC DENSE;

grp = group rankanbB all;

maxno = foreach grp generate MAX(rankanbB.rank_anbB) as max1;

anbper = foreach rankanbB generate id, rank_anbB , fname , lname , board , eid , gender , math , phy , chem , jeem , jeep , jeec, bresult,jresult , cat , dob , ROUND_TO(((double)((maxno.max1-rank_anbB)*100.000/maxno.max1)),3) as per , aper;

rankBanbB = rank anbB by bresult DESC , jeem DESC, math DESC, jeep DESC, phy DESC, jeec DESC, chem DESC, gender ASC, dob ASC, fname ASC, lname ASC DENSE;

grp = group rankBanbB all;

maxno = foreach grp generate MAX(rankBanbB.rank_anbB) as max1;

Banbper = foreach rankanbB generate id, rank_anbB , fname , lname , board , eid , gender , math , phy , chem , jeem , jeep , jeec, bresult, jresult , cat , dob , ROUND_TO(((double)((maxno.max1-rank_anbB)*100.000/maxno.max1)),3) as bper , aper;

joinall = join cbseper by (per) , Bcbseper by (bper) ;

joinall = foreach joinall generate Bcbseper::id as id,cbseper::jresult as b1;

A = cross Bcbseper , allper;

A1 = foreach A generate Bcbseper::id as id,Bcbseper::rank_cbseB as rank,Bcbseper::fname as fname,Bcbseper::lname as lname,Bcbseper::board as board,Bcbseper::eid as eid ,Bcbseper::gender as gender, Bcbseper::bresult as bresult,Bcbseper::jresult as jresult,Bcbseper::cat as cat,Bcbseper::dob as dob,Bcbseper::bper as bper,Bcbseper::aper as aper,allper::jresult as b2,allper::aper as a1per;

B = filter A1 by bper > a1per;

C = group B by id;

Dcbse = foreach C {
    E = order B by a1per DESC;
    F = limit E 1;
    generate FLATTEN(F.id) , FLATTEN(F.b2);
};

joincbse = join joinall by id , Dcbse by id;

joincbse = foreach joincbse generate joinall::id as id , joinall::b1 as b1, Dcbse::null::b2 as b2;

joinall = join anbper by (per) , Banbper by (bper) ;

joinall = foreach joinall generate Banbper::id as id,anbper::jresult as b1;

A = cross Banbper , allper;

A1 = foreach A generate Banbper::id as id,Banbper::rank_anbB as rank,Banbper::fname as fname,Banbper::lname as lname,Banbper::board as board,Banbper::eid as eid ,Banbper::gender as gender, Banbper::bresult as bresult,Banbper::jresult as jresult,Banbper::cat as cat,Banbper::dob as dob,Banbper::bper as bper,Banbper::aper as aper,allper::jresult as b2,allper::aper as a1per;

B = filter A1 by bper > a1per;

C = group B by id;

Danb = foreach C {
    E = order B by a1per DESC;
    F = limit E 1;
    generate FLATTEN(F.id) , FLATTEN(F.b2);
};

joinanb = join joinall by id , Danb by id;

joinanb = foreach joinanb generate joinall::id as id , joinall::b1 as b1, Danb::null::b2 as b2;

uni_b = UNION joincbse , joinanb ;
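One way to narrow down where this fails, sketched here rather than taken from the original thread: ERROR 1066 is a generic message that Pig prints when DUMP cannot read its result, so the underlying cause usually sits in an earlier job and shows up in the Pig log file or the Hadoop job logs. The statements below are meant to be appended to the script above and assume its final relations joincbse and joinanb.

```pig
-- Sketch only: isolate which input of the UNION is failing.

-- 1. Confirm that both inputs report the same schema.
DESCRIBE joincbse;
DESCRIBE joinanb;

-- 2. Materialize each input on its own. If one of these DUMPs fails,
--    the problem is in that branch of the script, not in the UNION.
DUMP joincbse;
DUMP joinanb;

-- 3. Only when both inputs dump cleanly, try the union again.
uni_b = UNION joincbse, joinanb;
DUMP uni_b;
```

If both inputs dump cleanly and only the last step fails, the UNION itself is the place to look; otherwise the failing DUMP points to the branch to debug.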

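【Comments】:

There can be multiple reasons for this error. Please post your entire Pig script and sample data/output.

Please reduce this to a minimal working example. Can you union A, B, then A, B, C, then A, B, C, D, and so on? What about A, A, A, A...?

Please post the Pig script and more of the error trace.

I would be surprised if the error about uni_b were the only one returned. To verify that this is where the problem lies, DESCRIBE the inputs to uni_b and store them (I suggest PigStorage with schema), then load them in a new MINIMAL script and union them. That should make it clear where the error occurs.

【Answer 1】: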

I found a solution. Here is what I did.

First, I stored each of the relations A, B, C, ... using the STORE operator, as follows:

STORE A INTO '/opA/' USING PigStorage(',');

Then I loaded the stored output of each relation using the LOAD operator, as follows:

ipA = load '/opA/part-r-00000' USING PigStorage (',') as (id:Biginteger, b1: double, b2: double);

Finally, I combined them with the UNION operator, as shown below:

uni_b = UNION ipA ,ipB ,ipC , ipD ,ipE ;

I got the answer without any errors.
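Put together, the workaround might look like the sketch below for two of the relations from the question. The output paths under /tmp are hypothetical, and the schema is the one used in the LOAD statement above. The reload half is shown as a second script to keep the two stages clearly separate.

```pig
-- Stage 1 (appended to the original script): write each relation out.
-- The paths are hypothetical; STORE fails if the target directory exists.
STORE joincbse INTO '/tmp/op_cbse' USING PigStorage(',');
STORE joinanb  INTO '/tmp/op_anb'  USING PigStorage(',');

-- Stage 2 (a separate, minimal script): reload with an explicit schema.
-- Loading the directory instead of a single part file picks up every
-- part file the earlier job wrote.
ip_cbse = LOAD '/tmp/op_cbse' USING PigStorage(',')
          AS (id:biginteger, b1:double, b2:double);
ip_anb  = LOAD '/tmp/op_anb' USING PigStorage(',')
          AS (id:biginteger, b1:double, b2:double);

-- Union the reloaded relations and inspect the result.
uni_b = UNION ip_cbse, ip_anb;
DUMP uni_b;
```

Storing and reloading hides the failure without explaining it, so the job logs are still worth checking for the original cause.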

