Pig ERROR 0: Scalar has more than one row in the output

I have two files that I am trying to join on a pattern match.

File1 :

weather.bbc.co.uk,112 
ads.facebook.com,113 
ads.amazon.co.uk,114 
www.sky.com,115 
news.bbc.co.uk,116 
pics.facebook.com,117

File2 :

facebook.com,facebook 
bbc.co.uk,bbc 
netflix.com,netflix 
flipkart.com,flipkart

Expected output:

weather.bbc.co.uk,112,bbc.co.uk,bbc
ads.facebook.com,113,facebook.com,facebook
news.bbc.co.uk,116,bbc.co.uk,bbc
pics.facebook.com,117,facebook.com,facebook 

Script

file1 = LOAD '/file1' using PigStorage('|') as (request_domain: chararray,msisdn:int);       
file2 = LOAD '/file2' using PigStorage('|') as (domain: chararray,provider: chararray);
file3 = JOIN file1 by case when (request_domain MATCHES CONCAT(CONCAT('(?i).*',file2.domain),'.*')) then file2.domain  else 'Other' end LEFT OUTER,file2 by domain;
DESCRIBE file3;            
dump file3;

But I get the following error:

WARN [Thread-29] org.apache.hadoop.mapred.LocalJobRunner - job_local_0006
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (facebook.com,facebook), 2nd :(bbc.co.uk,bbc)
    at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:111)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextString(POUserFunc.java:432)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:317)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:221)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:275)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextString(POUserFunc.java:432)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:317)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:221)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:275)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextString(POUserFunc.java:432)

Answer

The field separator should be ',' rather than '|', i.e. PigStorage(',').
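To see why the delimiter matters, here is a small Python model of delimiter-based field splitting. It is only a sketch: split_fields is a hypothetical helper, not Pig's loader, and it assumes missing trailing fields are filled with nulls, which is roughly how PigStorage behaves when a schema is declared.

```python
# Simplified model of delimiter-based field splitting; not Pig's actual loader.
def split_fields(line, delimiter, n_fields):
    parts = line.split(delimiter)
    # Pad missing trailing fields with None (assumed to mirror Pig's null padding).
    return (parts + [None] * n_fields)[:n_fields]


line = "weather.bbc.co.uk,112"
print(split_fields(line, "|", 2))  # ['weather.bbc.co.uk,112', None]
print(split_fields(line, ",", 2))  # ['weather.bbc.co.uk', '112']
```

With '|' the whole line lands in request_domain and msisdn stays empty, so no later match on the domain column could give the intended result.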

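Writing file2.domain inside the JOIN key treats file2 as a scalar, which Pig allows only when that relation has exactly one row. file2 has four rows, hence the error. Because the pattern has to be tested against several values, try CROSS combined with the INDEXOF function, as follows: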

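-- Load both files with ',' as the field delimiter.
file1 = LOAD 'data/file1.txt' using PigStorage(',') as (request_domain: chararray,msisdn:int);
file2 = LOAD 'data/file2.txt' using PigStorage(',') as (domain: chararray,provider: chararray);
-- Pair every file1 row with every file2 row.
crossed = CROSS file1,file2;
-- Keep only the pairs where domain occurs inside request_domain.
filtered = FILTER crossed BY INDEXOF(file1::request_domain,file2::domain) != -1;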
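
The logic of the CROSS plus FILTER approach can be checked against the sample data with a short Python model. This is a sketch rather than Pig code: cross_and_filter is a hypothetical name, and str.find stands in for INDEXOF on the assumption that both return -1 when the substring is absent.

```python
# Sample rows from File1 and File2 in the question.
file1 = [
    ("weather.bbc.co.uk", 112),
    ("ads.facebook.com", 113),
    ("ads.amazon.co.uk", 114),
    ("www.sky.com", 115),
    ("news.bbc.co.uk", 116),
    ("pics.facebook.com", 117),
]
file2 = [
    ("facebook.com", "facebook"),
    ("bbc.co.uk", "bbc"),
    ("netflix.com", "netflix"),
    ("flipkart.com", "flipkart"),
]


def cross_and_filter(left, right):
    """Model of CROSS followed by FILTER ... INDEXOF(...) != -1."""
    result = []
    for request_domain, msisdn in left:        # CROSS: every pairing of rows
        for domain, provider in right:
            if request_domain.find(domain) != -1:  # substring test, like INDEXOF
                result.append((request_domain, msisdn, domain, provider))
    return result


for row in cross_and_filter(file1, file2):
    print(",".join(str(field) for field in row))
```

On the sample data this keeps the four rows listed in the expected output. Note that a substring test matches anywhere in the string, and that CROSS compares every row of one file with every row of the other, so the cost grows with the product of the two file sizes.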
