Pig SchemaParseException: Can't redefine:

Posted: 2014-08-08 10:38:25

Question:

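I am working with data stored in Avro format (avro-1.7.4) and trying to use Pig to manipulate it. When I load the data and then store it again, I get the following error:

ERROR 2116: Output Location Validation Failed for: 'file:///home/pig/100/test.avro More info to follow: Can't redefine: Employees

Any ideas or suggestions would be greatly appreciated.

Thanks.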


Update:

The Employees field appears in two places in the schema.

Part of the schema:



    
    {
      "name" : "Employees",
      "type" : [ "null", {
        "type" : "array",
        "items" : {
          "type" : "record",
          "name" : "CheckResponsibleEmployee",
          "fields" : [ {
            "name" : "Id",
            "type" : "string"
          }, {
            "name" : "Name",
            "type" : "string"
          }, {
            "name" : "Job",
            "type" : "Job"
          }, {
            "name" : "Time",
            "type" : [ "null", "Date" ],
            "default" : null
          } ]
        }
      } ],
      "default" : null
    }

And in another place (but I think this one is fine):



            
    {
      "name" : "Employees",
      "type" : "ResponsibleEmployees"
    }

I simply run the following script (after loading the piggybank, avro 1.7.4, mapred libraries, and so on):



    data = LOAD 'part-m-00000.avro' USING AvroStorage();
    STORE data INTO 'output.avro' USING AvroStorage();


Full stack trace:

    Pig Stack Trace
    ---------------
    ERROR 2116: Output Location Validation Failed for: 'file:///home/pig/100/test.avro More info to follow:
    Can't redefine: Employees

    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias posdata
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1635)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:541)
        at org.apache.pig.Main.main(Main.java:156)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2116: Output Location Validation Failed for: 'file:///home/pig/100/test.avro More info to follow:
    Can't redefine: Employees
        at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
        at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
        at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
        at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:300)
        at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
        at org.apache.pig.PigServer.execute(PigServer.java:1297)
        at org.apache.pig.PigServer.access$400(PigServer.java:122)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1630)
        ... 13 more
    Caused by: org.apache.avro.SchemaParseException: Can't redefine: Employees
        at org.apache.avro.Schema$Names.put(Schema.java:1019)
        at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:496)
        at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:611)
        at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:799)
        at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:633)
        at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:620)
        at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:799)
        at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:633)
        at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:620)
        at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:722)
        at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:799)
        at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:633)
        at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:620)
        at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:722)
        at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:799)
        at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:633)
        at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:620)
        at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:722)
        at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:799)
        at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:633)
        at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:620)
        at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:799)
        at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:633)
        at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:620)
        at org.apache.avro.Schema.toString(Schema.java:291)
        at org.apache.avro.Schema.toString(Schema.java:281)
        at org.apache.pig.builtin.AvroStorage.setOutputAvroSchema(AvroStorage.java:504)
        at org.apache.pig.builtin.AvroStorage.checkSchema(AvroStorage.java:495)
        at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:65)
        ... 25 more

Comments:

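Could you please provide the schema or relation, along with your code?

Hi Gaurav, thanks for looking into this. Please see my update.

Have you tried renaming one of the "Employees" fields to something else? From the error, it looks like Pig is unhappy about the same field name being reused within one schema. Give it a try.

Thanks for the suggestion. I resolved it today. We had to redefine the schema because, as you said, Pig is "not happy" when the same name is reused for fields that reference different record types.

Answer 1: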

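The problem is an ambiguity between field names in the PigLatinSchema. I solved it by redefining (correcting) the Avro schema so that it no longer contains fields with the same name that reference different record types.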
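To find such fields before handing a schema to Pig, one option is a small standalone check over the schema JSON. The following is a minimal Python sketch and not part of Pig or Avro: `conflicting_field_names` is a hypothetical helper, and the `Root`/`Check` schema is a made-up example rather than the schema from the question. It only compares the JSON text of each field's type, so it does not resolve named type references.

```python
import json
from collections import defaultdict


def type_key(field_type):
    """Canonical string for a field's type, so two types can be compared."""
    return json.dumps(field_type, sort_keys=True)


def conflicting_field_names(schema):
    """Return field names that occur with more than one distinct type."""
    seen = defaultdict(set)

    def walk(node):
        if isinstance(node, list):  # a union
            for item in node:
                walk(item)
        elif isinstance(node, dict):
            if node.get("type") == "record":
                for field in node.get("fields", []):
                    seen[field["name"]].add(type_key(field["type"]))
                    walk(field["type"])
            else:  # array items, map values, or a nested type wrapper
                for key in ("type", "items", "values"):
                    if key in node:
                        walk(node[key])

    walk(schema)
    return {name: sorted(types) for name, types in seen.items() if len(types) > 1}


# Made-up schema: "Employees" appears twice with different types.
SCHEMA = json.loads("""
{
  "type": "record", "name": "Root",
  "fields": [
    {"name": "Employees", "type": "ResponsibleEmployees"},
    {"name": "Check", "type": {
      "type": "record", "name": "Check",
      "fields": [
        {"name": "Employees",
         "type": ["null", {"type": "array", "items": "string"}],
         "default": null}
      ]}}
  ]
}
""")

conflicts = conflicting_field_names(SCHEMA)
print(sorted(conflicts))
```

Any name this reports is a candidate for renaming in the Avro schema, which is the fix described in the answer.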

Comments:

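I also run into this problem when two fields have the same record type. It seems Avro does not recognize that the two fields share the same schema and tries to generate it twice.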
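Judging from the stack trace, the exception is thrown in `Schema$Names.put`, that is, when a named type is registered a second time. A rough way to look for that situation in a schema file is to count how often each named type (record, enum, fixed) is defined inline. This is only a sketch under assumptions: `duplicate_definitions` is a hypothetical helper, the `Team`/`Person` schema is made up, and namespaces are ignored.

```python
import json
from collections import Counter

NAMED_TYPES = ("record", "enum", "fixed")


def duplicate_definitions(schema):
    """Return names of named types that are defined inline more than once."""
    counts = Counter()

    def walk(node):
        if isinstance(node, list):  # a union
            for item in node:
                walk(item)
        elif isinstance(node, dict):
            if node.get("type") in NAMED_TYPES:
                counts[node["name"]] += 1
            for key in ("type", "items", "values"):
                if key in node:
                    walk(node[key])
            for field in node.get("fields", []):
                walk(field["type"])

    walk(schema)
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up schema: two fields each carry their own full copy of "Person".
TEAM_SCHEMA = json.loads("""
{
  "type": "record", "name": "Team",
  "fields": [
    {"name": "Manager", "type": {
      "type": "record", "name": "Person",
      "fields": [{"name": "Name", "type": "string"}]}},
    {"name": "Deputy", "type": {
      "type": "record", "name": "Person",
      "fields": [{"name": "Name", "type": "string"}]}}
  ]
}
""")

duplicates = duplicate_definitions(TEAM_SCHEMA)
print(duplicates)
```

In a hand-written schema, the usual way to avoid the duplicate is to define the record once and refer to it by name in the second field.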
