Apache Pig: Facing issue while handling datatype in Pig


【Title】Apache Pig: Facing issue while handling datatype in Pig 【Posted】2019-01-31 14:56:01 【Question】:

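I am facing an issue handling the data type of the field qty and performing a SUM on that same field. The code is below. I cast qty to double, but I still get the error shown below. Can someone help me understand the issue and, if possible, suggest a solution?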

A_test1 = load 'EXT_OO_IMP' USING PigStorage('\u0001') AS (it: chararray,loc: chararray,qty: chararray,scheddate: chararray,udc_cta_no: chararray,udc_imp_pack_qty: chararray,udc_imp_ready_dt: chararray,udc_imp_ref_no: chararray,udc_ord_sys_cd: chararray,udc_source: chararray,udc_sply_typ: chararray,udc_vend_pack_id: chararray,udc_purch_stg: chararray,srs_pack_flow_indicator_cd: chararray,it_type_cd: chararray,source_owner_cd: chararray,nks_id: chararray,alloc_replen_cd: chararray);

----- ext_oo_import: it: chararray,loc: chararray,qty: chararray,scheddate: chararray,udc_cta_no: chararray,udc_imp_pack_qty: chararray,udc_imp_ready_dt: chararray,udc_imp_ref_no: chararray,udc_ord_sys_cd: chararray,udc_source: chararray,udc_sply_typ: chararray,udc_vend_pack_id: chararray,udc_purch_stg: chararray,srs_pack_flow_indicator_cd: chararray,it_type_cd: chararray,source_owner_cd: chararray,nks_id: chararray,alloc_replen_cd: chararray

----- ##############  ##############  ##############

import_on_order = 
        FOREACH A_test1
        GENERATE
            loc,
            it,
            nks_id,
            (double)(qty is NULL ? 0 : qty) as qty:double,
            scheddate,
            ' ' AS order_source,
            ' ' AS chs_it_type_cd;

describe import_on_order;

----- import_on_order: loc: chararray,it: chararray,nks_id: chararray,qty: int,scheddate: chararray,order_source: chararray,chs_it_type_cd: chararray


grp_import_on_order = GROUP import_on_order BY (loc,it,nks_id,scheddate,order_source,chs_it_type_cd);


describe grp_import_on_order;

----- grp_import_on_order: group: (loc: chararray,it: chararray,nks_id: chararray,scheddate: chararray,order_source: chararray,chs_it_type_cd: chararray),import_on_order: (loc: chararray,it: chararray,nks_id: chararray,qty: int,scheddate: chararray,order_source: chararray,chs_it_type_cd: chararray)



------------------------------- STORE TO FILE ---------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------

work__idrp_import_on_order =
                            FOREACH grp_import_on_order 
                            GENERATE    group.loc AS loc,
                                        group.it AS it,
                                        group.nks_id AS nks_id,
                                        SUM(import_on_order.qty) AS qty,
                                        group.scheddate AS scheddate,
                                        group.order_source AS order_source,
                                        group.chs_it_type_cd AS chs_it_type_cd;

describe work__idrp_import_on_order;

----- work__idrp_import_on_order: loc: chararray,it: chararray,nks_id: chararray,qty: int,scheddate: chararray,order_source: chararray,chs_it_type_cd: chararray

import_on_order_rp = 
        FOREACH ext_oo_import
        GENERATE
            it AS chs_it,
            loc AS chs_loc,
            (qty is NULL ? 0 : qty) as qty:double,
            scheddate AS current_due_dt, 
            ' ' AS order_source,
            'V' AS source_type_cd,
            udc_sply_typ AS sply_typ,
            udc_ord_sys_cd AS ord_sys_cd;

2019-01-31 09:03:30,819 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 0: Exception while execution (Name: grp_import_on_order: Local Rearrange[tuple]tuple(false) - scope-1095 Operator Key: scope-1095): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while execution (Name: work__idrp_import_on_order: New For Each(false,false)[bag] - scope-1078 Operator Key: scope-1078): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while execution (Name: Pre Combiner Local Rearrange[tuple] Unknown - scope-1097 Operator Key: scope-1097): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while execution (Name: import_on_order: New For Each(false,false,false,false,false,false,false)[bag] - scope-977 Operator Key: scope-977): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while execution (Name: ext_oo_import: New For Each(false,false,false,false,false)[bag] - scope-957 Operator Key: scope-957): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while execution (Name: New For Each(false,false,false,false,false)[bag] - scope-945 Operator Key: scope-945): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POCast (Name: Cast[double] - scope-926 Operator Key: scope-926) children: [[POProject (Name: Project[chararray][2] - scope-925 Operator Key: scope-925) children: null at []]] at []]: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
Details at logfile: /logs/hdidrp/pig/pig_1548942743751.log
2019-01-31 09:03:30,849 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-01-31 09:03:31,012 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
import_on_order_rp: shc_item: chararray,shc_loc: chararray,qty: double,current_due_dt: chararray,order_source: chararray,source_type_cd: chararray,sply_typ: chararray,ord_sys_cd: chararray
2019-01-31 09:03:31,179 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 0: Exception while execution (Name: grp_import_on_order: Local Rearrange[tuple]tuple(false) - scope-1095 Operator Key: scope-1095): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while execution (Name: work__idrp_import_on_order: New For Each(false,false)[bag] - scope-1078 Operator Key: scope-1078): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while execution (Name: Pre Combiner Local Rearrange[tuple] Unknown - scope-1097 Operator Key: scope-1097): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while execution (Name: import_on_order: New For Each(false,false,false,false,false,false,false)[bag] - scope-977 Operator Key: scope-977): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while execution (Name: ext_oo_import: New For Each(false,false,false,false,false)[bag] - scope-957 Operator Key: scope-957): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while execution (Name: New For Each(false,false,false,false,false)[bag] - scope-945 Operator Key: scope-945): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POCast (Name: Cast[double] - scope-926 Operator Key: scope-926) children: [[POProject (Name: Project[chararray][2] - scope-925 Operator Key: scope-925) children: null at []]] at []]: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
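The last line of the trace is an ordinary JVM cast failure. One plausible reading is that the `0` branch of `qty is NULL ? 0 : qty` is an int literal. That would let an Integer end up in a field whose schema still says chararray (`Project[chararray][2]`, where position 2 is qty). The next `Cast[double]` step then treats the value as a String and fails. The sketch below is plain Java, not Pig's actual operator code, and only models that failure mode. `MixedTypeCastDemo`, `bincond` and `castDeclaredChararrayToDouble` are hypothetical names.

```java
public class MixedTypeCastDemo {
    // Models the bincond `qty is NULL ? 0 : qty`: the null branch yields an Integer,
    // the other branch passes the original String (chararray) value through.
    public static Object bincond(String qty) {
        return (qty == null) ? (Object) Integer.valueOf(0) : qty;
    }

    // Models a later step that trusts the declared chararray schema:
    // it casts every value to String before converting it to a double.
    public static double castDeclaredChararrayToDouble(Object value) {
        String s = (String) value; // throws ClassCastException when value is an Integer
        return Double.parseDouble(s);
    }

    public static void main(String[] args) {
        String[] rows = {"12.5", null, "3"};
        for (String qty : rows) {
            try {
                System.out.println(castDeclaredChararrayToDouble(bincond(qty)));
            } catch (ClassCastException e) {
                // Only the row whose qty was null reaches this branch.
                System.out.println("ClassCastException for qty=" + qty);
            }
        }
    }
}
```

Running `main` prints `12.5`, then the exception message for the null row, then `3.0`. Rows that already hold a string convert fine, and only the substituted `0` breaks. That fits an error that appears at run time even though the script passes `describe`.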

【Discussion】:

【Answer 1】:

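Going through the code, I couldn't follow the first statement. You load the data and do the casting step there. But in the last statement, why are you casting the first dataset, which is of string (chararray) type, again? That is what throws the exception when this operation is processed: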

import_on_order_rp = FOREACH ext_oo_import GENERATE it AS chs_it, loc AS chs_loc, (qty is NULL ? 0 : qty) as qty:double, scheddate AS current_due_dt, '' AS order_source, 'V' AS source_type_cd, udc_sply_typ AS sply_typ, udc_ord_sys_cd AS ord_sys_cd;

Check whether this is correct.

【Comments】:

[[POProject (Name: Project[chararray][2] - scope-925 Operator Key: scope-925) children: null at []]] at []]: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String. I have some null values in my data, and I am trying to identify them and assign them '0' as follows: import_on_order = FOREACH A_test1 GENERATE loc, item, ksn_id, (double)(qty is NULL ? '0' : qty) as qty:double, /*(IsNull(qty,'') !='' ? qty : 0) as qty:long,*/ scheddate, ' ' AS order_source, ' ' AS shc_item_type_cd; Could you take a look at this and help?
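A sketch of the direction the thread points to: normalize qty so that both branches of the null check produce the same type (here double) before it reaches SUM. In Pig terms this would be something like `(qty IS NULL ? 0.0 : (double)qty) AS qty:double`, where neither branch mixes an int literal with a chararray. That expression is a suggestion and was not confirmed in the thread. Other statements that still use `? 0 : qty`, such as import_on_order_rp, would need the same change. The trace names both the ext_oo_import and import_on_order steps. The code below is plain Java modelling that idea, and `QtyNormalizer`, `toQty` and `sumQty` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class QtyNormalizer {
    // Both branches return a double, mirroring a bincond such as
    // (qty IS NULL ? 0.0 : (double)qty) where the two sides share one type.
    public static double toQty(String qty) {
        if (qty == null || qty.trim().isEmpty()) {
            return 0.0; // treat missing or blank quantities as zero, as the asker intends
        }
        return Double.parseDouble(qty.trim());
    }

    // Sums normalized quantities, analogous to SUM(import_on_order.qty) within one group.
    public static double sumQty(List<String> rawQtys) {
        double total = 0.0;
        for (String q : rawQtys) {
            total += toQty(q);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("12.5", null, "3", " ");
        System.out.println(sumQty(group)); // prints 15.5
    }
}
```

Converting once, at a single point and to a single type, means no later step has to guess what runtime type the field holds.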
