Extract values from XML that has namespaces, and parse XML CDATA

【Posted】2021-05-18 08:53:48

【Question】:

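I am trying to extract values from XML using the Oracle SQL query below, but it returns null values. I am not sure what is wrong with my query; it works for regular XML (without namespaces and CDATA). Does anyone know how to extract the values when the XML contains CDATA and namespaces? Please help. Thanks in advance.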

SELECT EXTRACT (VALUE (a1), '/AttachedDocument/ParentDocumentID/text()').getStringVal () AS ParentDocumentID
      ,EXTRACT (VALUE (a1), '/AttachedDocument/SenderParty/PartyTaxScheme/RegistrationName/text()').getStringVal () AS RegistrationName
                  ,EXTRACT (VALUE (a1), '/AttachedDocument/Attachment/ExternalReference/MimeCode/text()').getStringVal () AS MimeCode
                  ,EXTRACT (VALUE (a1), '/AttachedDocument/Attachment/ExternalReference/Description/DocumentCurrencyCode/text()').getStringVal () AS DocumentCurrencyCode
                  ,EXTRACT (VALUE (a1), '/AttachedDocument/Attachment/ExternalReference/Description/AccountingSupplierParty/Party/PartyName/Name/text()').getStringVal () AS PartyName
FROM 
       TABLE (
          XMLSEQUENCE (
             EXTRACT ( xmltype(
                '<?xml version="1.0" encoding="UTF-8"?>
<AttachedDocument xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ccts="urn:un:unece:uncefact:data:specification:CoreComponentTypeSchemaModule:2" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#">
   <cbc:DocumentType>Test Doc</cbc:DocumentType>
   <cbc:ParentDocumentID>1245</cbc:ParentDocumentID>
   <cac:SenderParty>
      <cac:PartyTaxScheme>
         <cbc:RegistrationName>SSS</cbc:RegistrationName>
         <cbc:CompanyID schemeName="5" schemeID="8" schemeAgencyID="195">11000912</cbc:CompanyID>
         <cac:TaxScheme>
            <cbc:Name>IVA</cbc:Name>
         </cac:TaxScheme>
      </cac:PartyTaxScheme>
   </cac:SenderParty>
   <cac:Attachment>
      <cac:ExternalReference>
         <cbc:MimeCode>text/xml</cbc:MimeCode>
         <cbc:EncodingCode>UTF-8</cbc:EncodingCode>
         <cbc:Description><![CDATA[<?xml version="1.0" encoding="utf-8"?><Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:sts="dian:gov:co:facturaelectronica:Structures-2-1" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cbc:DocumentCurrencyCode>COP</cbc:DocumentCurrencyCode>
  <cac:AccountingSupplierParty>
    <cbc:AdditionalAccountID schemeAgencyID="195">1</cbc:AdditionalAccountID>
    <cac:Party>
      <cac:PartyName>
        <cbc:Name>First &amp; Sample SSS</cbc:Name>
      </cac:PartyName>
        </cac:AccountingSupplierParty>]]></cbc:Description>
      </cac:ExternalReference>
   </cac:Attachment>
</AttachedDocument>'),
                '/AttachedDocument' ,
                'xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2" 
                                                                 xmlns:ds="http://www.w3.org/2000/09/xmldsig#" 
                                                                 xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" 
                                                                 xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" 
                                                                 xmlns:ccts="urn:un:unece:uncefact:data:specification:CoreComponentTypeSchemaModule:2" 
                                                                 xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" 
                                                                 xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" 
                                                                 xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#"'
                ))) a1  

【Comments】:

I assume you have to declare the namespaces; see: ***.com/questions/38439595/…

【Answer 1】:

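If you take this approach, you have to declare the namespaces in all of the extract() clauses, and prefix the element names in the XPath to match, e.g.: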

SELECT EXTRACT (VALUE (a1), '/AttachedDocument/cbc:ParentDocumentID/text()',
                'xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2" 
                                                                 xmlns:ds="http://www.w3.org/2000/09/xmldsig#" 
                                                                 xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" 
                                                                 xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" 
                                                                 xmlns:ccts="urn:un:unece:uncefact:data:specification:CoreComponentTypeSchemaModule:2" 
                                                                 xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" 
                                                                 xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" 
                                                                 xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#"'
       ).getStringVal () AS ParentDocumentID
...

This obviously gets messy and painful, though you only need to declare the namespaces you actually reference in the XPath.
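As a rough illustration of that last point, the same lookup can be written with only the two namespaces its XPath touches: the default one and `cbc`. This is a sketch against a cut-down version of the document rather than the full XML from the question, and it uses extract() only to mirror the original approach:

```sql
-- Only the default namespace and cbc appear in the XPath,
-- so only those two are declared in the third argument.
SELECT EXTRACT (
         XMLType (
           '<AttachedDocument xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2"
                              xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
              <cbc:ParentDocumentID>1245</cbc:ParentDocumentID>
            </AttachedDocument>'),
         '/AttachedDocument/cbc:ParentDocumentID/text()',
         'xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2"
          xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"'
       ).getStringVal () AS ParentDocumentID
FROM dual;
```

The namespace string still has to be repeated in every extract() call, which is the main reason this approach scales badly.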

But extract() has been deprecated for a long time, so unless you are on a very old version it is much simpler to use XMLTable():

SELECT x1.ParentDocumentID, x1.RegistrationName, x1.MimeCode,
  x2.DocumentCurrencyCode, x2.PartyName
FROM XMLTable (
  XMLNamespaces (
    default 'urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2',
    'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
    'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
  ),
  '/AttachedDocument'
  passing xmltype('<!-- your XML here -->')
  columns ParentDocumentID number path 'cbc:ParentDocumentID',
    RegistrationName varchar2(16) path 'cac:SenderParty/cac:PartyTaxScheme/cbc:RegistrationName',
    MimeCode varchar2(10) path 'cac:Attachment/cac:ExternalReference/cbc:MimeCode',
    Description clob path 'cac:Attachment/cac:ExternalReference/cbc:Description/text()'
) x1
OUTER APPLY XMLTable (
  XMLNamespaces (
    default 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2',
    'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
    'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
  ),
  '/Invoice'
  passing XMLType(x1.Description)
  columns DocumentCurrencyCode varchar2(3) path 'cbc:DocumentCurrencyCode',
    PartyName varchar2(50) path 'cac:AccountingSupplierParty/cac:Party/cac:PartyName/cbc:Name'
) x2;

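The CDATA has to be extracted as a text node and then evaluated by a separate XMLTable; note also that the default namespace inside your CDATA block is different. I have left out the namespaces that are not used.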
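The two-step idea can be seen in isolation with a much smaller document. This is only a sketch: the element names (`outer`, `payload`, `inner`, `code`) and the namespace URI `urn:example:inner` are made up for the illustration and are not part of the question's XML, and OUTER APPLY assumes a version that supports it:

```sql
-- Hypothetical element names and namespace URI, chosen only to keep the example small.
SELECT x2.code
FROM XMLTable (
  '/outer'
  passing XMLType ('<outer><payload><![CDATA[<inner xmlns="urn:example:inner"><code>COP</code></inner>]]></payload></outer>')
  -- Step 1: the CDATA content comes back as plain text, not as XML nodes.
  columns payload clob path 'payload/text()'
) x1
OUTER APPLY XMLTable (
  -- Step 2: the text is parsed as its own document, with its own default namespace.
  XMLNamespaces (default 'urn:example:inner'),
  '/inner'
  passing XMLType (x1.payload)
  columns code varchar2(3) path 'code'
) x2;
```

Because the second XMLTable parses the text with XMLType(), the CDATA content has to be well-formed XML on its own.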

Your CDATA is also malformed - it is missing the closing tags for Party and Invoice. With those added to your XML document:

SELECT x1.ParentDocumentID, x1.RegistrationName, x1.MimeCode,
  x2.DocumentCurrencyCode, x2.PartyName
FROM XMLTable (
  XMLNamespaces (
    default 'urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2',
    'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
    'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
  ),
  '/AttachedDocument'
  passing xmltype('<?xml version="1.0" encoding="UTF-8"?>
<AttachedDocument xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ccts="urn:un:unece:uncefact:data:specification:CoreComponentTypeSchemaModule:2" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#">
   <cbc:DocumentType>Test Doc</cbc:DocumentType>
   <cbc:ParentDocumentID>1245</cbc:ParentDocumentID>
   <cac:SenderParty>
      <cac:PartyTaxScheme>
         <cbc:RegistrationName>SSS</cbc:RegistrationName>
         <cbc:CompanyID schemeName="5" schemeID="8" schemeAgencyID="195">11000912</cbc:CompanyID>
         <cac:TaxScheme>
            <cbc:Name>IVA</cbc:Name>
         </cac:TaxScheme>
      </cac:PartyTaxScheme>
   </cac:SenderParty>
   <cac:Attachment>
      <cac:ExternalReference>
         <cbc:MimeCode>text/xml</cbc:MimeCode>
         <cbc:EncodingCode>UTF-8</cbc:EncodingCode>
         <cbc:Description><![CDATA[<?xml version="1.0" encoding="utf-8"?><Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:sts="dian:gov:co:facturaelectronica:Structures-2-1" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cbc:DocumentCurrencyCode>COP</cbc:DocumentCurrencyCode>
  <cac:AccountingSupplierParty>
    <cbc:AdditionalAccountID schemeAgencyID="195">1</cbc:AdditionalAccountID>
    <cac:Party>
      <cac:PartyName>
        <cbc:Name>First &amp; Sample SSS</cbc:Name>
      </cac:PartyName>
    </cac:Party>
  </cac:AccountingSupplierParty>
</Invoice>]]></cbc:Description>
      </cac:ExternalReference>
   </cac:Attachment>
</AttachedDocument>')
  columns ParentDocumentID number path 'cbc:ParentDocumentID',
    RegistrationName varchar2(16) path 'cac:SenderParty/cac:PartyTaxScheme/cbc:RegistrationName',
    MimeCode varchar2(10) path 'cac:Attachment/cac:ExternalReference/cbc:MimeCode',
    Description clob path 'cac:Attachment/cac:ExternalReference/cbc:Description/text()'
) x1
OUTER APPLY XMLTable (
  XMLNamespaces (
    default 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2',
    'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
    'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
  ),
  '/Invoice'
  passing XMLType(x1.Description)
  columns DocumentCurrencyCode varchar2(3) path 'cbc:DocumentCurrencyCode',
    PartyName varchar2(50) path 'cac:AccountingSupplierParty/cac:Party/cac:PartyName/cbc:Name'
) x2;

gives:

PARENTDOCUMENTID REGISTRATIONNAME MIMECODE   DOCUMENTCURRENCYCODE PARTYNAME
---------------- ---------------- ---------- -------------------- ------------------
            1245 SSS              text/xml   COP                  First & Sample SSS

db<>fiddle


If the XML string comes from a column in a table, you can cross join/apply from that table to the first XMLTable clause:

SELECT x1.ParentDocumentID, x1.RegistrationName, x1.MimeCode,
  x2.DocumentCurrencyCode, x2.PartyName
FROM your_table t
CROSS APPLY XMLTable (
  XMLNamespaces (
    default 'urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2',
    'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
    'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
  ),
  '/AttachedDocument'
  passing xmltype(t.xml_string)
  columns ParentDocumentID number path 'cbc:ParentDocumentID',
    RegistrationName varchar2(16) path 'cac:SenderParty/cac:PartyTaxScheme/cbc:RegistrationName',
    MimeCode varchar2(10) path 'cac:Attachment/cac:ExternalReference/cbc:MimeCode',
    Description clob path 'cac:Attachment/cac:ExternalReference/cbc:Description/text()'
) x1
OUTER APPLY XMLTable (
  XMLNamespaces (
    default 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2',
    'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
    'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
  ),
  '/Invoice'
  passing XMLType(x1.Description)
  columns DocumentCurrencyCode varchar2(3) path 'cbc:DocumentCurrencyCode',
    PartyName varchar2(50) path 'cac:AccountingSupplierParty/cac:Party/cac:PartyName/cbc:Name'
) x2;

db<>fiddle

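If you are on a version that does not support apply, you can use cross join instead. The second join is more problematic, because a cross join drops the row entirely when the second XMLTable returns nothing, whereas outer apply keeps it with nulls; but if you know you will always have a CDATA invoice then that can be a cross join too; here in 11gR2.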
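A sketch of that cross-join form is below. It assumes the same placeholder `your_table` and `xml_string` column as the query above, and that every row's CDATA holds a well-formed invoice; it has not been checked against a specific 11gR2 patch level:

```sql
SELECT x1.ParentDocumentID, x1.RegistrationName, x1.MimeCode,
  x2.DocumentCurrencyCode, x2.PartyName
FROM your_table t
-- XMLTable can refer to columns of tables to its left, so no APPLY is needed.
CROSS JOIN XMLTable (
  XMLNamespaces (
    default 'urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2',
    'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
    'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
  ),
  '/AttachedDocument'
  passing xmltype(t.xml_string)
  columns ParentDocumentID number path 'cbc:ParentDocumentID',
    RegistrationName varchar2(16) path 'cac:SenderParty/cac:PartyTaxScheme/cbc:RegistrationName',
    MimeCode varchar2(10) path 'cac:Attachment/cac:ExternalReference/cbc:MimeCode',
    Description clob path 'cac:Attachment/cac:ExternalReference/cbc:Description/text()'
) x1
-- Rows with no matching /Invoice in the CDATA are dropped here, not kept with nulls.
CROSS JOIN XMLTable (
  XMLNamespaces (
    default 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2',
    'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
    'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
  ),
  '/Invoice'
  passing XMLType(x1.Description)
  columns DocumentCurrencyCode varchar2(3) path 'cbc:DocumentCurrencyCode',
    PartyName varchar2(50) path 'cac:AccountingSupplierParty/cac:Party/cac:PartyName/cbc:Name'
) x2;
```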
