Extract values from XML that has namespaces, and parse XML CDATA
Posted: 2021-05-18 08:53:48
Question:
I am trying to extract values from XML using the Oracle SQL query below, but it returns null data. I am not sure what is wrong with my query; it works for regular XML (without namespaces and CDATA). Does anyone know how to extract the values when the XML contains CDATA and namespaces? Please help. Thanks in advance.
SELECT EXTRACT (VALUE (a1), '/AttachedDocument/ParentDocumentID/text()').getStringVal () AS ParentDocumentID
,EXTRACT (VALUE (a1), '/AttachedDocument/SenderParty/PartyTaxScheme/RegistrationName/text()').getStringVal () AS RegistrationName
,EXTRACT (VALUE (a1), '/AttachedDocument/Attachment/ExternalReference/MimeCode/text()').getStringVal () AS MimeCode
,EXTRACT (VALUE (a1), '/AttachedDocument/Attachment/ExternalReference/Description/DocumentCurrencyCode/text()').getStringVal () AS DocumentCurrencyCode
,EXTRACT (VALUE (a1), '/AttachedDocument/Attachment/ExternalReference/Description/AccountingSupplierParty/Party/PartyName/Name/text()').getStringVal () AS PartyName
FROM
TABLE (
XMLSEQUENCE (
EXTRACT ( xmltype(
'<?xml version="1.0" encoding="UTF-8"?>
<AttachedDocument xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ccts="urn:un:unece:uncefact:data:specification:CoreComponentTypeSchemaModule:2" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#">
<cbc:DocumentType>Test Doc</cbc:DocumentType>
<cbc:ParentDocumentID>1245</cbc:ParentDocumentID>
<cac:SenderParty>
<cac:PartyTaxScheme>
<cbc:RegistrationName>SSS</cbc:RegistrationName>
<cbc:CompanyID schemeName="5" schemeID="8" schemeAgencyID="195">11000912</cbc:CompanyID>
<cac:TaxScheme>
<cbc:Name>IVA</cbc:Name>
</cac:TaxScheme>
</cac:PartyTaxScheme>
</cac:SenderParty>
<cac:Attachment>
<cac:ExternalReference>
<cbc:MimeCode>text/xml</cbc:MimeCode>
<cbc:EncodingCode>UTF-8</cbc:EncodingCode>
<cbc:Description><![CDATA[<?xml version="1.0" encoding="utf-8"?><Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:sts="dian:gov:co:facturaelectronica:Structures-2-1" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<cbc:DocumentCurrencyCode>COP</cbc:DocumentCurrencyCode>
<cac:AccountingSupplierParty>
<cbc:AdditionalAccountID schemeAgencyID="195">1</cbc:AdditionalAccountID>
<cac:Party>
<cac:PartyName>
<cbc:Name>First &amp; Sample SSS</cbc:Name>
</cac:PartyName>
</cac:AccountingSupplierParty>]]></cbc:Description>
</cac:ExternalReference>
</cac:Attachment>
</AttachedDocument>'),
'/AttachedDocument' ,
'xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2"
xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
xmlns:ccts="urn:un:unece:uncefact:data:specification:CoreComponentTypeSchemaModule:2"
xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
xmlns:xades="http://uri.etsi.org/01903/v1.3.2#"
xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#"'
))) a1
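Comments:
I assume you have to declare the namespaces; see: ***.com/questions/38439595/…
Answer 1:
If you take this approach, you have to declare the namespaces in all of the extract() clauses, for example: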
SELECT EXTRACT (VALUE (a1), '/AttachedDocument/cbc:ParentDocumentID/text()',
'xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2"
xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
xmlns:ccts="urn:un:unece:uncefact:data:specification:CoreComponentTypeSchemaModule:2"
xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
xmlns:xades="http://uri.etsi.org/01903/v1.3.2#"
xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#"'
).getStringVal () AS ParentDocumentID
...
That obviously gets messy and painful, although you only need to declare the namespaces you actually reference in the XPath.
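As a minimal sketch of that last point, the query below declares only the default namespace and the cbc prefix, because those are the only two the XPath touches. The cut-down document is a hypothetical stand-in for the full one, and extract() is used only because the question does.

```sql
-- Hypothetical cut-down document: only the elements this XPath needs.
SELECT EXTRACT(
         xmltype('<AttachedDocument
                      xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2"
                      xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
                    <cbc:ParentDocumentID>1245</cbc:ParentDocumentID>
                  </AttachedDocument>'),
         '/AttachedDocument/cbc:ParentDocumentID/text()',
         -- Third argument: only the namespaces referenced by the XPath above.
         'xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2"
          xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"'
       ).getStringVal() AS ParentDocumentID
FROM dual;
```

The unprefixed step (AttachedDocument) resolves through the default namespace and the prefixed step through cbc; the cac, ds, ext and other declarations can be left out here because nothing in this XPath uses them.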
But extract() has long been deprecated, so unless you are on a very old version it is much simpler to use XMLTable():
SELECT x1.ParentDocumentID, x1.RegistrationName, x1.MimeCode,
x2.DocumentCurrencyCode, x2.PartyName
FROM XMLTable (
XMLNamespaces (
default 'urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2',
'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
),
'/AttachedDocument'
passing xmltype('<!-- your XML here -->')
columns ParentDocumentID number path 'cbc:ParentDocumentID',
RegistrationName varchar2(16) path 'cac:SenderParty/cac:PartyTaxScheme/cbc:RegistrationName',
MimeCode varchar2(10) path 'cac:Attachment/cac:ExternalReference/cbc:MimeCode',
Description clob path 'cac:Attachment/cac:ExternalReference/cbc:Description/text()'
) x1
OUTER APPLY XMLTable (
XMLNamespaces (
default 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2',
'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
),
'/Invoice'
passing XMLType(x1.Description)
columns DocumentCurrencyCode varchar2(3) path 'cbc:DocumentCurrencyCode',
PartyName varchar2(50) path 'cac:AccountingSupplierParty/cac:Party/cac:PartyName/cbc:Name'
) x2;
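The CDATA has to be extracted as a text node and then evaluated in a separate XMLTable; note also that the default namespace inside your CDATA block is different. I have left out the namespaces that are not used.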
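To see why the second XMLTable is needed, here is a small sketch of the first stage on its own. The element names and the urn:example:* namespaces are made up for illustration. To the outer parser the CDATA content is just character data, so a path that tries to step into it finds nothing, while text() hands back the inner markup as a string.

```sql
-- Hypothetical names: Outer, Payload, Inner, Code, urn:example:*.
SELECT x1.Payload, x1.Code
FROM XMLTable(
       XMLNamespaces(default 'urn:example:outer'),
       '/Outer'
       passing xmltype('<Outer xmlns="urn:example:outer"><Payload><![CDATA[<Inner xmlns="urn:example:inner"><Code>COP</Code></Inner>]]></Payload></Outer>')
       columns
         -- The CDATA content comes back as plain text (the raw inner markup).
         Payload clob path 'Payload/text()',
         -- Payload has no element children, so this path matches nothing and yields NULL.
         Code varchar2(3) path 'Payload/Inner/Code'
     ) x1;
```

The Payload string is what then gets passed to XMLType() in the second XMLTable, with the inner document's own default namespace declared there.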
Your CDATA is also malformed: it is missing the closing tags for Party and Invoice. With those added to your XML document:
SELECT x1.ParentDocumentID, x1.RegistrationName, x1.MimeCode,
x2.DocumentCurrencyCode, x2.PartyName
FROM XMLTable (
XMLNamespaces (
default 'urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2',
'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
),
'/AttachedDocument'
passing xmltype('<?xml version="1.0" encoding="UTF-8"?>
<AttachedDocument xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ccts="urn:un:unece:uncefact:data:specification:CoreComponentTypeSchemaModule:2" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#">
<cbc:DocumentType>Test Doc</cbc:DocumentType>
<cbc:ParentDocumentID>1245</cbc:ParentDocumentID>
<cac:SenderParty>
<cac:PartyTaxScheme>
<cbc:RegistrationName>SSS</cbc:RegistrationName>
<cbc:CompanyID schemeName="5" schemeID="8" schemeAgencyID="195">11000912</cbc:CompanyID>
<cac:TaxScheme>
<cbc:Name>IVA</cbc:Name>
</cac:TaxScheme>
</cac:PartyTaxScheme>
</cac:SenderParty>
<cac:Attachment>
<cac:ExternalReference>
<cbc:MimeCode>text/xml</cbc:MimeCode>
<cbc:EncodingCode>UTF-8</cbc:EncodingCode>
<cbc:Description><![CDATA[<?xml version="1.0" encoding="utf-8"?><Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:sts="dian:gov:co:facturaelectronica:Structures-2-1" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" xmlns:xades141="http://uri.etsi.org/01903/v1.4.1#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<cbc:DocumentCurrencyCode>COP</cbc:DocumentCurrencyCode>
<cac:AccountingSupplierParty>
<cbc:AdditionalAccountID schemeAgencyID="195">1</cbc:AdditionalAccountID>
<cac:Party>
<cac:PartyName>
<cbc:Name>First &amp; Sample SSS</cbc:Name>
</cac:PartyName>
</cac:Party>
</cac:AccountingSupplierParty>
</Invoice>]]></cbc:Description>
</cac:ExternalReference>
</cac:Attachment>
</AttachedDocument>')
columns ParentDocumentID number path 'cbc:ParentDocumentID',
RegistrationName varchar2(16) path 'cac:SenderParty/cac:PartyTaxScheme/cbc:RegistrationName',
MimeCode varchar2(10) path 'cac:Attachment/cac:ExternalReference/cbc:MimeCode',
Description clob path 'cac:Attachment/cac:ExternalReference/cbc:Description/text()'
) x1
OUTER APPLY XMLTable (
XMLNamespaces (
default 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2',
'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
),
'/Invoice'
passing XMLType(x1.Description)
columns DocumentCurrencyCode varchar2(3) path 'cbc:DocumentCurrencyCode',
PartyName varchar2(50) path 'cac:AccountingSupplierParty/cac:Party/cac:PartyName/cbc:Name'
) x2;
which produces:
PARENTDOCUMENTID REGISTRATIONNAME MIMECODE   DOCUMENTCURRENCYCODE PARTYNAME
---------------- ---------------- ---------- -------------------- ------------------
            1245 SSS              text/xml   COP                  First & Sample SSS
If the XML string comes from a column in a table, you can cross join/apply from that table to the first XMLTable clause:
SELECT x1.ParentDocumentID, x1.RegistrationName, x1.MimeCode,
x2.DocumentCurrencyCode, x2.PartyName
FROM your_table t
CROSS APPLY XMLTable (
XMLNamespaces (
default 'urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2',
'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
),
'/AttachedDocument'
passing xmltype(t.xml_string)
columns ParentDocumentID number path 'cbc:ParentDocumentID',
RegistrationName varchar2(16) path 'cac:SenderParty/cac:PartyTaxScheme/cbc:RegistrationName',
MimeCode varchar2(10) path 'cac:Attachment/cac:ExternalReference/cbc:MimeCode',
Description clob path 'cac:Attachment/cac:ExternalReference/cbc:Description/text()'
) x1
OUTER APPLY XMLTable (
XMLNamespaces (
default 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2',
'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
),
'/Invoice'
passing XMLType(x1.Description)
columns DocumentCurrencyCode varchar2(3) path 'cbc:DocumentCurrencyCode',
PartyName varchar2(50) path 'cac:AccountingSupplierParty/cac:Party/cac:PartyName/cbc:Name'
) x2;
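If you are on a version that does not support apply, you can use cross join instead. The second join is more problematic, because a plain cross join returns no row when the inner XMLTable finds nothing; but if you know you will always have a CDATA invoice then that can also be a cross join. This works in 11gR2.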
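A rough sketch of the cross join variant follows. The your_table / xml_string names are the answer's placeholders, filled here by a hypothetical one-row inline view with a cut-down document so that the query is self-contained; only two columns are extracted to keep it short.

```sql
-- Hypothetical stand-in for your_table, holding a cut-down document.
WITH your_table AS (
  SELECT '<AttachedDocument xmlns="urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"><cbc:ParentDocumentID>1245</cbc:ParentDocumentID><cac:Attachment><cac:ExternalReference><cbc:Description><![CDATA[<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"><cbc:DocumentCurrencyCode>COP</cbc:DocumentCurrencyCode></Invoice>]]></cbc:Description></cac:ExternalReference></cac:Attachment></AttachedDocument>' AS xml_string
  FROM dual
)
SELECT x1.ParentDocumentID, x2.DocumentCurrencyCode
FROM your_table t
-- First stage: parse the outer document, keep the CDATA content as text.
CROSS JOIN XMLTable(
  XMLNamespaces(
    default 'urn:oasis:names:specification:ubl:schema:xsd:AttachedDocument-2',
    'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2' as "cac",
    'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
  ),
  '/AttachedDocument'
  passing xmltype(t.xml_string)
  columns ParentDocumentID number path 'cbc:ParentDocumentID',
          Description clob path 'cac:Attachment/cac:ExternalReference/cbc:Description/text()'
) x1
-- Second stage: parse the extracted text as its own document (different default namespace).
CROSS JOIN XMLTable(
  XMLNamespaces(
    default 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2',
    'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2' as "cbc"
  ),
  '/Invoice'
  passing XMLType(x1.Description)
  columns DocumentCurrencyCode varchar2(3) path 'cbc:DocumentCurrencyCode'
) x2;
```

Each XMLTable refers to a row source to its left (t.xml_string, then x1.Description), which is what lets a cross join stand in for apply. This form is only a reasonable substitute when every row is known to carry an embedded invoice.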