XML parsing and extracting attribute values with XPath
【Posted】: 2021-07-31 13:45:21
【Question】: I want to use XPath to extract N.1.2, N.1.1, N.2.r.1, ..., N.1.3, N.1.4.
My XPath expressions are stored in a dictionary:
# Value - Types of Message in batch
"N.1.1": R3Item(
    elemId="N.1.1",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/name[@codeSystem='2.16.840.1.113883.3.989.2.1.1.1']/@code",
    required=True,
    comment="N.1.1 - Types of Message in batch",
),
# Types of Message in batch
"N.1.1_csv": R3Item(
    elemId="N.1.1_csv",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/name[@codeSystem='2.16.840.1.113883.3.989.2.1.1.1']/@codeSystemVersion",
    required=True,
),
# Value - Batch Number
"N.1.2": R3Item(
    elemId="N.1.2",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/id[@root='2.16.840.1.113883.3.989.2.1.3.22']/@extension",
    required=True,
    comment="N.1.2 - Batch Number",
),
# Value - Batch Sender Identifier
"N.1.3": R3Item(
    elemId="N.1.3",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/sender[@typeCode='SND']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.13'][1]/@extension",
    required=True,
    comment="N.1.3 - Batch Sender Identifier",
),
# Value - Batch Receiver Identifier
"N.1.4": R3Item(
    elemId="N.1.4",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/receiver[@typeCode='RCV']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.14'][1]/@extension",
    required=True,
    comment="N.1.4 - Batch Receiver Identifier",
),
# Value - Date of Batch Transmission
"N.1.5": R3Item(
    elemId="N.1.5",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/creationTime/@value",
    required=True,
    comment="N.1.5 - Date of Batch Transmission",
),
# Value - Message Identifier
"N.2.r.1": R3Item(
    elemId="N.2.r.1",
    xPath="//PORR_IN049016UV[r]/id[@root='2.16.840.1.113883.3.989.2.1.3.1'][1]/@extension",
    required=True,
    comment="N.2.r.1 - Message Identifier",
),
# Value - Message Sender Identifier
"N.2.r.2": R3Item(
    elemId="N.2.r.2",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/PORR_IN049016UV[r]/sender[@typeCode='SND']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.11'][1]/@extension",
    required=True,
    comment="N.2.r.2 - Message Sender Identifier",
),
# Value - Message Receiver Identifier
"N.2.r.3": R3Item(
    elemId="N.2.r.3",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/PORR_IN049016UV[r]/receiver[@typeCode='RCV']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.12'][1]/@extension",
    required=True,
    comment="N.2.r.3 - Message Receiver Identifier",
),
# Value - Date of Message Creation
"N.2.r.4": R3Item(
    elemId="N.2.r.4",
    xPath="/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/PORR_IN049016UV[r]/creationTime/@value",
    required=True,
    comment="N.2.r.4 - Date of Message Creation",
),
Below is part of the sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<MCCI_IN200100UV01 ITSVersion="XML_1.0" xsi:schemaLocation="urn:hl7-org:v3 MCCI_IN200100UV01.xsd" xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <id extension="N.1.2" root="2.16.840.1.113883.3.989.2.1.3.22"/>
  <creationTime value="N.1.5"/>
  <responseModeCode code="D"/>
  <interactionId extension="MCCI_IN200100UV01" root="2.16.840.1.113883.1.6"/>
  <name code="N.1.1" codeSystem="2.16.840.1.113883.3.989.2.1.1.1" codeSystemVersion="1.01"/>
  <PORR_IN049016UV>
    <id extension="N.2.r.1" root="2.16.840.1.113883.3.989.2.1.3.1"/>
    <creationTime value="N.2.r.4"/>
    <interactionId extension="PORR_IN049016UV" root="2.16.840.1.113883.1.6"/>
    <processingCode code="P"/>
    <processingModeCode code="T"/>
    <acceptAckCode code="AL"/>
    <receiver typeCode="RCV">
      <device classCode="DEV" determinerCode="INSTANCE">
        <id extension="N.2.r.3" root="2.16.840.1.113883.3.989.2.1.3.12"/>
      </device>
    </receiver>
  </PORR_IN049016UV>
  <receiver typeCode="RCV">
    <device classCode="DEV" determinerCode="INSTANCE">
      <id extension="N.1.4" root="2.16.840.1.113883.3.989.2.1.3.14"/>
    </device>
  </receiver>
  <sender typeCode="SND">
    <device classCode="DEV" determinerCode="INSTANCE">
      <id extension="N.1.3" root="2.16.840.1.113883.3.989.2.1.3.13"/>
    </device>
  </sender>
</MCCI_IN200100UV01>
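Below is my code, but every result is an empty list. I want to extract values such as "N.1.1".
def extractData(tree):
    """r3 data extracted by xpath"""
    root = tree.getroot()
    # getList and getxPath are my own helpers (not shown here)
    keys = getList(R3_DATA)
    for key in keys:
        xPath = getxPath(key)
        print(root.xpath(xPath))
How can I fix this, or what should I do instead? If another library or some sample code can do this, could you point me to it?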
【Comments】:
getvalue returns the dictionary path for the key.
The elements are in the namespace xmlns="urn:hl7-org:v3", so your XPath evaluation code needs to take the namespace into account.
I think that in some older lxml versions, but not in current ones, you could use root.xpath(xPath, namespaces={None: 'urn:hl7-org:v3'}).
That raises an error (TypeError: empty namespace prefix is not supported in XPath): root.xpath("/MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/PORR_IN049016UV[1]/sender[@typeCode='SND']/device[@classCode='DEV'][@determinerCode='INSTANCE']/id[@root='2.16.840.1.113883.3.989.2.1.3.11'][1]/@extension", namespaces={None: 'urn:hl7-org:v3', "xsi": "http://www.w3.org/2001/XMLSchema-instance"})
Every element in every path needs a namespace prefix.
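Since the dictionary already holds many unprefixed paths, one option is to add the prefix mechanically instead of editing each entry by hand. The following is only a rough sketch: `add_default_prefix` and the prefix `u` are hypothetical names, and the regular expression only handles element steps that directly follow a `/` (which covers the paths shown in the question). Quoted strings, attributes (`@...`), already-prefixed names, and function calls such as `text()` are left untouched.

```python
import re

# Either a quoted string (kept as-is) or an unprefixed element name
# that directly follows a "/" and is not followed by ":" or "(".
_STEP = re.compile(
    r"""'[^']*'|"[^"]*"|(?<=/)(?P<name>[A-Za-z_][\w.\-]*)(?![\w.\-:(])"""
)


def add_default_prefix(xpath: str, prefix: str = "u") -> str:
    """Return xpath with `prefix:` added to each unprefixed element step."""

    def repl(match: re.Match) -> str:
        name = match.group("name")
        if name is None:
            # A quoted string literal: return it unchanged.
            return match.group(0)
        return f"{prefix}:{name}"

    return _STEP.sub(repl, xpath)


original = (
    "/MCCI_IN200100UV01[@ITSVersion='XML_1.0']"
    "/id[@root='2.16.840.1.113883.3.989.2.1.3.22']/@extension"
)
print(add_default_prefix(original))
```

The rewritten path can then be passed to `root.xpath(...)` together with a `namespaces` mapping that binds `u` to `urn:hl7-org:v3` (and `xsi` to `http://www.w3.org/2001/XMLSchema-instance` if the `@xsi:schemaLocation` predicate is kept). Note that in XPath a predicate like `[r]` tests for a child element named `r`, so the `[r]` placeholder in the N.2.r.* entries presumably has to be replaced with a concrete index such as `[1]` before evaluation.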
【Answer 1】:
As mentioned in the comments, your XPath needs namespaces. Here is an example of how to use namespaces in lxml. Note the u: and x: prefixes in the XPath.
In [1]: from lxml import etree
In [2]: root = etree.parse('mcci.xml')
In [3]: NS = {'u': 'urn:hl7-org:v3', 'x': 'http://www.w3.org/2001/XMLSchema-instance'}
In [4]: xpath = "/u:MCCI_IN200100UV01[@ITSVersion='XML_1.0'][@x:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']/u:creationTime/@value"
In [5]: root.xpath(xpath, namespaces=NS)
Out[5]: ['N.1.5']
I would suggest dropping the predicate on the schema location to simplify things a bit.
In [6]: NS = {'u': 'urn:hl7-org:v3'}
In [7]: xpath = "/u:MCCI_IN200100UV01[@ITSVersion='XML_1.0']/u:creationTime/@value"
In [8]: root.xpath(xpath, namespaces=NS)
Out[8]: ['N.1.5']