Extracting the text inside <text></text> tags of an XML file with Python

Posted 2023-04-06


I have a large XML file that contains many fragments like ***<text>********</text>****, and I want to use Python to extract the text inside the text tags.

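Answer A (accepted by the asker):

import re

# Non-greedy match; re.DOTALL lets "." cross line breaks inside an element.
pattern = re.compile(r"<text>(.*?)</text>", re.DOTALL)

# xml_string holds the file contents. re.match would only match at the very
# start of the string, so use findall, which returns group 1 of every match.
texts = pattern.findall(xml_string)

Answer B: Use the xml module.

Answer C: I don't know about this one, so I can't help.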
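The suggestion to use the xml module can be sketched with the standard library's xml.etree.ElementTree. This is only a minimal sketch, assuming the file is well-formed XML; the sample string and the function name extract_text are made up for illustration and do not come from the original answers.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; in practice, read the string from your file.
SAMPLE = "<doc><page><text>first</text></page><page><text>second\nline</text></page></doc>"


def extract_text(xml_string):
    """Return the text content of every <text> element, in document order."""
    root = ET.fromstring(xml_string)
    # iter("text") visits matching elements at any depth.
    # itertext() also gathers text that sits inside nested child elements.
    return ["".join(el.itertext()) for el in root.iter("text")]


texts = extract_text(SAMPLE)
print(texts)
```

For a file too large to hold in memory, ET.iterparse is probably a better fit than ET.fromstring, because it processes elements as they are read.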

How to extract an XML attribute using Python ElementTree

【Original title】: How to extract xml attribute using Python ElementTree 【Posted】: 2011-06-02 04:11:57 【Question】:

Given:

<foo>
 <bar key="value">text</bar>
</foo>

How do I get "value"?

xml.findtext("./bar[@key]")

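raises an error.

【Discussion】:

【Answer 1】:

dipenparmar12's function does not return the attributes of a child's own children. Because that function is recursive, its attribute list is reset to an empty list on every call. The function below also returns the attributes of the children's children.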

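import xml.etree.ElementTree as etree

# xmlString is assumed to already hold the XML document as a string.
xml = etree.fromstring(xmlString)


def get_attr(xml, attributes):
    # Walk the tree depth-first and append every non-empty attribute dict
    # to the single list that is passed down through the recursion.
    for child in xml:
        if len(child.attrib) != 0:
            attributes.append(child.attrib)
        get_attr(child, attributes)
    return attributes


attributes = get_attr(xml, [])
print(attributes)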
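The recursive function above can also be written without explicit recursion, because Element.iter() already walks the whole subtree. The following is a sketch under that assumption, not the answerer's code; the sample document and the name collect_attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with attributes on a child and on a grandchild.
SAMPLE = (
    "<feed><entry id='1'>"
    "<link rel='alternate' href='http://example.com/'/>"
    "</entry><title>t</title></feed>"
)


def collect_attributes(root):
    """Return the attribute dict of every descendant that has attributes."""
    # root.iter() yields root first, then all descendants in document order.
    # Skipping root mirrors the recursive version, which starts at the children.
    return [dict(el.attrib) for el in root.iter() if el is not root and el.attrib]


root = ET.fromstring(SAMPLE)
attrs = collect_attributes(root)
print(attrs)
```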

【Discussion】:

【Answer 2】:

With the following method you can get all the attributes from the XML (as dictionaries).

import xml.etree.ElementTree as etree

xmlString = "<feed xml:lang='en'><title>World Wide Web</title><subtitle lang='en'>Programming challenges</subtitle><link rel='alternate' type='text/html' href='http://google.com/'/><updated>2019-12-25T12:00:00</updated></feed>"
xml = etree.fromstring(xmlString)


def get_attr(xml):
    attributes = []
    for child in xml:
        if len(child.attrib) != 0:
            attributes.append(child.attrib)
        # Note: the result of this recursive call is discarded, so only the
        # attributes of the direct children end up in the returned list.
        get_attr(child)
    return attributes


attributes = get_attr(xml)

print(attributes)

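【Discussion】:

【Answer 3】:

Getting the attribute value of a child tag in XML with ElementTree

Parse the XML and get the root tag; indexing it with [0] then gives the first child tag, and [1], [2] give the subsequent child tags. Once you have the child tag, use .attrib[attribute_name] to get the value of that attribute.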

>>> import xml.etree.ElementTree as ET
>>> xmlstr = '<foo><bar key="value">text</bar></foo>'
>>> root = ET.fromstring(xmlstr)
>>> root.tag
'foo'
>>> root[0].tag
'bar'
>>> root[0].attrib['key']
'value'

If the XML content is in a file, do the following to get root.

>>> tree = ET.parse('file.xml')
>>> root = tree.getroot()
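The indexing approach above raises a KeyError when the attribute is missing, because .attrib behaves like a dictionary. As a hedged sketch (the second bar element is added here purely for illustration), Element.get returns None or a chosen default instead, and iterating an element visits its direct children without index bookkeeping.

```python
import xml.etree.ElementTree as ET

# Same structure as the question, plus a hypothetical <bar> without a key.
root = ET.fromstring('<foo><bar key="value">text</bar><bar>no key</bar></foo>')

# Iterating an element yields its direct children in document order.
keys = [child.get("key") for child in root]                 # missing -> None
keys_default = [child.get("key", "n/a") for child in root]  # missing -> "n/a"

print(keys)
print(keys_default)
```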

【Discussion】:

I get the error message AttributeError: 'builtin_function_or_method' object has no attribute 'fromstring'

【Answer 4】:

This finds the first instance of an element named bar and returns the value of its key attribute.

In [52]: import xml.etree.ElementTree as ET

In [53]: xml=ET.fromstring(contents)

In [54]: xml.find('./bar').attrib['key']
Out[54]: 'value'
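One caveat with the call above: find returns None when nothing matches, so chaining .attrib directly fails with an AttributeError on documents that have no bar element. A defensive variant might look like the sketch below; the helper name first_bar_key is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_bar_key(xml_string):
    """Return the key attribute of the first <bar> child, or None if absent."""
    root = ET.fromstring(xml_string)
    bar = root.find("./bar")  # None when there is no <bar> child
    # Compare against None explicitly: an element without children is falsy.
    return bar.get("key") if bar is not None else None


found = first_bar_key('<foo><bar key="value">text</bar></foo>')
missing = first_bar_key("<foo><baz/></foo>")
print(found, missing)
```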

【Discussion】:

@Stevoisiak, I have a similar problem:

<image height="940" id="0" name="C02032-390.jpg" width="1820">
    <box label="Objects" occluded="1" xbr="255" xtl="0" ybr="624" ytl="509">
        <attribute name="Class">Car</attribute>
    </box>
</image>

I want to access "Car" from the attribute element.

【Answer 5】:

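Your expression:

./bar[@key]

means: the bar children that have a key attribute.

If you want to select the attribute itself, use this relative expression:

bar/@key

which means: the key attribute of the bar children.

Of course, you then need to consider using a fully compliant XPath engine, such as lxml.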

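【Discussion】:

Not sure whether it is ElementTree or Google App Engine, but using '@' raises SyntaxError("unsupported path syntax (%s)" % op) SyntaxError: unsupported path syntax (@)

@Will Merydith: Please read my last sentence. The basic ElementTree API is not a fully compliant XPath engine...

OK. I'll see whether I can find a module that works with GAE/Py2.5.5.

It seems Python's ElementTree does not support syntax like bar/@key; you have to use xxx.attrib.get("key") on the corresponding xxx element instead.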
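The point made in the last comment can be sketched as follows. In current Python versions, ElementTree's limited XPath subset accepts the [@key] predicate for selecting elements, but it still has no attribute selection such as bar/@key, so the value is read from the matched elements. The second bar element is a hypothetical addition for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<foo><bar key="value">text</bar><bar>other</bar></foo>')

# "./bar[@key]" selects the <bar> children that carry a key attribute.
# The result is a list of elements, not of attribute values.
with_key = root.findall("./bar[@key]")

# Read the attribute from each matched element.
values = [el.get("key") for el in with_key]
print(values)
```

With lxml, the same value could presumably be selected directly through its xpath method, but that requires installing the third-party package.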
