Problem reading XML with Python

Posted 2023-05-12



The file format is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
dd

<Worksheet ss:Name="Table1">
<Table>
<Row>
<Cell><Data ss:Type="String">name</Data></Cell>
<Cell><Data ss:Type="String">age</Data></Cell>
<Cell><Data ss:Type="String">sex</Data></Cell>
<Cell><Data ss:Type="String">address</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">bnw</Data></Cell>
<Cell><Data ss:Type="String">12</Data></Cell>
<Cell><Data ss:Type="String">ssssssssss</Data></Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
How can I read the data?

This is easy to do with the xml.dom module:

from xml.dom import minidom

xmldoc = minidom.parse('t.xml')
# Each <Row> element holds one line of the table
rowList = xmldoc.getElementsByTagName('Row')
rowAll = []
for r in rowList:
    rowData = []
    for c in r.getElementsByTagName('Cell'):
        # Text of the first <Data> element inside the cell
        rowData.append(c.getElementsByTagName('Data')[0].firstChild.nodeValue)
    rowAll.append(rowData)
print(rowAll)
# [['name', 'age', 'sex', 'address'], ['bnw', '12', 'ssssssssss']]

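Follow-up

The problem was on my side. The original file actually looks like this:
bnw

The code is not at fault. The XML being parsed was different from what I posted, which is why it raised AttributeError: 'NoneType' object has no attribute 'nodeValue'. Please do not question the correctness of the code.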
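
This error appears when a `<Data>` element is empty: it then has no child text node, so `firstChild` is `None`. Below is a minimal sketch of one way to guard against that, assuming empty or missing cells should become empty strings. The sample XML and the helper name `read_rows` are made up for illustration and are not the asker's actual file.

```python
from xml.dom import minidom

# Hypothetical sample: the second row contains an empty <Data> element.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<Worksheet ss:Name="Table1"><Table>
<Row>
<Cell><Data ss:Type="String">name</Data></Cell>
<Cell><Data ss:Type="String">age</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">bnw</Data></Cell>
<Cell><Data ss:Type="String"></Data></Cell>
</Row>
</Table></Worksheet></Workbook>"""


def read_rows(xml_bytes):
    """Return the table as a list of rows, using '' for empty cells."""
    doc = minidom.parseString(xml_bytes)
    rows = []
    for row in doc.getElementsByTagName('Row'):
        values = []
        for cell in row.getElementsByTagName('Cell'):
            data_nodes = cell.getElementsByTagName('Data')
            # A cell may lack <Data>, or <Data> may be empty (firstChild is None).
            if data_nodes and data_nodes[0].firstChild is not None:
                values.append(data_nodes[0].firstChild.nodeValue)
            else:
                values.append('')
        rows.append(values)
    return rows


rows = read_rows(SAMPLE)
print(rows)
```

Whether an empty string, `None`, or skipping the cell is the right placeholder depends on how the rows are consumed afterwards.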

Answer A: There are several options: lxml, BeautifulSoup, and of course the standard library, though the first two are much simpler to use.
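
As a sketch of the standard-library option, the same table can be read with `xml.etree.ElementTree`. lxml and BeautifulSoup are third-party packages and are not shown here. The sample is an abbreviated copy of the XML from the question, and the names `NS`, `SAMPLE`, and `table` are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Map a prefix of our choosing to the spreadsheet namespace used by the file.
NS = {'ss': 'urn:schemas-microsoft-com:office:spreadsheet'}

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<Worksheet ss:Name="Table1"><Table>
<Row>
<Cell><Data ss:Type="String">name</Data></Cell>
<Cell><Data ss:Type="String">age</Data></Cell>
<Cell><Data ss:Type="String">sex</Data></Cell>
<Cell><Data ss:Type="String">address</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">bnw</Data></Cell>
<Cell><Data ss:Type="String">12</Data></Cell>
<Cell><Data ss:Type="String">ssssssssss</Data></Cell>
</Row>
</Table></Worksheet></Workbook>"""

root = ET.fromstring(SAMPLE)
table = []
for row in root.iterfind('.//ss:Row', NS):
    # data.text is None for an empty element, so fall back to ''
    table.append([data.text or '' for data in row.iterfind('ss:Cell/ss:Data', NS)])

print(table)
```

Unlike `getElementsByTagName`, ElementTree matches on namespace-qualified names, which is why the prefix mapping is needed even though the elements carry no prefix in the file.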

Read xml files directly from a zip file using Python

【Posted】: 2016-02-14 18:48:28 【Question】:

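I have the following zip file structure:

some_file.zip/folder/folder/files.xml

So I have a lot of xml files in a subfolder inside the zip file.

So far I have managed to unpack the zip file with the following code: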

import os.path
import zipfile

with zipfile.ZipFile('some_file.zip') as zf:
    for member in zf.infolist():
        # Path traversal defense copied from
        # http://hg.python.org/cpython/file/tip/Lib/http/server.py#l789
        words = member.filename.split('/')
        path = "output"
        for word in words[:-1]:
            drive, word = os.path.splitdrive(word)
            head, word = os.path.split(word)
            if word in (os.curdir, os.pardir, ''): continue
            path = os.path.join(path, word)

        zf.extract(member, path)

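But I do not need to extract the files; I want to read them directly from the zip file. So I would either read each file inside the for loop and process it, or store each file in some kind of data structure in Python. Is that possible?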


【Answer 1】:

As Robin Davis wrote, zf.open() does the trick. Here is a small example:

import zipfile

zf = zipfile.ZipFile('some_file.zip', 'r')

for name in zf.namelist():
    if name.endswith('/'): continue

    if 'folder2/' in name:
        f = zf.open(name)
        # here you do your magic with [f] : parsing, etc.
        # this will print out file contents
        print(f.read())

As the OP asked for in the comments, only the files inside "folder2" are processed.
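
To go one step further than printing the raw bytes, the object returned by `zf.open()` can be handed straight to an XML parser. The sketch below builds a small zip archive in memory so that it runs on its own; the member names, XML contents, and the `items` dictionary are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Build a tiny zip in memory; in practice this would be 'some_file.zip'.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('folder1/a.xml', '<root><item>1</item></root>')
    zf.writestr('folder2/b.xml', '<root><item>2</item><item>3</item></root>')

items = {}
with zipfile.ZipFile(buf) as zf:
    for name in zf.namelist():
        # Skip directory entries and anything outside folder2
        if name.endswith('/') or 'folder2/' not in name:
            continue
        with zf.open(name) as f:
            tree = ET.parse(f)  # parse directly from the file-like object
        items[name] = [el.text for el in tree.getroot().iter('item')]

print(items)
```

Note that `'folder2/' in name` also matches paths such as `x/folder2/y.xml`; `name.startswith('folder2/')` is stricter if only the top-level folder is wanted.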

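【Comments】:

So this will extract all files that are not folders. But how do I extract files from a specific folder here? Say I have some_file.zip/folder1/files and some_file.zip/folder2/files; how would I extract only the files from folder2?

【Answer 2】: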

zf.open() returns a file-like object without extracting the file.
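
A short sketch of both approaches the question asks about: reading from the file-like object, and keeping every XML member in a dictionary in memory. The archive is created in memory here and its member names and contents are hypothetical; `zf.read(name)` is the shortcut that returns a member's full contents as bytes.

```python
import io
import zipfile

# Hypothetical archive, built in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('folder/folder/one.xml', '<a/>')
    zf.writestr('folder/folder/two.xml', '<b/>')
    zf.writestr('folder/readme.txt', 'not xml')

with zipfile.ZipFile(buf) as zf:
    # zf.open() gives a file-like object; nothing is written to disk.
    with zf.open('folder/folder/one.xml') as f:
        first_bytes = f.read(2)

    # Keep the raw bytes of every XML member, keyed by its path in the archive.
    xml_files = {
        name: zf.read(name)
        for name in zf.namelist()
        if name.endswith('.xml')
    }

print(first_bytes)
print(sorted(xml_files))
```

Holding everything in a dictionary is convenient for small archives; for large ones, processing each member inside the loop avoids keeping all contents in memory at once.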

