XML to CSV using Python

【Title】XML to CSV using Python 【Posted】2017-09-16 07:48:54 【Question】:

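I have XML files that I want to convert to CSV using Python. I need the content of the TestitemName tags as the CSV header and the content of the Testvalue tags as the CSV values. Can someone help me with this?

Sample XML file (input)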

<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
    <sample:TestData>
        <sample:Testitem>
            <sample:TestitemName>Field1</sample:TestitemName>
            <sample:Testvalue>1</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field2</sample:TestitemName>
            <sample:Testvalue>Hi</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field3</sample:TestitemName>
            <sample:Testvalue>1234</sample:Testvalue>
        </sample:Testitem>
    </sample:TestData>
    <sample:TestData>
        <sample:Testitem>
            <sample:TestitemName>Field1</sample:TestitemName>
            <sample:Testvalue>3</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field2</sample:TestitemName>
            <sample:Testvalue>Hello</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field3</sample:TestitemName>
            <sample:Testvalue>999</sample:Testvalue>
        </sample:Testitem>
    </sample:TestData>
</sample:batch>

Desired CSV file (output)

Field1,Field2,Field3 (Header field names)
1,Hi,1234 (1st record)
3,Hello,999 (2nd record)

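【Comments】:

【Answer 1】:

BeautifulSoup can be used to parse the XML data. Because the data is well organized, you only need to iterate over the nested tag types and collect the values as you go.

Code: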

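from bs4 import BeautifulSoup as Soup  # BeautifulSoup 4 (pip install beautifulsoup4)

def parse_xml(file_like):
    data = []   # one dict per TestData record
    names = []  # header names, in first-seen order
    # html.parser lowercases tag names, hence the lowercase lookups below
    soup = Soup(file_like, 'html.parser')
    for batch in soup.find_all('sample:batch'):
        for test_data in batch.find_all('sample:testdata'):
            item = {}
            for test_item in test_data.find_all('sample:testitem'):
                name = test_item.find('sample:testitemname').text
                value = test_item.find('sample:testvalue').text
                item[name] = value
                if name not in names:
                    names.append(name)
            data.append(item)

    # Header row first, then one row per record; missing fields become ''
    return [names] + [[datum.get(name, '') for name in names] for datum in data]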

Test code:

data = parse_xml(xml_data)
for datum in data:
    print(','.join(datum))

Test data:

from io import StringIO
xml_data = StringIO(u"""
    <sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
        <sample:TestData>
            <sample:Testitem>
                <sample:TestitemName>Field1</sample:TestitemName>
                <sample:Testvalue>1</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field2</sample:TestitemName>
                <sample:Testvalue>Hi</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field3</sample:TestitemName>
                <sample:Testvalue>1234</sample:Testvalue>
            </sample:Testitem>
        </sample:TestData>
        <sample:TestData>
            <sample:Testitem>
                <sample:TestitemName>Field1</sample:TestitemName>
                <sample:Testvalue>3</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field2</sample:TestitemName>
                <sample:Testvalue>Hello</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field3</sample:TestitemName>
                <sample:Testvalue>999</sample:Testvalue>
            </sample:Testitem>
        </sample:TestData>
    </sample:batch>
""")

Results:

Field1,Field2,Field3
1,Hi,1234
3,Hello,999
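If the rows should end up in a CSV file rather than on the screen, one option is the standard-library csv module, which also quotes values that contain commas. The sketch below is not the code from this answer: it swaps in xml.etree.ElementTree for BeautifulSoup so that it runs without third-party packages, and it assumes the input is well-formed XML. The names NS, SAMPLE, xml_to_rows and write_csv are made up for illustration.

```python
import csv
import xml.etree.ElementTree as ET

# Namespace URI copied from the sample XML; the prefix "s" is an arbitrary choice.
NS = {"s": "http://sample.com/schema/sampleimport"}

# Shortened, well-formed version of the sample input (hypothetical constant).
SAMPLE = """
<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
    <sample:TestData>
        <sample:Testitem>
            <sample:TestitemName>Field1</sample:TestitemName>
            <sample:Testvalue>1</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field2</sample:TestitemName>
            <sample:Testvalue>Hi</sample:Testvalue>
        </sample:Testitem>
    </sample:TestData>
    <sample:TestData>
        <sample:Testitem>
            <sample:TestitemName>Field1</sample:TestitemName>
            <sample:Testvalue>3</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field2</sample:TestitemName>
            <sample:Testvalue>Hello</sample:Testvalue>
        </sample:Testitem>
    </sample:TestData>
</sample:batch>
"""


def xml_to_rows(xml_text):
    """Return a header row followed by one row per TestData element."""
    root = ET.fromstring(xml_text)
    names, records = [], []
    for test_data in root.findall("s:TestData", NS):
        record = {}
        for item in test_data.findall("s:Testitem", NS):
            name = item.findtext("s:TestitemName", default="", namespaces=NS)
            value = item.findtext("s:Testvalue", default="", namespaces=NS)
            record[name] = value
            if name not in names:
                names.append(name)  # keep header names in first-seen order
        records.append(record)
    # Fields missing from a record are written as empty strings.
    return [names] + [[r.get(n, "") for n in names] for r in records]


def write_csv(rows, file_obj):
    """Write all rows to an open text file using csv.writer."""
    csv.writer(file_obj).writerows(rows)
```

To produce a file, open it with newline="" as the csv documentation recommends, for example `with open("output.csv", "w", newline="") as f: write_csv(xml_to_rows(SAMPLE), f)`. The file name output.csv is only a placeholder.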

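【Comments】:

Thanks Stephen, it worked! I would like to write the output to a CSV file. Could you help me once more?

The output I showed is already CSV... just write it to a file instead of printing it to the screen.

【Answer 2】: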

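Use pyxmlparser

It is a command-line utility that does the same thing!

https://pypi.org/project/pyxmlparser/

Disclaimer: I am the author of the library. Since it is new, I would be glad to hear whether it works for you.

【Comments】: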
