Convert Excel to XML in Python

Posted: 2021-12-25 23:57:31

Question:

I am trying to convert an Excel database to XML with Python. I have trading data that needs to be imported into a system in XML format.

My code is as follows:

import pandas as pd
import xml.etree.ElementTree as ET

df = pd.read_excel("C:/Users/junag/Documents/XML/Portfolio2.xlsx", sheet_name="Sheet1", dtype=object)
root = ET.Element('trading-data')
root.set('xmlns:xsi', 'http://www.w3.org/2001/XMLSchema-instance')
tree = ET.ElementTree(root)
Portfolios = ET.SubElement(root, "Portfolios")
Defaults = ET.SubElement(Portfolios, "Defaults", BaseCurrency="USD")

# One Portfolio (and one PortfolioPositions) element is created per spreadsheet row
for row in df.itertuples():
    Portfolio = ET.SubElement(Portfolios, "Portfolio", Name=row.Name, BaseCurrency=row.BaseCurrency2, TradingPower=str(row.TradingPower),
                              ValidationProfile=row.ValidationProfile, CommissionProfile=row.CommissionProfile)
    PortfolioPositions = ET.SubElement(Portfolio, "PortfolioPositions")
    if row.Type == "Cash":
        PortfolioPosition = ET.SubElement(PortfolioPositions, "PortfolioPosition", Type=row.Type, Volume=str(row.Volume))
        Cash = ET.SubElement(PortfolioPosition, 'Cash', Currency=str(row.Currency))
    else:
        PortfolioPosition = ET.SubElement(PortfolioPositions, "PortfolioPosition", Type=row.Type, Volume=str(row.Volume),
                                          Invested=str(row.Invested), BaseInvested=str(row.BaseInvested))
        Instrument = ET.SubElement(PortfolioPosition, 'Instrument', Ticker=str(row.Ticker), ISIN=str(row.ISIN), Market=str(row.Market),
                                   Currency=str(row.Currency2), CFI=str(row.CFI))

# ET.indent requires Python 3.9+
ET.indent(tree, space="\t", level=0)
tree.write("Portfolios_converted2.xml", encoding="utf-8")
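To see why the tags repeat, here is a minimal stdlib-only reproduction of the flat loop above. It skips Excel and uses a few hypothetical in-memory rows (the names `P1` and `P2` are made up for illustration). Because `ET.SubElement(Portfolios, "Portfolio", ...)` runs once per row, every position gets its own `<Portfolio>` and `<PortfolioPositions>`, even when several rows belong to the same portfolio.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat rows: one row per position, portfolio attributes repeated
rows = [
    {"Name": "P1", "Type": "Cash", "Volume": 1000},
    {"Name": "P1", "Type": "Stock", "Volume": 10},
    {"Name": "P2", "Type": "Cash", "Volume": 500},
]

root = ET.Element("trading-data")
portfolios = ET.SubElement(root, "Portfolios")

# Flat loop, as in the question: a new Portfolio element for every row
for row in rows:
    portfolio = ET.SubElement(portfolios, "Portfolio", Name=row["Name"])
    positions = ET.SubElement(portfolio, "PortfolioPositions")
    ET.SubElement(positions, "PortfolioPosition",
                  Type=row["Type"], Volume=str(row["Volume"]))

names = [p.get("Name") for p in portfolios.findall("Portfolio")]
print(names)  # ['P1', 'P1', 'P2']: P1 is split across two elements
```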

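The output looks like this: (screenshot not available)

Whereas I need it to look like this: (screenshot not available)

How can I improve my code so that the output XML looks better? Please advise.

Here is the Excel data: (screenshot not available)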

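Comments on the question:

Please define "look better". What is wrong with the current output?

The child tag "Portfolio" belongs to "Portfolios", and "PortfolioPositions" belongs to "Portfolio". The problem is that for each portfolio position, "Portfolio" and "PortfolioPositions" each get their own start and end tags, whereas "Portfolio" should have a single start and end tag, "PortfolioPositions" should have a single start and end tag, and the positions should go inside it.

For a minimal reproducible example, please post sample data. We cannot access your Excel file.

Answer 1: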

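Since you need a single <Portfolio> and a single <PortfolioPositions> as the parent grouping, consider a nested loop: first iterate over a list of data frames (one per portfolio, split with groupby), then, inside each data frame, loop through its rows: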
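The same grouping idea can be sketched without pandas. In this stdlib-only version a `dict` keyed by the portfolio-level columns stands in for `df.groupby` (dicts keep insertion order in Python 3.7+); the rows and their values are hypothetical. Each key then produces exactly one `<Portfolio>` and one `<PortfolioPositions>`, with all of that portfolio's positions nested inside.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows; portfolio-level columns repeat on every position row
rows = [
    {"Name": "P1", "BaseCurrency2": "USD", "Type": "Cash", "Volume": 1000, "Currency": "USD"},
    {"Name": "P1", "BaseCurrency2": "USD", "Type": "Stock", "Volume": 10, "Ticker": "AAPL"},
    {"Name": "P2", "BaseCurrency2": "EUR", "Type": "Cash", "Volume": 500, "Currency": "EUR"},
]

# Group rows by the portfolio-level key (stand-in for DataFrame.groupby)
groups = {}
for row in rows:
    key = (row["Name"], row["BaseCurrency2"])
    groups.setdefault(key, []).append(row)

root = ET.Element("trading-data")
portfolios = ET.SubElement(root, "Portfolios")

for (name, base_ccy), group_rows in groups.items():
    # Outer loop: one Portfolio and one PortfolioPositions per group
    portfolio = ET.SubElement(portfolios, "Portfolio", Name=name, BaseCurrency=base_ccy)
    positions = ET.SubElement(portfolio, "PortfolioPositions")
    # Inner loop: one PortfolioPosition per row of the group
    for row in group_rows:
        pos = ET.SubElement(positions, "PortfolioPosition",
                            Type=row["Type"], Volume=str(row["Volume"]))
        if row["Type"] == "Cash":
            ET.SubElement(pos, "Cash", Currency=row["Currency"])
        else:
            ET.SubElement(pos, "Instrument", Ticker=row["Ticker"])

for p in portfolios.findall("Portfolio"):
    print(p.get("Name"), len(p.findall("PortfolioPositions/PortfolioPosition")))
# P1 2
# P2 1
```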

import xml.etree.ElementTree as ET
import pandas as pd
import xml.dom.minidom as md

df = pd.read_excel("Input.xlsx", sheet_name="Sheet1", dtype=object)

# LIST OF DATA FRAME SPLITS
df_list = [g for i,g in df.groupby(
    ["Name", "BaseCurrency2", "TradingPower", "ValidationProfile", "CommissionProfile"]
)]

# ROOT LEVEL
root = ET.Element('trading-data')
root.set('xmlns:xsi', 'http://www.w3.org/2001/XMLSchema-instance')

# ROOT CHILD LEVEL
Portfolios = ET.SubElement(root, "Portfolios")
Defaults = ET.SubElement(Portfolios, "Defaults", BaseCurrency="USD")

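# GROUP LEVEL ITERATION
for grp in df_list:
    # Take the first row by position with .iloc[0]: each group keeps its
    # original index labels, so grp["Name"][0] only works for the group
    # that happens to contain label 0 and raises KeyError for the others
    first = grp.iloc[0]
    Portfolio = ET.SubElement(
        Portfolios, 
        "Portfolio", 
        Name = str(first["Name"]),
        BaseCurrency = str(first["BaseCurrency2"]), 
        TradingPower = str(first["TradingPower"]),
        ValidationProfile = str(first["ValidationProfile"]), 
        CommissionProfile = str(first["CommissionProfile"])
    )

    PortfolioPositions = ET.SubElement(Portfolio, "PortfolioPositions")

    # ROW LEVEL ITERATION
    for row in grp.itertuples():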
        if row.Type == "Cash":
            PortfolioPosition = ET.SubElement(
                PortfolioPositions, 
                "PortfolioPosition", 
                Type = row.Type, 
                Volume = str(row.Volume)
            )
            Cash = ET.SubElement(
                PortfolioPosition, 
                "Cash", 
                Currency = str(row.Currency)
            )
        else:
            PortfolioPosition = ET.SubElement(
                 PortfolioPositions, 
                 "PortfolioPosition", 
                 Type = row.Type,
                 Volume = str(row.Volume),
                 Invested = str(row.Invested), 
                 BaseInvested = str(row.BaseInvested)
            )
            Instrument = ET.SubElement(
                 PortfolioPosition, 
                 "Instrument", 
                 Ticker = str(row.Ticker),
                 ISIN = str(row.ISIN),
                 Market = str(row.Market),
                 Currency = str(row.Currency2),
                 CFI = str(row.CFI)
            )

# SAVE PRETTY PRINT OUTPUT
with open("Output.xml", "wb") as f:
    dom = md.parseString(ET.tostring(root))
    f.write(dom.toprettyxml().encode("utf-8"))
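Both the question (`ET.indent`) and this answer (`minidom.toprettyxml`) pretty-print the tree. The sketch below puts the two stdlib options side by side on a tiny made-up tree. `ET.indent` needs Python 3.9 or newer and edits the tree in place; the `minidom` route also works on older versions but re-parses the serialized bytes.

```python
import xml.dom.minidom as md
import xml.etree.ElementTree as ET

root = ET.Element("trading-data")
portfolios = ET.SubElement(root, "Portfolios")
ET.SubElement(portfolios, "Portfolio", Name="P1")

# Option 1: minidom re-parses the serialized XML and re-indents it
# (done first, before ET.indent adds whitespace to the tree)
pretty_md = md.parseString(ET.tostring(root)).toprettyxml(indent="\t")

# Option 2: ET.indent (Python 3.9+) adds indentation whitespace to the tree in place
ET.indent(root, space="\t")
pretty_et = ET.tostring(root, encoding="unicode")

print(pretty_md)
print(pretty_et)
```

The `minidom` output starts with an `<?xml version="1.0" ?>` declaration, while `ET.tostring(..., encoding="unicode")` omits it; `tree.write(..., encoding="utf-8")` in the question adds its own declaration when writing to a file.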

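Comments:

Hey, thanks a lot! This works perfectly if I only have one portfolio. However, when I add a second portfolio to the database, the code breaks. How can I make it iterate over the portfolio attributes as well?

Please post sample data. With groupby, it should work for every combination of ["Name", "BaseCurrency2", "TradingPower", "ValidationProfile", "CommissionProfile"]. Can any of them be missing (nan)? If so, pass the dropna=False argument to groupby.

Hi, I added an Excel screenshot to the post.

What error are you getting? I fixed the save option with pretty printing; that was only an issue in this answer.

Answer 2:

Convert Excel to XML in Python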

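import openpyxl
import xml.etree.ElementTree as ET

def convert_excel_to_xml(file_name, sheet_name):
    wb = openpyxl.load_workbook(file_name)
    sheet = wb[sheet_name]
    root = ET.Element("root")
    # One <cell> element per cell; ElementTree attribute values must be strings
    for row in sheet.rows:
        for cell in row:
            value = "" if cell.value is None else str(cell.value)
            ET.SubElement(root, "cell", value=value)
    tree = ET.ElementTree(root)
    # Write to "<sheet_name>.xml", e.g. Sheet1.xml
    tree.write("{}.xml".format(sheet_name), encoding="utf-8")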
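ElementTree only serializes string attribute values, which is why cell values need converting before they are used as attributes: a numeric or empty cell (a `cell.value` of `10` or `None`) makes serialization raise `TypeError`. A minimal stdlib sketch of the conversion; the helper name `to_attr` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

def to_attr(value):
    """Hypothetical helper: turn a cell value into an XML attribute string."""
    return "" if value is None else str(value)

# Kinds of values a spreadsheet cell can hold: text, numbers, empty cells
values = ["AAPL", 10, 2.5, None]

root = ET.Element("root")
for v in values:
    ET.SubElement(root, "cell", value=to_attr(v))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
# <root><cell value="AAPL" /><cell value="10" /><cell value="2.5" /><cell value="" /></root>

# Without the conversion, serialization fails on the non-string value
bad = ET.Element("root")
ET.SubElement(bad, "cell", value=10)
try:
    ET.tostring(bad)
except TypeError as exc:
    print("TypeError:", exc)
```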

Run the function:

convert_excel_to_xml("test.xlsx", "Sheet1")
