Convert Excel to XML in Python
【Title】Convert Excel to XML in Python 【Posted】2021-12-25 23:57:31 【Question】I am trying to convert an Excel database to XML using Python. I have trading data that needs to be imported into a system in XML format.
My code is as follows:
import pandas as pd
import xml.etree.ElementTree as ET

df = pd.read_excel("C:/Users/junag/Documents/XML/Portfolio2.xlsx", sheet_name="Sheet1", dtype=object)
root = ET.Element('trading-data')
root.set('xmlns:xsi', 'http://www.w3.org/2001/XMLSchema-instance')
tree = ET.ElementTree(root)
Portfolios = ET.SubElement(root, "Portfolios")
Defaults = ET.SubElement(Portfolios, "Defaults", BaseCurrency="USD")
for row in df.itertuples():
    # Note: a new Portfolio / PortfolioPositions pair is created for every row
    Portfolio = ET.SubElement(Portfolios, "Portfolio", Name=row.Name, BaseCurrency=row.BaseCurrency2, TradingPower=str(row.TradingPower),
                              ValidationProfile=row.ValidationProfile, CommissionProfile=row.CommissionProfile)
    PortfolioPositions = ET.SubElement(Portfolio, "PortfolioPositions")
    if row.Type == "Cash":
        PortfolioPosition = ET.SubElement(PortfolioPositions, "PortfolioPosition", Type=row.Type, Volume=str(row.Volume))
        Cash = ET.SubElement(PortfolioPosition, 'Cash', Currency=str(row.Currency))
    else:
        PortfolioPosition = ET.SubElement(PortfolioPositions, "PortfolioPosition", Type=row.Type, Volume=str(row.Volume),
                                          Invested=str(row.Invested), BaseInvested=str(row.BaseInvested))
        Instrument = ET.SubElement(PortfolioPosition, 'Instrument', Ticker=str(row.Ticker), ISIN=str(row.ISIN), Market=str(row.Market),
                                   Currency=str(row.Currency2), CFI=str(row.CFI))
# ET.indent requires Python 3.9 or later
ET.indent(tree, space="\t", level=0)
tree.write("Portfolios_converted2.xml", encoding="utf-8")
The output looks like this: [image]
But I need it to look like this: [image]
How can I improve my code so that the output XML has the structure I need? Please advise.
Here is the Excel data:
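【Comments】:
Please define "looks better". What is wrong with the current output?
The child tag "Portfolio" belongs to "Portfolios", and "PortfolioPositions" belongs to "Portfolio". The problem is that every portfolio position gets its own opening and closing tags for "Portfolio" and "PortfolioPositions". Instead, "Portfolio" should have a single opening and closing tag, "PortfolioPositions" should have a single opening and closing tag, and the positions should sit inside them.
For a minimal reproducible example, please post sample data. We cannot access your Excel file.
【Answer 1】:
Since you need a single <Portfolio> and <PortfolioPositions> as parent groupings, consider a nested loop that first iterates over a list of data frames, one per portfolio. Then, within each data frame, loop over its rows: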
import xml.etree.ElementTree as ET
import pandas as pd
import xml.dom.minidom as md

df = pd.read_excel("Input.xlsx", sheet_name="Sheet1", dtype=object)

# LIST OF DATA FRAME SPLITS (ONE PER PORTFOLIO)
df_list = [g for i, g in df.groupby(
    ["Name", "BaseCurrency2", "TradingPower", "ValidationProfile", "CommissionProfile"]
)]

# ROOT LEVEL
root = ET.Element('trading-data')
root.set('xmlns:xsi', 'http://www.w3.org/2001/XMLSchema-instance')

# ROOT CHILD LEVEL
Portfolios = ET.SubElement(root, "Portfolios")
Defaults = ET.SubElement(Portfolios, "Defaults", BaseCurrency="USD")

# GROUP LEVEL ITERATION
for group in df_list:
    # Each group keeps its original index labels, so take the first row
    # by position (.iloc[0]) rather than by label ([0]).
    first = group.iloc[0]
    Portfolio = ET.SubElement(
        Portfolios,
        "Portfolio",
        Name = first["Name"],
        BaseCurrency = first["BaseCurrency2"],
        TradingPower = str(first["TradingPower"]),
        ValidationProfile = first["ValidationProfile"],
        CommissionProfile = first["CommissionProfile"]
    )
    PortfolioPositions = ET.SubElement(Portfolio, "PortfolioPositions")

    # ROW LEVEL ITERATION
    for row in group.itertuples():
        if row.Type == "Cash":
            PortfolioPosition = ET.SubElement(
                PortfolioPositions,
                "PortfolioPosition",
                Type = row.Type,
                Volume = str(row.Volume)
            )
            Cash = ET.SubElement(
                PortfolioPosition,
                "Cash",
                Currency = str(row.Currency)
            )
        else:
            PortfolioPosition = ET.SubElement(
                PortfolioPositions,
                "PortfolioPosition",
                Type = row.Type,
                Volume = str(row.Volume),
                Invested = str(row.Invested),
                BaseInvested = str(row.BaseInvested)
            )
            Instrument = ET.SubElement(
                PortfolioPosition,
                "Instrument",
                Ticker = str(row.Ticker),
                ISIN = str(row.ISIN),
                Market = str(row.Market),
                Currency = str(row.Currency2),
                CFI = str(row.CFI)
            )

# SAVE PRETTY PRINT OUTPUT
with open("Output.xml", "wb") as f:
    dom = md.parseString(ET.tostring(root))
    f.write(dom.toprettyxml().encode("utf-8"))
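The grouping step can be checked without an Excel file. The following is a minimal sketch under stated assumptions: the two-portfolio sample data is made up, the helper name `build_portfolios` is hypothetical, and only a subset of the attributes from the answer is used. It reads the first row of each group with `.iloc[0]`, so the lookup works by position regardless of the index labels a group inherits from the original data frame.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Made-up sample data standing in for the Excel sheet (two portfolios).
df = pd.DataFrame({
    "Name": ["P1", "P1", "P2"],
    "BaseCurrency2": ["USD", "USD", "EUR"],
    "Type": ["Cash", "Stock", "Cash"],
    "Volume": [1000, 50, 200],
    "Currency": ["USD", "USD", "EUR"],
    "Ticker": [None, "ABC", None],
})

def build_portfolios(frame):
    """Hypothetical helper: one <Portfolio> per (Name, BaseCurrency2) group."""
    root = ET.Element("trading-data")
    portfolios = ET.SubElement(root, "Portfolios")
    # sort=False keeps the groups in order of first appearance
    for _, group in frame.groupby(["Name", "BaseCurrency2"], sort=False):
        first = group.iloc[0]  # positional lookup, independent of index labels
        portfolio = ET.SubElement(
            portfolios, "Portfolio",
            Name=str(first["Name"]), BaseCurrency=str(first["BaseCurrency2"]),
        )
        positions = ET.SubElement(portfolio, "PortfolioPositions")
        for row in group.itertuples():
            position = ET.SubElement(
                positions, "PortfolioPosition",
                Type=str(row.Type), Volume=str(row.Volume),
            )
            if row.Type == "Cash":
                ET.SubElement(position, "Cash", Currency=str(row.Currency))
            else:
                ET.SubElement(position, "Instrument", Ticker=str(row.Ticker))
    return root

root = build_portfolios(df)
ET.indent(root, space="\t")  # requires Python 3.9 or later
print(ET.tostring(root, encoding="unicode"))
```

With this data the second group starts at index label 2, which is the situation where a label-based `[0]` lookup on a group would raise a `KeyError`.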
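【Comments】:
Hey, thank you very much! This works perfectly if I have only one portfolio. However, when I add a second portfolio to the database, the code breaks. How can I make it iterate over the portfolio attributes as well?
Please post sample data. With groupby, it should work for every combination of ["Name", "BaseCurrency2", "TradingPower", "ValidationProfile", "CommissionProfile"]. Can any of these be missing (NaN)? If so, use the dropna=False argument in groupby.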
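To illustrate the `dropna=False` suggestion, here is a small sketch with made-up data in which one portfolio has a missing `CommissionProfile`. By default `groupby` drops rows whose group key contains NaN, so that portfolio would silently disappear from the output; `dropna=False` (available since pandas 1.1) keeps it as its own group.

```python
import numpy as np
import pandas as pd

# Made-up data: portfolio P2 has no CommissionProfile.
df = pd.DataFrame({
    "Name": ["P1", "P1", "P2"],
    "CommissionProfile": ["Std", "Std", np.nan],
    "Volume": [1000, 50, 200],
})
keys = ["Name", "CommissionProfile"]

# Default behaviour: rows with a NaN key are excluded from every group.
default_groups = [g for _, g in df.groupby(keys)]

# dropna=False keeps NaN as a valid key value.
all_groups = [g for _, g in df.groupby(keys, dropna=False)]

print(len(default_groups), "group(s) by default")
print(len(all_groups), "group(s) with dropna=False")
```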
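Hi, I added an Excel screenshot to the post.
What error do you get? I fixed the save option with pretty printing, which was the only problem with this answer.
【Answer 2】: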
Convert Excel to XML in Python
import openpyxl
import xml.etree.ElementTree as ET

def convert_excel_to_xml(file_name, sheet_name):
    wb = openpyxl.load_workbook(file_name)
    sheet = wb[sheet_name]
    root = ET.Element("root")
    for row in sheet.rows:
        for cell in row:
            # Attribute values must be strings; empty cells become ""
            value = "" if cell.value is None else str(cell.value)
            ET.SubElement(root, "cell", value=value)
    tree = ET.ElementTree(root)
    # Write the result to "<sheet_name>.xml"
    tree.write("{}.xml".format(sheet_name))

Run the function:
convert_excel_to_xml("test.xlsx", "Sheet1")
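One detail worth checking in cell-by-cell conversions like this: `xml.etree.ElementTree` only serializes string attribute values, so numeric or empty cells have to be converted before writing. Below is a minimal sketch using only the standard library; the helper name `cell_to_text`, the sample values, and the choice to map `None` to an empty string are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def cell_to_text(value):
    """Hypothetical helper: map None (an empty cell) to '' and stringify the rest."""
    return "" if value is None else str(value)

values = ["AAPL", 42, 3.5, None]  # made-up cell values

# Raw values are accepted when the element is built,
# but serialization fails on the first non-string attribute.
bad_root = ET.Element("root")
for v in values:
    ET.SubElement(bad_root, "cell", value=v)
try:
    ET.tostring(bad_root)
    failed = False
except TypeError:
    failed = True

# Converting every value first makes the tree serializable.
good_root = ET.Element("root")
for v in values:
    ET.SubElement(good_root, "cell", value=cell_to_text(v))
xml_text = ET.tostring(good_root, encoding="unicode")

print("raw values failed to serialize:", failed)
print(xml_text)
```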