转换分层 DOM 树 XML 文件:XSLT 能否转换为“扁平化”XML 文件而不会丢失数据?
Posted
技术标签:
【中文标题】转换分层 DOM 树 XML 文件:XSLT 能否转换为“扁平化”XML 文件而不会丢失数据?【英文标题】:Converting Hierarchical DOM-Tree XML Files: Can XSLT convert to a "flattened" XML file without losing data? 【发布时间】:2021-09-29 02:18:50 【问题描述】:我正在处理每个月来的一批 XML 文件。它们都遵循相同的 DOM 树结构,并且不附带模式文件。这是一个示例:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<REPORT><REPORT-DTL><REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END></REPORT-DTL>
<GROUP><GROUP-DTL><GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME></GROUP-DTL>
<PROVIDER><PROVIDER-DTL><PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME></PROVIDER-DTL>
<PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientANumber1234</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientALastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientAFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1941-02-11</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX></PATIENT-DTL>
<SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientAServiceDate2020-09-07</SERVICE-DATE><SERVICE-CODE>PatientAServiceCodeABC1</SERVICE-CODE><SERVICE-DESCRIPTION>PatientAServiceDescription-Facelift</SERVICE-DESCRIPTION><SERVICE-AMT>PatientAServiceAmount8.90</SERVICE-AMT></SERVICE-DTL1>
</PATIENT>
<PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientBNumber1235</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientBLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientBFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1955-10-11</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX></PATIENT-DTL>
<SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientBServiceDate2020-12-08</SERVICE-DATE><SERVICE-CODE>PatientBServiceCodeABC2</SERVICE-CODE><SERVICE-DESCRIPTION>PatientBServiceDescription-Checkup</SERVICE-DESCRIPTION><SERVICE-AMT>PatientBServiceAmount10.50</SERVICE-AMT></SERVICE-DTL1>
</PATIENT>
<PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientCNumber1236</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientCLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientCFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1965-02-07</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX></PATIENT-DTL>
<SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientCServiceDate2020-11-11</SERVICE-DATE><SERVICE-CODE>PatientCServiceCodeABC3</SERVICE-CODE><SERVICE-DESCRIPTION>PatientCServiceDescription-X-Ray</SERVICE-DESCRIPTION><SERVICE-AMT>PatientCServiceAmount18.00</SERVICE-AMT></SERVICE-DTL1>
</PATIENT>
<PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientDNumber1237</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientDLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientDFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1975-07-09</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX></PATIENT-DTL>
<SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientDServiceDate2020-01-10</SERVICE-DATE><SERVICE-CODE>PatientDServiceCodeABC4</SERVICE-CODE><SERVICE-DESCRIPTION>PatientDServiceDescription-Nose Cleaning</SERVICE-DESCRIPTION><SERVICE-AMT>PatientDServiceAmount6.00</SERVICE-AMT></SERVICE-DTL1>
</PATIENT>
</PROVIDER>
</GROUP>
</REPORT>
注意代码的层次结构。所有的 PATIENT 数据“属于”父 PROVIDER 节点。每个程序“属于”取代它的患者。 XML 代码描述 these 4 个相互关联的表。
我的数据库软件无法从 XML 文件中导入这些相互关联的表——我的数据库应用程序无法遵循代码的层次结构。
相反,如果数据被“展平”,并且相互关联的节点被“解包”(并复制),我可以导入数据。 Here 是我希望我的扁平表在导入时的样子。是的,它有很多冗余——这个新表的前 12 列对于每条记录都是相同的。 (没关系,我可以稍后消除冗余;此时,我只想读入所有数据。)
下面是生成上图中扁平化表格的 XML 代码:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<REPORT>
<ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
<GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
<PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
<PATIENT-HEALTH-NUMBER>PatientANumber1234</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientALastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientAFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1941-02-11</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX>
<SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientAServiceDate2020-09-07</SERVICE-DATE><SERVICE-CODE>PatientAServiceCodeABC1</SERVICE-CODE><SERVICE-DESCRIPTION>PatientAServiceDescription-Facelift</SERVICE-DESCRIPTION><SERVICE-AMT>PatientAServiceAmount8.90</SERVICE-AMT>
</ImportedTable>
<ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
<GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
<PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
<PATIENT-HEALTH-NUMBER>PatientBNumber1235</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientBLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientBFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1955-10-11</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX>
<SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientBServiceDate2020-12-08</SERVICE-DATE><SERVICE-CODE>PatientBServiceCodeABC2</SERVICE-CODE><SERVICE-DESCRIPTION>PatientBServiceDescription-Checkup</SERVICE-DESCRIPTION><SERVICE-AMT>PatientBServiceAmount10.50</SERVICE-AMT>
</ImportedTable>
<ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
<GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
<PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
<PATIENT-HEALTH-NUMBER>PatientCNumber1236</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientCLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientCFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1965-02-07</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX>
<SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientCServiceDate2020-11-11</SERVICE-DATE><SERVICE-CODE>PatientCServiceCodeABC3</SERVICE-CODE><SERVICE-DESCRIPTION>PatientCServiceDescription-X-Ray</SERVICE-DESCRIPTION><SERVICE-AMT>PatientCServiceAmount18.00</SERVICE-AMT>
</ImportedTable>
<ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
<GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
<PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
<PATIENT-HEALTH-NUMBER>PatientDNumber1237</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientDLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientDFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1975-07-09</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX>
<SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientDServiceDate2020-01-10</SERVICE-DATE><SERVICE-CODE>PatientDServiceCodeABC4</SERVICE-CODE><SERVICE-DESCRIPTION>PatientDServiceDescription-Nose Cleaning</SERVICE-DESCRIPTION><SERVICE-AMT>PatientDServiceAmount6.00</SERVICE-AMT>
</ImportedTable>
</REPORT>
所以,我的问题是 XSLT 是否可以从顶部的 XML 文件转换到底部的文件。请注意,在转换后的 XML 文件中,我只对保留包含文本的节点感兴趣。 (原始文件中的任何非文本节点都可以在此转换中安全地忽略。)是否有代码可以进行此转换? (注意:我一直在阅读一些处理 XML 转换的线程,但由于该数据集的关系结构,这种情况并不常见。如果此问题已在其他地方得到解答,请告诉我!)
非常感谢,
罗恩
【问题讨论】:
如果您想将每个PATIENT
元素映射到ImportedTable
元素,那么在XSLT 中通过<xsl:template match="PATIENT"><ImportedTable>...</ImportedTable></xsl:template>
完成。这些点至少会做<xsl:copy-of select="descendant::*[not(*)]"/>
加上收集前面的元素。由于没有缩进,我还没有完全掌握结构。
【参考方案1】:
扁平化分层 XML 是一项常见任务。您只需为最原子表中的每条记录创建一个元素,并包含其父表中的数据。碰巧的是,您的示例 XML 输入的结构使这变得非常容易:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/REPORT">
<xsl:copy>
<xsl:variable name="report-data" select="REPORT-DTL/*" />
<xsl:for-each select="GROUP">
<xsl:variable name="group-data" select="GROUP-DTL/*" />
<xsl:for-each select="PROVIDER">
<xsl:variable name="provider-data" select="PROVIDER-DTL/*" />
<xsl:for-each select="PATIENT">
<ImportedTable>
<xsl:copy-of select="$report-data"/>
<xsl:copy-of select="$group-data"/>
<xsl:copy-of select="$provider-data"/>
<xsl:copy-of select="PATIENT-DTL/*"/>
<xsl:copy-of select="SERVICE-DTL1/*"/>
</ImportedTable>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
【讨论】:
是的!做到了!谢谢!对于将来来此线程的任何人,我将 michael.hor257k 的代码保存为 .xsl 文件 (transform.xsl) ,然后在 Visual Basic 中使用此代码保存转换后的文件: Application.TransformXML "C:\Path \To\Raw\Xml-to-convert.xml", _ "C:\Path\To\XSLT\transform.xsl", _ "C:\Path\To\Final\Converted.xml" 太棒了! 刚刚意识到我最初的例子过于简单化了。在真实文件中,一个患者在树上可以有多个他/她下的服务,我希望每个服务都显示为自己的记录。本代码仅记录属于患者的服务的第一个实例,而不是所有服务。是否可以调整代码以允许患者进行多个程序?使用您的术语,这是否意味着以上是关于转换分层 DOM 树 XML 文件:XSLT 能否转换为“扁平化”XML 文件而不会丢失数据?的主要内容,如果未能解决你的问题,请参考以下文章
使用 Python 或 XSLT 将复杂的 XML 转换为 CSV