XML shredding via XSLT in Java
【Posted】2012-01-22 19:41:03 【Question】I need to transform large XML files that have a nested (hierarchical) structure of the form
<Root>
Flat XML
Hierarchical XML (multiple blocks, some repetitive)
Flat XML
</Root>
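into a flatter ("shredded") form, with one block for each repeated nested block.
The data has many different tags and hierarchy variations (especially in the number of tags of the shredded XML before and after the hierarchical XML), so ideally no assumptions should be made about tag and attribute names, or about the number of hierarchy levels.
A top-level view of a hierarchy with only 4 levels would look like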
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>A</Level 4>
<Level 4>B</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
and the desired output would then be
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>A</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>B</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
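That is, if at each level i there are Li different components, a total of Product(Li) different components will be produced (only 2 above, because the only differentiating factor is level 4, so L1*L2*L3*L4 = 2).
From what I have seen, XSLT may be the way to go, but any other solution (e.g., StAX or even JDOM) would do.
A more detailed example, using fictitious information, is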
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
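The data above should be shredded into 5 blocks (i.e., one block for each distinct <Job> block), each of which keeps all other tags identical and contains just a single <Job> element. So, given the 5 distinct <Job> blocks in the example above, the transformed ("shredded") XML would be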
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
【Comments on the question】:
XSLT is well suited to this; just to understand the problem better, you want for each …
Given the following XML:
<?xml version="1.0" encoding="utf-8" ?>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
<Job title = "Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
<Job title = "Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
<Job title = "Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
the following XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<Output>
<xsl:apply-templates select="//Employee/EmploymentHistory/Employment/JobDetails/Job" />
</Output>
</xsl:template>
<xsl:template match="//Employee/EmploymentHistory/Employment/JobDetails/Job">
<Employee>
<xsl:attribute name="name">
<xsl:value-of select="ancestor::Employee/@name"/>
</xsl:attribute>
<Address>
<xsl:value-of select="ancestor::Employee/Address"/>
</Address>
<Age>
<xsl:value-of select="ancestor::Employee/Age"/>
</Age>
<EmploymentHistory>
<Employment>
<xsl:attribute name="country">
<xsl:value-of select="ancestor::Employment/@country"/>
</xsl:attribute>
<Comment>
<xsl:value-of select="ancestor::Employment/Comment"/>
</Comment>
<Jobs>
<xsl:value-of select="ancestor::Employment/Jobs"/>
</Jobs>
<JobDetails>
<xsl:copy-of select="."/>
</JobDetails>
<Available>
<xsl:value-of select="ancestor::Employee/Available"/>
</Available>
<Experience>
<xsl:attribute name="unit">
<xsl:value-of select="ancestor::Employee/Experience/@unit"/>
</xsl:attribute>
<xsl:value-of select="ancestor::Employee/Experience"/>
</Experience>
</Employment>
</EmploymentHistory>
</Employee>
</xsl:template>
</xsl:stylesheet>
gives the following output:
<?xml version="1.0" encoding="utf-8"?>
<Output>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title="Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title="Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title="Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title="Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title="Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
</Output>
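Note that I added an Output root element to ensure the document is well-formed.
Is this what you want?
You might also be able to use xsl:copy to copy the higher-level elements, but I need to think about that a bit more. With the XSLT above you have more control, but you also have to redefine your elements...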
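Since the question asks about doing this from Java, here is a minimal sketch of applying a stylesheet with the standard JAXP API (`javax.xml.transform`). The class name `XsltRunnerSketch`, the method `transform`, and the tiny `DEMO_XSLT` stylesheet are assumptions made for illustration; the demo stylesheet only mimics the ancestor-based approach above in a reduced form (one `Employee` wrapper per `Job`) and is not the stylesheet from this answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the original answer.
final class XsltRunnerSketch {

    // Reduced demo stylesheet (an assumption): emits one <Employee> per <Job>,
    // pulling the name attribute from the ancestor, as the answer above does.
    static final String DEMO_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<Output><xsl:apply-templates select='//Job'/></Output>"
        + "</xsl:template>"
        + "<xsl:template match='Job'>"
        + "<Employee name='{ancestor::Employee/@name}'><xsl:copy-of select='.'/></Employee>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the given XSLT 1.0 stylesheet text to the given XML text.
    static String transform(String xslt, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked JAXP exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Employee name='A Name'><JobDetails>"
                + "<Job>first</Job><Job>second</Job>"
                + "</JobDetails></Employee>";
        System.out.println(transform(DEMO_XSLT, xml));
    }
}
```

For real files, `StreamSource` and `StreamResult` can wrap `File` objects instead of strings. Note that a typical XSLT 1.0 processor builds the whole input tree in memory, which matters for the "large XML files" mentioned in the question.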
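【Comments】:
I am actually looking for generic XSLT code, if possible (like the answer in ***.com/questions/1900184/…). However, even if that is not doable, I still appreciate the help (and am testing it now)! :-)
The problem with this XSLT is that it does exactly what Dimitre said: it flattens the hierarchy. What you really want to do is repeat all the JobDetails ancestors but exclude all the sibling JobDetails. The other problem is the Employment country="" subtrees, which make it harder to determine which elements you do not want to keep. If that makes sense?
The flattening process is quite deterministic: every element that repeats (in this case …
Here is a generic solution: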
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="pLeafNodes" select="//Level-4"/>
<xsl:template match="/">
<t>
<xsl:call-template name="StructRepro"/>
</t>
</xsl:template>
<xsl:template name="StructRepro">
<xsl:param name="pLeaves" select="$pLeafNodes"/>
<xsl:for-each select="$pLeaves">
<xsl:apply-templates mode="build" select="/*">
<xsl:with-param name="pChild" select="."/>
<xsl:with-param name="pLeaves" select="$pLeaves"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template mode="build" match="node()|@*">
<xsl:param name="pChild"/>
<xsl:param name="pLeaves"/>
<xsl:copy>
<xsl:apply-templates mode="build" select="@*"/>
<xsl:variable name="vLeafChild" select=
"*[count(.|$pChild) = count($pChild)]"/>
<xsl:choose>
<xsl:when test="$vLeafChild">
<xsl:apply-templates mode="build"
select="$vLeafChild
|
node()[not(count(.|$pLeaves) = count($pLeaves))]">
<xsl:with-param name="pChild" select="$pChild"/>
<xsl:with-param name="pLeaves" select="$pLeaves"/>
</xsl:apply-templates>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates mode="build" select=
"node()[not(.//*[count(.|$pLeaves) = count($pLeaves)])
or
.//*[count(.|$pChild) = count($pChild)]
]
">
<xsl:with-param name="pChild" select="$pChild"/>
<xsl:with-param name="pLeaves" select="$pLeaves"/>
</xsl:apply-templates>
</xsl:otherwise>
</xsl:choose>
</xsl:copy>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
When applied to the provided simplified (generic) XML document:
<Level-1>
...
<Level-2>
...
<Level-3>
...
<Level-4>A</Level-4>
<Level-4>B</Level-4>
...
</Level-3>
...
</Level-2>
...
</Level-1>
it produces the wanted, correct result:
<Level-1>
...
<Level-2>
...
<Level-3>
<Level-4>A</Level-4>
</Level-3>
...
</Level-2>
...
</Level-1>
<Level-1>
...
<Level-2>
...
<Level-3>
<Level-4>B</Level-4>
</Level-3>
...
</Level-2>
...
</Level-1>
Now, if we change the line:
<xsl:param name="pLeafNodes" select="//Level-4"/>
to:
<xsl:param name="pLeafNodes" select="//Job"/>
and apply the transformation to the Employee XML document:
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
we again get the wanted, correct result:
<t>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title="Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title="Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title="Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title="Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title="Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
</t>
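Explanation: The processing is done in a named template (StructRepro) and is controlled by a single external parameter named pLeafNodes, which must contain the node-set of all nodes whose "upward structure" is to be reproduced in the result.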
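The question also allows non-XSLT solutions, so the same "structure reproduction" idea can be sketched directly in Java with the standard DOM API. This is an illustrative approximation, not a port of the stylesheet above: the class `StructReproSketch` and its methods are hypothetical, leaves are identified by element name only (a stand-in for `pLeafNodes`), and the leaf elements are assumed to sit below the document element. For each chosen leaf, a child is kept if it is the chosen leaf, an ancestor of it, or a subtree that contains no leaf at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original answer.
final class StructReproSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns one pruned copy of the document element per element named leafName.
    static List<Element> shred(Document doc, String leafName) {
        NodeList leaves = doc.getElementsByTagName(leafName);
        List<Element> blocks = new ArrayList<>();
        for (int i = 0; i < leaves.getLength(); i++) {
            Document out = newDocument();
            Node block = copy(doc.getDocumentElement(), leaves.item(i), leafName, out);
            out.appendChild(block);
            blocks.add((Element) block);
        }
        return blocks;
    }

    // Copies src into out, keeping only the structure that leads to the chosen leaf.
    private static Node copy(Node src, Node chosen, String leafName, Document out) {
        Node dst = out.importNode(src, false); // shallow: element plus its attributes
        for (Node c = src.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.isSameNode(chosen)) {
                dst.appendChild(out.importNode(c, true));        // the selected leaf
            } else if (isAncestorOf(c, chosen)) {
                dst.appendChild(copy(c, chosen, leafName, out)); // path to the leaf
            } else if (!containsLeaf(c, leafName)) {
                dst.appendChild(out.importNode(c, true));        // leaf-free content
            }
            // Otherwise: another leaf, or a subtree holding only other leaves; skip.
        }
        return dst;
    }

    private static boolean isAncestorOf(Node candidate, Node node) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (p.isSameNode(candidate)) return true;
        }
        return false;
    }

    private static boolean containsLeaf(Node n, String leafName) {
        if (!(n instanceof Element)) return false;
        Element e = (Element) n;
        return e.getTagName().equals(leafName)
                || e.getElementsByTagName(leafName).getLength() > 0;
    }

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Employee name='A'><Age>28</Age>"
                + "<EmploymentHistory><Employment country='US'><JobDetails>"
                + "<Job>a</Job><Job>b</Job></JobDetails></Employment></EmploymentHistory>"
                + "<EmploymentHistory><Employment country='UK'><JobDetails>"
                + "<Job>c</Job></JobDetails></Employment></EmploymentHistory>"
                + "<Available>true</Available></Employee>");
        for (Element block : shred(doc, "Job")) {
            System.out.println(block.getElementsByTagName("Job").item(0).getTextContent()
                    + ": " + block.getElementsByTagName("EmploymentHistory").getLength()
                    + " EmploymentHistory element(s) kept");
        }
    }
}
```

Like the stylesheet, this sketch keeps the whole input in memory, so for very large files a streaming approach (for example StAX, as mentioned in the question) may still be worth considering.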
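【Comments】:
Excellent and concise! Is there any way to avoid writing down the "Level-4" tag altogether? That would make it 100% generic. :-)
@PNS: Yes, we can define this code as a template (named or in a mode) whose parameter can be a node-set containing all the nodes that must be processed as "leaf nodes".
I was also trying to find a better word to describe this process. "Shred" is clearly the opposite of what is happening here (and the source of my confusion for so long). In fact, what we have is "structure reproduction", similar to reproduction by cell division.
@PNS: I have edited my answer to provide the fully generic solution you are looking for. Enjoy. :)
@PNS: Are you saying that you do not get the same output as I do? If so, the cause may be that you modified the source XML or the XSLT code, or your XSLT processor may be buggy. I have run this transformation with many different XSLT processors, and with all of them I get the result given in the answer.
@PNS: With all of the following XSLT processors I get the same result (you or anyone else can run the transformation and confirm this): MSXML3, MSXML4, MSXML6, .NET XslCompiledTransform, .NET XslTransform, Saxon 6.5.4, Saxon 9.1.07, XQSharp.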