Validate XML using XSD, a Catalog Resolver, and JAXP DOM for XSLT

Posted: 2014-10-31 04:01:44

Question:

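Background

An XML file is loaded into a DOM using JDK 6. The XML file must be validated against an XSD. The location of the XSD file differs depending on the runtime environment. Ensuring that the XML can be validated against the XSD, regardless of directory structure, requires a catalog resolver. Once the XML has been validated, it can be transformed.

My understanding is that a DocumentBuilderFactory can be used to configure such validation. This is accomplished by using a DocumentBuilder with an XMLCatalogResolver to find the XSD file (and any included files) associated with the XML file.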

Questions about validating XML documents against catalog-derived XSDs include:

    JAXP - debug XSD catalog look up
    Java XML Schema validator with custom resource resolver fails to resolve elements
    Can XMLCatalog be used for schema imports?
    How to load XMLCatalog from classpath resources (inside a jar), reliably?
    XMLSchema validation with Catalog.xml file for entity resolving
    Resolving type definitions from imported schema in XJC fails
    Find items that can be repeated in an xml schema using Java
    Java servlets: xml validation against xsd

Most of these questions and answers reference a hard-coded XSD file path, use SAX to perform validation, pertain to DTDs, require JDOM dependencies, or include no transformation.

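Problem

There is no canonical solution that describes how to use an XML catalog for XSD validation with JAXP DOM, followed by a transformation through XSLT. There are a number of snippets, but no complete, standalone example that compiles and runs (under JDK 6).

I posted an answer that seems to work technically, but it is overly verbose.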

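Question

What is the canonical way (using JDK 1.6 libraries) to validate and transform an XML document? Here is one possible algorithm:

    1. Create a catalog resolver.
    2. Create an XML parser.
    3. Associate the resolver with the parser.
    4. Parse an XML document containing an XSD reference.
    5. Terminate on validation errors.
    6. Transform the validated XML using an XSL template.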
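
The parse, validate, and transform steps of this algorithm can be sketched using only the JAXP classes bundled with the JDK. This is a minimal illustration rather than the canonical solution being asked for: the catalog resolver is left out entirely, and the class name ValidateThenTransform, the inline XSD, and the inline XSL (XSLT 1.0, so that the JDK's default transformer can run it) are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ValidateThenTransform {
  // Hypothetical schema: a note that holds a single xs:date.
  static final String XSD =
    "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
    + "<xs:element name='note'><xs:complexType><xs:sequence>"
    + "<xs:element name='date' type='xs:date'/>"
    + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

  // Hypothetical stylesheet: writes the note's date as plain text.
  static final String XSL =
    "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:template match='/'><xsl:value-of select='note/date'/></xsl:template>"
    + "</xsl:stylesheet>";

  public static String run( String xml ) throws Exception {
    // Parse the XML into a DOM.
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    dbFactory.setNamespaceAware( true );
    Document doc = dbFactory.newDocumentBuilder()
      .parse( new InputSource( new StringReader( xml ) ) );

    // Validate the DOM; with no ErrorHandler set, the Validator throws
    // a SAXException on the first validation error.
    Schema schema = SchemaFactory
      .newInstance( XMLConstants.W3C_XML_SCHEMA_NS_URI )
      .newSchema( new StreamSource( new StringReader( XSD ) ) );
    schema.newValidator().validate( new DOMSource( doc ) );

    // Transform the validated DOM.
    Transformer transformer = TransformerFactory.newInstance()
      .newTransformer( new StreamSource( new StringReader( XSL ) ) );
    StringWriter out = new StringWriter();
    transformer.transform( new DOMSource( doc ), new StreamResult( out ) );
    return out.toString();
  }

  public static void main( String[] args ) throws Exception {
    System.out.println( run( "<note><date>2014-08-30</date></note>" ) );
  }
}
```

Passing a note whose date is 20140830 makes run throw before any transformation happens, which corresponds to the "terminate on validation errors" step.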

Answer 1:

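Source Files

The source files include a catalog manager properties file, Java source code, catalog files, XML data, XSL files, and an XSD file. All file paths are relative to the current working directory (./).

Catalog Manager Properties File

This properties file is read by the CatalogResolver class; save as ./CatalogManager.properties: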

catalogs=catalog.xml
relative-catalogs=yes
verbosity=99
prefer=system
static-catalog=yes
allow-oasis-xml-catalog-pi=yes

TestXSD.java

This is the main application; save as ./src/TestXSD.java:

package src;

import java.io.*;
import java.net.URI;
import java.util.*;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

import javax.xml.parsers.*;
import javax.xml.xpath.*;
import javax.xml.XMLConstants;

import org.w3c.dom.*;
import org.xml.sax.*;

import org.apache.xml.resolver.tools.CatalogResolver;
import org.apache.xerces.util.XMLCatalogResolver;
import static org.apache.xerces.jaxp.JAXPConstants.JAXP_SCHEMA_LANGUAGE;
import static org.apache.xerces.jaxp.JAXPConstants.W3C_XML_SCHEMA;

import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.Validator;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;

import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * Download http://xerces.apache.org/xml-commons/components/resolver/CatalogManager.properties
 */
public class TestXSD {
  private final static String ENTITY_RESOLVER =
    "http://apache.org/xml/properties/internal/entity-resolver";

  /**
   * This program reads an XML file, performs validation, reads an XSL
   * file, transforms the input XML, and then writes the transformed document
   * to standard output.
   *
   * args[0] - The XSL file used to transform the XML file
   * args[1] - The XML file to transform using the XSL file
   */
  public static void main( String args[] ) throws Exception {
    // For validation error messages.
    ErrorHandler errorHandler = new DocumentErrorHandler();

    // Read the CatalogManager.properties file.
    CatalogResolver resolver = new CatalogResolver();
    XMLCatalogResolver xmlResolver = createXMLCatalogResolver( resolver );

    logDebug( "READ XML INPUT SOURCE" );
    // Load an XML document in preparation to transform it.
    InputSource xmlInput = new InputSource( new InputStreamReader(
      new FileInputStream( args[1] ) ) );

    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    dbFactory.setAttribute( JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA );
    dbFactory.setNamespaceAware( true );

    DocumentBuilder builder = dbFactory.newDocumentBuilder();
    builder.setEntityResolver( xmlResolver );
    builder.setErrorHandler( errorHandler );

    logDebug( "PARSE XML INTO DOCUMENT MODEL" );
    Document xmlDocument = builder.parse( xmlInput );

    logDebug( "CONVERT XML DOCUMENT MODEL INTO DOMSOURCE" );
    DOMSource xml = new DOMSource( xmlDocument );

    logDebug( "GET XML SCHEMA DEFINITION" );
    String schemaURI = getSchemaURI( xmlDocument );

    logDebug( "SCHEMA URI: " + schemaURI );

    if( schemaURI != null ) {
      logDebug( "CREATE SCHEMA FACTORY" );
      // Create a Schema factory to obtain a Schema for XML validation...
      SchemaFactory sFactory = SchemaFactory.newInstance( W3C_XML_SCHEMA );
      sFactory.setResourceResolver( xmlResolver );

      logDebug( "CREATE XSD INPUT SOURCE" );
      String xsdFileURI = xmlResolver.resolveURI( schemaURI );

      logDebug( "CREATE INPUT SOURCE XSD FROM: " + xsdFileURI );
      InputSource xsd = new InputSource(
        new FileInputStream( new File( new URI( xsdFileURI ) ) ) );

      logDebug( "CREATE SCHEMA OBJECT FOR XSD" );
      Schema schema = sFactory.newSchema( new SAXSource( xsd ) );

      logDebug( "CREATE VALIDATOR FOR SCHEMA" );
      Validator validator = schema.newValidator();

      logDebug( "VALIDATE XML AGAINST XSD" );
      // No error handler is set on the validator, so an invalid
      // document raises a SAXParseException here.
      validator.validate( xml );
    }

    logDebug( "READ XSL INPUT SOURCE" );
    // Load an XSL template for transforming XML documents.
    InputSource xslInput = new InputSource( new InputStreamReader(
      new FileInputStream( args[0] ) ) );

    logDebug( "PARSE XSL INTO DOCUMENT MODEL" );
    Document xslDocument = builder.parse( xslInput );

    transform( xmlDocument, xslDocument, resolver );
    System.out.println();
  }

  private static void transform(
    Document xml, Document xsl, CatalogResolver resolver ) throws Exception
  {
    if( versionAtLeast( xsl, 2 ) ) {
      useXSLT2Transformer();
    }

    logDebug( "CREATE TRANSFORMER FACTORY" );
    // Create the transformer used for the document.
    TransformerFactory tFactory = TransformerFactory.newInstance();
    tFactory.setURIResolver( resolver );

    logDebug( "CREATE TRANSFORMER FROM XSL" );
    Transformer transformer = tFactory.newTransformer( new DOMSource( xsl ) );

    logDebug( "CREATE RESULT OUTPUT STREAM" );
    // This enables writing the results to standard output.
    Result out = new StreamResult( new OutputStreamWriter( System.out ) );

    logDebug( "TRANSFORM THE XML AND WRITE TO STDOUT" );
    // Transform the document using a given stylesheet.
    transformer.transform( new DOMSource( xml ), out );
  }

  /**
   * Answers whether the given XSL document version is greater than or
   * equal to the given required version number.
   *
   * @param xsl The XSL document to check for version compatibility.
   * @param version The version number to compare against.
   *
   * @return true iff the XSL document version is greater than or equal
   * to the version parameter.
   */
  private static boolean versionAtLeast( Document xsl, float version ) {
    Element root = xsl.getDocumentElement();
    float docVersion = Float.parseFloat( root.getAttribute( "version" ) );

    return docVersion >= version;
  }

  /**
   * Enables Saxon9's XSLT2 transformer for XSLT2 files.
   */
  private static void useXSLT2Transformer() {
    System.setProperty("javax.xml.transform.TransformerFactory",
      "net.sf.saxon.TransformerFactoryImpl");
  }

  /**
   * Creates an XMLCatalogResolver based on the file names found in
   * the given CatalogResolver. The resulting XMLCatalogResolver will
   * contain the absolute path to all the files known to the given
   * CatalogResolver.
   *
   * @param resolver The CatalogResolver to examine for catalog file names.
   * @return An XMLCatalogResolver instance with the same number of catalog
   * files as found in the given CatalogResolver.
   */
  private static XMLCatalogResolver createXMLCatalogResolver(
    CatalogResolver resolver ) {
    int index = 0;
    List files = resolver.getCatalog().getCatalogManager().getCatalogFiles();
    String catalogs[] = new String[ files.size() ];
    XMLCatalogResolver xmlResolver = new XMLCatalogResolver();

    for( Object file : files ) {
      catalogs[ index ] = (new File( file.toString() )).getAbsolutePath();
      index++;
    }

    xmlResolver.setCatalogList( catalogs );

    return xmlResolver;
  }

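  /**
   * Splits a name="value" pair into a two-element array of name and
   * value; both elements are null when the text does not match.
   */
  private static String[] parseNameValue( String nv ) {
    Pattern p = Pattern.compile( "\\s*(\\w+)=\"([^\"]*)\"\\s*" );
    Matcher m = p.matcher( nv );
    String result[] = new String[2];

    if( m.find() ) {
      result[0] = m.group(1);
      result[1] = m.group(2);
    }

    return result;
  }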

  /**
   * Retrieves the XML schema definition using an XSD.
   *
   * @param node The document (or child node) to traverse seeking processing
   * instruction nodes.
   * @return null if no XSD is present in the XML document.
   * @throws IOException Never thrown (uses StringReader).
   */
  private static String getSchemaURI( Node node ) throws IOException {
    String result = null;

    if( node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE ) {
      ProcessingInstruction pi = (ProcessingInstruction)node;

      logDebug( "NODE IS PROCESSING INSTRUCTION" );

      if( "xml-model".equals( pi.getNodeName() ) ) {
        logDebug( "PI IS XML MODEL" );

        // Hack to get the attributes: assumes type comes first, href
        // second, and that neither value contains whitespace.
        String data = pi.getData();

        if( data != null ) {
          final String attributes[] = data.trim().split( "\\s+" );

          // Guard against a PI that carries fewer than two attributes.
          if( attributes.length >= 2 ) {
            String type = parseNameValue( attributes[0] )[1];
            String href = parseNameValue( attributes[1] )[1];

            // TODO: Schema should = http://www.w3.org/2001/XMLSchema
            //String schema = attributes.getNamedItem( "schematypens" );

            if( "application/xml".equalsIgnoreCase( type ) && href != null ) {
              result = href;
            }
          }
        }
      }
    }
    else {
      // Try to get the schema type information.
      NamedNodeMap attrs = node.getAttributes();

      if( attrs != null ) {
        // TypeInfo.toString() returns values of the form:
        // schemaLocation="uri schemaURI"
        // The following loop extracts the schema URI.
        for( int i = 0; i < attrs.getLength(); i++ ) {
          Attr attribute = (Attr)attrs.item( i );
          TypeInfo typeInfo = attribute.getSchemaTypeInfo();
          String attr[] = parseNameValue( typeInfo.toString() );

          if( "schemaLocation".equalsIgnoreCase( attr[0] ) ) {
            result = attr[1].split( "\\s" )[1];
            break;
          }
        }
      }

      // Look deeper for the schema URI.
      if( result == null ) {
        NodeList list = node.getChildNodes();

        for( int i = 0; i < list.getLength(); i++ ) {
          result = getSchemaURI( list.item( i ) );

          if( result != null ) {
            break;
          }
        }
      }
    }

    return result;
  }

  /**
   * Writes a message to standard output.
   */
  private static void logDebug( String s ) {
    System.out.println( s );
  }
}

Error Handler

This is the code for human-friendly error messages; save as ./src/DocumentErrorHandler.java:

package src;

import java.io.PrintStream;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;
import org.xml.sax.SAXException;

/**
 * Handles error messages during parsing and validating XML documents.
 */
public class DocumentErrorHandler implements ErrorHandler {
  private final static PrintStream OUTSTREAM = System.err;

  // Writes the location and message of a parse exception to OUTSTREAM.
  private void log( String type, SAXParseException e ) {
    OUTSTREAM.println( "SAX PARSE EXCEPTION " + type );
    OUTSTREAM.println( "  Public ID: " + e.getPublicId() );
    OUTSTREAM.println( "  System ID: " + e.getSystemId() );
    OUTSTREAM.println( "  Line     : " + e.getLineNumber() );
    OUTSTREAM.println( "  Column   : " + e.getColumnNumber() );
    OUTSTREAM.println( "  Message  : " + e.getMessage() );
  }

  @Override
  public void error( SAXParseException e ) throws SAXException {
    log( "ERROR", e );
  }

  @Override
  public void fatalError( SAXParseException e ) throws SAXException {
    log( "FATAL ERROR", e );
  }

  @Override
  public void warning( SAXParseException e ) throws SAXException {
    log( "WARNING", e );
  }
}

Catalog File

Save as ./catalog.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.1//EN" "http://www.oasis-open.org/committees/entity/release/1.1/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <!-- XSDs linked through primary catalog -->
    <!-- catalog entry for good-note1.xml -->
    <rewriteSystem 
        systemIdStartString="http://***.com/schema" 
        rewritePrefix="./ArbitraryFolder/schemas"
    />

    <!-- catalog entry for good-note2.xml, good-note3.xml, bad-note1.xml, bad-note2.xml -->
    <rewriteURI 
        uriStartString="http://***.com/2014/09/xsd" 
        rewritePrefix="./ArbitraryFolder/schemas"
    />

    <!-- add a second catalog as a further test:
         XSL will be resolved through it -->
    <nextCatalog 
        catalog="./ArbitraryFolder/catalog.xml"
    />
</catalog>
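
To illustrate what a rewriteURI entry does, here is a simplified sketch of prefix rewriting in plain Java. It is not the resolver's implementation: the class name RewriteUriSketch is hypothetical, http://example.com stands in for the elided host in the catalog above, and a real resolver additionally resolves rewritePrefix against the catalog's base URI to produce an absolute URI.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RewriteUriSketch {
  // Maps each uriStartString to its rewritePrefix.
  private final Map<String, String> rules = new LinkedHashMap<String, String>();

  public void addRewrite( String uriStartString, String rewritePrefix ) {
    rules.put( uriStartString, rewritePrefix );
  }

  /**
   * Replaces the longest matching start string with its rewrite prefix.
   *
   * @return The rewritten URI, or null when no entry matches.
   */
  public String resolve( String uri ) {
    String best = null;

    for( String start : rules.keySet() ) {
      boolean longer = best == null || start.length() > best.length();

      if( uri.startsWith( start ) && longer ) {
        best = start;
      }
    }

    return best == null
      ? null
      : rules.get( best ) + uri.substring( best.length() );
  }

  public static void main( String[] args ) {
    RewriteUriSketch catalog = new RewriteUriSketch();
    catalog.addRewrite(
      "http://example.com/2014/09/xsd", "./ArbitraryFolder/schemas" );

    // Prints ./ArbitraryFolder/schemas/notes/notes.xsd
    System.out.println(
      catalog.resolve( "http://example.com/2014/09/xsd/notes/notes.xsd" ) );
  }
}
```

Because only the prefix changes, the directory layout beneath the schemas folder has to mirror the path that follows the start string in the published URI.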

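XML Data

The test cases cover an XSD referenced either in a processing instruction or in the root node.

Schema: Processing Instruction

A schema can be supplied using the xml-model processing instruction (PI). Save as ./Tests/good-note2.xml: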

<?xml version="1.0" encoding="UTF-8"?>
<!-- Associating Schemas with XML documents: http://www.w3.org/TR/xml-model/ -->
<?xml-model type="application/xml" href="http://***.com/2014/09/xsd/notes/notes.xsd"?>
<note>
    <title>Shopping List</title>
    <date>2014-08-30</date>
    <body>headlight fluid, flamgrabblit, exhaust coil</body>
</note>
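
The data of an xml-model PI is a sequence of pseudo-attributes. One way to read them without depending on their order is a regular expression that collects every name and value pair into a map. This is a sketch only: the class name PseudoAttributes is hypothetical, and the code does not decode character or entity references inside values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PseudoAttributes {
  // Matches name="value" or name='value', allowing spaces around '='.
  private static final Pattern PAIR = Pattern.compile(
    "([\\w.-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')" );

  /**
   * Parses processing instruction data, such as the string returned by
   * ProcessingInstruction.getData(), into a name-to-value map.
   */
  public static Map<String, String> parse( String data ) {
    Map<String, String> result = new LinkedHashMap<String, String>();
    Matcher m = PAIR.matcher( data );

    while( m.find() ) {
      // Group 2 holds a double-quoted value, group 3 a single-quoted one.
      String value = m.group( 2 ) != null ? m.group( 2 ) : m.group( 3 );
      result.put( m.group( 1 ), value );
    }

    return result;
  }

  public static void main( String[] args ) {
    Map<String, String> attrs = parse(
      "type=\"application/xml\" href=\"http://example.com/notes.xsd\"" );

    // Prints http://example.com/notes.xsd
    System.out.println( attrs.get( "href" ) );
  }
}
```

Unlike splitting the data on whitespace, this approach tolerates values that contain spaces and pseudo-attributes that appear in any order.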

Schema: Root Node

A schema can be supplied in an attribute of the document's root node. Save as ./Tests/good-note3.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!-- XML Schema Part 1: Structures: 
     Schema-Related Markup in Documents Being Validated: 
     http://www.w3.org/TR/xmlschema-1/#Instance_Document_Constructions -->
<note 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://***.com http://***.com/2014/09/xsd/notes/notes.xsd">
    <title>Shopping List</title>
    <date>2014-08-30</date>
    <body>Eggs, Milk, Carrots</body>
</note>
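
The xsi:schemaLocation value is a whitespace-separated list of namespace and location pairs. A namespace-aware DOM can read it directly with getAttributeNS, as in the following sketch. The class name SchemaLocations is hypothetical, and http://example.com stands in for the elided host above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SchemaLocations {
  /**
   * Returns the namespace-to-location pairs declared on the root
   * element, or an empty map when xsi:schemaLocation is absent.
   */
  public static Map<String, String> of( String xml ) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware( true );

    Element root = factory.newDocumentBuilder()
      .parse( new InputSource( new StringReader( xml ) ) )
      .getDocumentElement();

    // getAttributeNS returns an empty string for a missing attribute.
    String value = root.getAttributeNS(
      XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI, "schemaLocation" ).trim();

    Map<String, String> pairs = new LinkedHashMap<String, String>();

    if( value.length() > 0 ) {
      String[] tokens = value.split( "\\s+" );

      // Tokens alternate: namespace URI, then schema location.
      for( int i = 0; i + 1 < tokens.length; i += 2 ) {
        pairs.put( tokens[ i ], tokens[ i + 1 ] );
      }
    }

    return pairs;
  }

  public static void main( String[] args ) throws Exception {
    String xml =
      "<note xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
      + " xsi:schemaLocation='http://example.com http://example.com/notes.xsd'/>";

    // Prints http://example.com/notes.xsd
    System.out.println( of( xml ).get( "http://example.com" ) );
  }
}
```

This avoids relying on the text returned by TypeInfo.toString(), whose format is specific to the parser implementation.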

Validation Failure

The following should fail validation (the date requires hyphens); save as ./Tests/bad-note1.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Associating Schemas with XML documents: http://www.w3.org/TR/xml-model/ -->
<?xml-model type="application/xml" href="http://***.com/2014/09/xsd/notes/notes.xsd"?>
<!-- FAILS SCHEMA: date is not valid; should use hyphens -->
<note>
    <title>Shopping List</title>
    <date>20140830</date>
    <body>headlight fluid, flamgrabblit, exhaust coil</body>
</note>

Transformation

Save as ./Tests/note-to-html.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
    <!-- is in the second catalog (../ArbitraryFolder/catalog.xml) -->
    <xsl:import href="http://***.com/2014/09/xsl/notes/notes.xsl"/>
</xsl:stylesheet>
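
The xsl:import above is resolved through the URIResolver registered on the TransformerFactory. As a rough illustration of that hook, the sketch below swaps the catalog for an in-memory map. The class name ImportResolverSketch, the example.com URL, and both inline stylesheets are assumptions; the stylesheets use XSLT 1.0 so that the JDK's default transformer can run them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ImportResolverSketch {
  // Hypothetical published location of the imported stylesheet.
  static final String IMPORTED_URI = "http://example.com/xsl/notes.xsl";

  static final String IMPORTED =
    "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:template match='/'><xsl:value-of select='note/title'/></xsl:template>"
    + "</xsl:stylesheet>";

  static final String MAIN =
    "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:import href='" + IMPORTED_URI + "'/>"
    + "<xsl:output method='text'/>"
    + "</xsl:stylesheet>";

  public static String transform( String xml ) throws Exception {
    // Stand-in for a catalog: published URI to local stylesheet text.
    final Map<String, String> known = new HashMap<String, String>();
    known.put( IMPORTED_URI, IMPORTED );

    TransformerFactory factory = TransformerFactory.newInstance();
    factory.setURIResolver( new URIResolver() {
      public Source resolve( String href, String base ) {
        String body = known.get( href );

        // Returning null lets the processor try its default resolution.
        return body == null
          ? null
          : new StreamSource( new StringReader( body ), href );
      }
    } );

    Transformer transformer = factory.newTransformer(
      new StreamSource( new StringReader( MAIN ) ) );
    StringWriter out = new StringWriter();
    transformer.transform(
      new StreamSource( new StringReader( xml ) ), new StreamResult( out ) );

    return out.toString();
  }

  public static void main( String[] args ) throws Exception {
    System.out.println( transform( "<note><title>Shopping List</title></note>" ) );
  }
}
```

In the answer's program the CatalogResolver plays this role, mapping the import's URL to a file through the second catalog.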

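Arbitrary Folder

The arbitrary folder represents a path to files on the computer that can be located anywhere on the file system. The location of these files may differ, for example, between production, development, and the repository.

Catalog

Save this file as ./ArbitraryFolder/catalog.xml: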

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.1//EN" "http://www.oasis-open.org/committees/entity/release/1.1/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

    <!-- catalog entry for all notes -->
    <rewriteURI 
        uriStartString="http://***.com/2014/09/xsl/" 
        rewritePrefix="./XSL/"/>

</catalog>

Notes

There are two files in this example for transforming notes: notes.xsl and note-body.xsl. The first imports the second.

Notes Stylesheet

Save as ./ArbitraryFolder/XSL/notes/notes.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">

    <!-- will not be in catalog (though it could be): 
         by convention, absolute path is assumed to be part of static file structure -->
    <xsl:import href="note-body.xsl"/>

    <xsl:template match="/">
        <html>
            <head>
                <title>A Note</title>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="note">
        <div>
            <xsl:apply-templates select="title, date, body"/>
        </div>
    </xsl:template>
    <xsl:template match="title">
        <h1><xsl:value-of select="."/></h1>
    </xsl:template>
    <xsl:template match="date">
        <p class="date"><xsl:value-of select="."/></p>
    </xsl:template>
</xsl:stylesheet>

Note Body Stylesheet

Save as ./ArbitraryFolder/XSL/notes/note-body.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">

    <xsl:template match="body">
        <p class="notebody"><xsl:value-of select="."/></p>
    </xsl:template>

</xsl:stylesheet>

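Schema

The last file required is the schema. The catalog rewrites schema URIs to ./ArbitraryFolder/schemas, so save it as ./ArbitraryFolder/schemas/notes/notes.xsd: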

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="note">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="title" type="xs:token"/>
                <xs:element name="date" type="xs:date"/>
                <xs:element name="body" type="xs:string"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

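Building

This section details how to build the test application.

You will need Saxon 9 (for XSLT 2.0 documents), Xerces, Xalan, and the Resolver API. The scripts below expect these JAR files in ./lib: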

jaxen-1.1.6.jar
resolver.jar
saxon9he.jar
serializer.jar
xalan.jar
xercesImpl.jar
xml-apis.jar
xsltc.jar

Scripts

Save as ./build.sh:

#!/bin/bash
mkdir -p bin
javac -d bin -cp ".:lib/*" src/TestXSD.java

Save as ./run.sh:

#!/bin/bash
java -cp ".:bin:lib/*" src.TestXSD Tests/note-to-html.xsl "$1"

Compiling

Compile the code using ./build.sh.

Running

Run using:

./run.sh filename.xml

Good Test

Test that a good note passes validation:

./run.sh Tests/good-note2.xml

No errors.

Bad Test

Test that the bad note's date fails validation:

./run.sh Tests/bad-note1.xml

As expected, this produces the desired error:

Exception in thread "main" org.xml.sax.SAXParseException; cvc-datatype-valid.1.2.1: '20140830' is not a valid value for 'date'.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.error(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaValidator$XSIErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaValidator.reportSchemaError(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaValidator.elementLocallyValidType(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaValidator.processElementContent(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaValidator.handleEndElement(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaValidator.endElement(Unknown Source)
    at org.apache.xerces.jaxp.validation.DOMValidatorHelper.finishNode(Unknown Source)
    at org.apache.xerces.jaxp.validation.DOMValidatorHelper.validate(Unknown Source)
    at org.apache.xerces.jaxp.validation.DOMValidatorHelper.validate(Unknown Source)
    at org.apache.xerces.jaxp.validation.ValidatorImpl.validate(Unknown Source)
    at javax.xml.validation.Validator.validate(Validator.java:124)
    at src.TestXSD.main(TestXSD.java:103)
