Extract some part of an XML

Posted: 2017-06-26 04:12:27

Question:

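I have an XML document and I want to extract one part of it, but I can't work out how. I can get that part by reading every key into its own variable, but that is a very long process. Is there a shorter way?

Below is the XML: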

<?xml version=\"1.0\" encoding=\"UTF-8\"?><xs:nml
xmlns:xs=\"http://www.netgear.com/protocol/transaction/NMLSchema-0.9\" src=\"nas\" dst=\"dpv_1461117132000\" locale=\"en-us\">
<xs:transaction ref-id=\"\" type=\"0\">
    <xs:response ref-id=\"njl_id_1941\" status=\"success\">
        <xs:result>
            <xs:get-s resource-id=\"network_link_list\" resource-type=\"network_link_collection\">
                <network_link_collection>
                    <network_link resource-id=\"eth0\">
                        <link>eth0</link>
                        <ifname>eth0</ifname>
                        <speed>1000</speed>
                        <path/>
                        <duplex>full</duplex>
                        <vlanid>0</vlanid>
                        <iptype>ipv4dhcp</iptype>
                        <ipv6type>ipv6dhcp</ipv6type>
                        <ip>0.0.0.0</ip>
                        <subnet>255.255.255.0</subnet>
                        <broadcast>0.0.0.0</broadcast>
                        <ipv6>::</ipv6>
                        <subnet6>::</subnet6>
                        <prefixlength>64</prefixlength>
                        <ipv6_link>::</ipv6_link>
                        <prefixlength_link>64</prefixlength_link>
                        <mac>6C:B0:CE:1C:CA:AE</mac>
                        <mtu>1500</mtu>
                        <router>0.0.0.0</router>
                        <router6>0.0.0.0</router6>
                        <state>down</state>
                        <dnscollection/>
                        <routecollection/>
                        <ntpcollection/>
                    </network_link>
                </network_link_collection>
            </xs:get-s>
        </xs:result>
    </xs:response>
</xs:transaction>

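I want the XML inside network_link_collection.

Comments:

Use a DOM/SAX parser, or a third-party library such as Beautiful Soup.

I don't have any method that does this. I can find the value of a node/key, but I can't get the whole XML under network_link_collection.

Answer 1:

You can build a map of property key-value pairs fairly easily. You only need to find the node you want to pull out.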

NodeList nodeList = doc.getElementsByTagName("network_link").item(0).getChildNodes();

ParseResponseXML.java

import java.io.*;
import java.net.*;
import java.util.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ParseResponseXML {
    public static void main(String[] args) {
        try {
            File fXmlFile = getResourceAsFile("resources/Response.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(fXmlFile);

            doc.getDocumentElement().normalize(); // http://***.com/questions/13786607

            // Children of the first <network_link> element.
            NodeList nodeList = doc.getElementsByTagName("network_link").item(0).getChildNodes();
            Map<String, String> propertyMap = nodeListToMap(nodeList);

            for (Map.Entry<String, String> entry : propertyMap.entrySet()) {
                System.out.printf("%-18s => %s%n", entry.getKey(), entry.getValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Maps each child element's tag name to its text content, in document order.
    private static Map<String, String> nodeListToMap(NodeList nodeList) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (int temp = 0; temp < nodeList.getLength(); temp++) {
            Node node = nodeList.item(temp);
            // Skip whitespace text nodes between the elements.
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element element = (Element) node;
                result.put(element.getTagName(), element.getTextContent());
            }
        }
        return result;
    }

    // Resolves a classpath resource to a File; only "file:" URLs are supported.
    private static File getResourceAsFile(String resource) throws IOException {
        ClassLoader loader = ParseResponseXML.class.getClassLoader();
        // getResource works with any class loader; since Java 9 the
        // application class loader is no longer a URLClassLoader.
        URL resourceUrl = loader.getResource(resource);
        File resourceFile = null;
        if (resourceUrl != null && "file".equals(resourceUrl.getProtocol())) {
            try {
                URI uri = resourceUrl.toURI();
                resourceFile = new File(uri);
            } catch (URISyntaxException e) {
                IOException ioException = new IOException("Unable to get file through class loader: " + loader);
                ioException.initCause(e);
                throw ioException;
            }
        }
        if (resourceFile == null) {
            throw new IOException("Unable to get file through class loader: " + loader);
        }
        return resourceFile;
    }
}

Response.xml

Make sure the XML ends with the closing </xs:nml> tag.

<?xml version="1.0" encoding="UTF-8"?>
<xs:nml xmlns:xs="http://www.netgear.com/protocol/transaction/NMLSchema-0.9"
    src="nas" dst="dpv_1461117132000" locale="en-us">
    <xs:transaction ref-id="" type="0">
        <xs:response ref-id="njl_id_1941" status="success">
            <xs:result>
                <xs:get-s resource-id="network_link_list" resource-type="network_link_collection">
                    <network_link_collection>
                        <network_link resource-id="eth0">
                            <link>eth0</link>
                            <ifname>eth0</ifname>
                            <speed>1000</speed>
                            <path />
                            <duplex>full</duplex>
                            <vlanid>0</vlanid>
                            <iptype>ipv4dhcp</iptype>
                            <ipv6type>ipv6dhcp</ipv6type>
                            <ip>0.0.0.0</ip>
                            <subnet>255.255.255.0</subnet>
                            <broadcast>0.0.0.0</broadcast>
                            <ipv6>::</ipv6>
                            <subnet6>::</subnet6>
                            <prefixlength>64</prefixlength>
                            <ipv6_link>::</ipv6_link>
                            <prefixlength_link>64</prefixlength_link>
                            <mac>6C:B0:CE:1C:CA:AE</mac>
                            <mtu>1500</mtu>
                            <router>0.0.0.0</router>
                            <router6>0.0.0.0</router6>
                            <state>down</state>
                            <dnscollection />
                            <routecollection />
                            <ntpcollection />
                        </network_link>
                    </network_link_collection>
                </xs:get-s>
            </xs:result>
        </xs:response>
    </xs:transaction>
</xs:nml>

Output

link               => eth0
ifname             => eth0
speed              => 1000
path               => 
duplex             => full
vlanid             => 0
iptype             => ipv4dhcp
ipv6type           => ipv6dhcp
ip                 => 0.0.0.0
subnet             => 255.255.255.0
broadcast          => 0.0.0.0
ipv6               => ::
subnet6            => ::
prefixlength       => 64
ipv6_link          => ::
prefixlength_link  => 64
mac                => 6C:B0:CE:1C:CA:AE
mtu                => 1500
router             => 0.0.0.0
router6            => 0.0.0.0
state              => down
dnscollection      => 
routecollection    => 
ntpcollection      => 

Unwrapping the XML

If you want to unwrap the node, you can do the following.

import java.io.*;
import java.net.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
import org.xml.sax.SAXException;

public class ParseResponseXML {
    public static void main(String[] args) {
        try {
            Document inputDoc = load("resources/Response.xml");
            Document outputDoc = unwrap(inputDoc, "network_link_collection");

            write(outputDoc, "NetworkLinkCollection.xml");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Parses a classpath resource into a DOM document.
    public static Document load(String resource) throws IOException, ParserConfigurationException, SAXException {
        File file = getResourceAsFile(resource);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        return dBuilder.parse(file);
    }

    // Serializes the document to a file using an identity transform.
    public static void write(Document doc, String filename) throws TransformerException {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        DOMSource source = new DOMSource(doc);
        StreamResult result = new StreamResult(new File(filename));
        // StreamResult result = new StreamResult(System.out); // Output to console.
        transformer.transform(source, result);
    }

    // Copies the first element named tagName into a new document as its root.
    public static Document unwrap(Document doc, String tagName) throws ParserConfigurationException {
        Node node = doc.getElementsByTagName(tagName).item(0);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document result = dBuilder.newDocument();
        // Deep-copy the subtree; a node cannot be appended to a document it does not belong to.
        Node importNode = result.importNode(node, true);
        result.appendChild(importNode);
        return result;
    }

    // Resolves a classpath resource to a File; only "file:" URLs are supported.
    private static File getResourceAsFile(String resourceName) throws IOException {
        ClassLoader loader = ParseResponseXML.class.getClassLoader();
        // getResource works with any class loader; since Java 9 the
        // application class loader is no longer a URLClassLoader.
        URL resourceUrl = loader.getResource(resourceName);
        File resourceFile = null;
        if (resourceUrl != null && "file".equals(resourceUrl.getProtocol())) {
            try {
                URI uri = resourceUrl.toURI();
                resourceFile = new File(uri);
            } catch (URISyntaxException e) {
                IOException ioException = new IOException("Unable to get file through class loader: " + loader);
                ioException.initCause(e);
                throw ioException;
            }
        }
        if (resourceFile == null) {
            throw new IOException("Unable to get file through class loader: " + loader);
        }
        return resourceFile;
    }
}

NetworkLinkCollection.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<network_link_collection>
    <network_link resource-id="eth0">
        <link>eth0</link>
        <ifname>eth0</ifname>
        <speed>1000</speed>
        <path />
        <duplex>full</duplex>
        <vlanid>0</vlanid>
        <iptype>ipv4dhcp</iptype>
        <ipv6type>ipv6dhcp</ipv6type>
        <ip>0.0.0.0</ip>
        <subnet>255.255.255.0</subnet>
        <broadcast>0.0.0.0</broadcast>
        <ipv6>::</ipv6>
        <subnet6>::</subnet6>
        <prefixlength>64</prefixlength>
        <ipv6_link>::</ipv6_link>
        <prefixlength_link>64</prefixlength_link>
        <mac>6C:B0:CE:1C:CA:AE</mac>
        <mtu>1500</mtu>
        <router>0.0.0.0</router>
        <router6>0.0.0.0</router6>
        <state>down</state>
        <dnscollection />
        <routecollection />
        <ntpcollection />
    </network_link>
</network_link_collection>
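
As an aside, the lookup in unwrap relies on getElementsByTagName(tagName).item(0). An XPath expression is another way to pick the node, and it may be more convenient when the target has to be selected by position or attribute. The following is only a minimal sketch: the class name XPathLookupSketch, the helper selectFirst, and the trimmed-down SAMPLE string are made up for illustration and are not part of the answer above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathLookupSketch {

    // Hypothetical, shortened stand-in for Response.xml.
    static final String SAMPLE =
        "<xs:nml xmlns:xs=\"http://www.netgear.com/protocol/transaction/NMLSchema-0.9\">"
        + "<xs:result><network_link_collection>"
        + "<network_link resource-id=\"eth0\"><link>eth0</link><speed>1000</speed></network_link>"
        + "</network_link_collection></xs:result></xs:nml>";

    /** Returns the first node matching the expression, or null when nothing matches. */
    static Node selectFirst(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The XPath data model assumes a namespace-aware tree.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            // Checked exceptions are wrapped only to keep the sketch short.
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }

    public static void main(String[] args) {
        // network_link_collection carries no prefix, so no namespace mapping is needed.
        Node collection = selectFirst(SAMPLE, "//network_link_collection");
        System.out.println(collection.getNodeName());

        // A miss yields null instead of throwing.
        System.out.println(selectFirst(SAMPLE, "//no_such_element") == null);
    }
}
```

The node returned by selectFirst could then be passed to importNode exactly as in unwrap. Selecting the prefixed xs: elements this way would additionally require a NamespaceContext on the XPath object, which this sketch leaves out.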

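Comments:

Is there a way to make this work with XML namespaces? This method also strips the namespace declarations.

Answer 2:

Mr. Polywhirl's answer is great! Thank you very much! I would only add that if, like me, you want to extract part of the XML without the XML header (the <?xml ...?> declaration), you have to add this to the "write" method: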

transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
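
To see the effect of that property in isolation, here is a small sketch that serializes the extracted element to a String, once with and once without the declaration. The class name OmitDeclarationSketch, the helper serialize, and the shortened SAMPLE input are assumptions made for this illustration; only the setOutputProperty call comes from the answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class OmitDeclarationSketch {

    // Hypothetical, shortened input; the real Response.xml is much larger.
    static final String SAMPLE =
        "<wrapper><network_link_collection>"
        + "<network_link resource-id=\"eth0\"><link>eth0</link></network_link>"
        + "</network_link_collection></wrapper>";

    /** Serializes the first element named tagName, optionally without the XML declaration. */
    static String serialize(String xml, String tagName, boolean omitDeclaration) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = doc.getElementsByTagName(tagName).item(0);

            // newTransformer() with no stylesheet gives an identity transform.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            if (omitDeclaration) {
                transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            }

            StringWriter out = new StringWriter();
            // A DOMSource may wrap any node, not only a whole Document.
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Checked exceptions are wrapped only to keep the sketch short.
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(SAMPLE, "network_link_collection", false));
        System.out.println(serialize(SAMPLE, "network_link_collection", true));
    }
}
```

The first line printed should begin with the <?xml ...?> declaration and the second with the element itself. Writing to a StringWriter rather than a File is a choice made here so the result can be inspected directly; the same property applies when the StreamResult wraps a file, as in the "write" method.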
