C# getting extra whitespace values with XmlReader but not with XmlDocument


Posted: 2018-02-08 05:14:45

Question:

I have a situation I don't quite understand. When reading the following XML:

<?xml version="1.0" encoding="utf-8" ?>
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Countries>
      <Country>
        <CountryCode>CN</CountryCode>
        <CurrentStatus>Active</CurrentStatus>
      </Country>
    </Countries>

    <Countries>
      <Country>
        <CountryCode>AU</CountryCode>
        <CurrentStatus>Cancelled</CurrentStatus>
      </Country>
      <Country>
        <CountryCode>CN</CountryCode>
        <CurrentStatus>Cancelled</CurrentStatus>
      </Country>
      <Country>
        <CountryCode>US</CountryCode>
        <CurrentStatus>Active</CurrentStatus>
      </Country>
    </Countries>

    <Countries xsi:nil="true" />
</Root>

with the following code:

using System.IO;
using System.Xml;
using Newtonsoft.Json;

//No whitespace
string xml = File.ReadAllText(fileInfo.FullName);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml);
string json1 = JsonConvert.SerializeXmlNode(xmlDoc);

//With whitespace
XmlDocument doc = new XmlDocument();
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Fragment;

using (XmlReader reader = XmlReader.Create(fileInfo.FullName, settings))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            XmlNode node = doc.ReadNode(reader);
            string json2 = JsonConvert.SerializeXmlNode(node);
        }
    }
}

the JSON I get looks like this:

json1:

{"?xml":{"@version":"1.0","@encoding":"utf-8"},"Root":{"@xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","Countries":[{"Country":{"CountryCode":"CN","CurrentStatus":"Active"}},{"Country":[{"CountryCode":"AU","CurrentStatus":"Cancelled"},{"CountryCode":"CN","CurrentStatus":"Cancelled"},{"CountryCode":"JP","CurrentStatus":"Cancelled"},{"CountryCode":"SG","CurrentStatus":"Cancelled"},{"CountryCode":"US","CurrentStatus":"Active"}]},{"@xsi:nil":"true"}]}}

json2:

{"Root":{"@xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","#whitespace":["\n ","\n ","\n ","\n"],"Countries":[{"#whitespace":["\n ","\n "],"Country":{"#whitespace":["\n ","\n ","\n "],"CountryCode":"CN","CurrentStatus":"Active"}},{"#whitespace":["\n ","\n ","\n ","\n ","\n ","\n "],"Country":[{"#whitespace":["\n ","\n ","\n "],"CountryCode":"AU","CurrentStatus":"Cancelled"},{"#whitespace":["\n ","\n ","\n "],"CountryCode":"CN","CurrentStatus":"Cancelled"},{"#whitespace":["\n ","\n ","\n "],"CountryCode":"JP","CurrentStatus":"Cancelled"},{"#whitespace":["\n ","\n ","\n "],"CountryCode":"SG","CurrentStatus":"Cancelled"},{"#whitespace":["\n ","\n ","\n "],"CountryCode":"US","CurrentStatus":"Active"}]},{"@xsi:nil":"true"}]}}

Why does XmlReader produce whitespace while XmlDocument does not? Given the XML values, I don't think it should be there.
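To see where the extra values come from, it helps to look at the raw node stream the reader produces. The sketch below is illustrative only (the class and method names are made up, and it uses just System.Xml, not Json.NET): it lists every node an XmlReader reports for a small fragment. With default settings, the line breaks and indentation between elements come back as separate XmlNodeType.Whitespace nodes; XmlDocument.ReadNode turns those into nodes of the tree, and Json.NET then serializes them as the #whitespace entries seen in json2.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: lists every node an XmlReader reports, so Whitespace nodes become visible.
public static class ReaderNodeDump
{
    public static List<string> DumpNodes(string xml, bool ignoreWhitespace)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // same setting as in the question
            IgnoreWhitespace = ignoreWhitespace           // false is the default
        };

        var nodes = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Whitespace)
                {
                    // Record the whitespace text itself, with newlines made visible.
                    nodes.Add("Whitespace(" + reader.Value.Replace("\n", "\\n") + ")");
                }
                else
                {
                    // e.g. "Element a" or "EndElement a"
                    nodes.Add(reader.NodeType + " " + reader.Name);
                }
            }
        }
        return nodes;
    }
}
```

For the input "<a>\n  <b/>\n</a>", the default reader reports Element a, a Whitespace node, Element b, another Whitespace node and EndElement a; with ignoreWhitespace: true only the three element entries remain.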

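Comments:

Try settings.IgnoreWhitespace = true;. But basically you already have your answer. Do you really need a Reader, i.e. is your data > 100 MB?

I don't know why XmlReader behaves like this by default, but I think you just need to set XmlReaderSettings.IgnoreWhitespace to true.

@HenkHolterman Thanks. I need the reader because my XML has no root element when I read the data, and XmlDocument throws an error. Since my question is about whitespace, I added the root element to show the difference between XmlReader and XmlDocument.

Have you looked at XElement? It is usually easier to work with.

OK, let's hear whether the Ignore setting helps at all.

Answer 1: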

Solved:

settings.IgnoreWhitespace = true;

Thanks to @HenkHolterman and @finrod.
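A minimal sketch comparing the two paths from the question on a shortened, hypothetical sample (helper names are illustrative): LoadXml with the default PreserveWhitespace = false, and ReadNode over a reader with and without IgnoreWhitespace. It counts whitespace nodes instead of serializing to JSON, so it runs without Json.NET. The question's own output suggests that ReadNode keeps whatever whitespace the reader reports even when doc.PreserveWhitespace is false, which is why the reader setting is the one that makes the difference.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helpers for comparing LoadXml and ReadNode.
public static class WhitespaceComparison
{
    // Counts whitespace-only nodes (XmlWhitespace / XmlSignificantWhitespace) in a subtree.
    public static int CountWhitespaceNodes(XmlNode node)
    {
        int count = node.NodeType == XmlNodeType.Whitespace
                 || node.NodeType == XmlNodeType.SignificantWhitespace ? 1 : 0;
        foreach (XmlNode child in node.ChildNodes)
            count += CountWhitespaceNodes(child);
        return count;
    }

    // Path 1: XmlDocument.LoadXml with PreserveWhitespace = false (the default).
    public static int ViaLoadXml(string xml)
    {
        var doc = new XmlDocument { PreserveWhitespace = false };
        doc.LoadXml(xml);
        return CountWhitespaceNodes(doc);
    }

    // Path 2: XmlDocument.ReadNode over an XmlReader, as in the question.
    public static int ViaReadNode(string xml, bool ignoreWhitespace)
    {
        var doc = new XmlDocument { PreserveWhitespace = false };
        var settings = new XmlReaderSettings { IgnoreWhitespace = ignoreWhitespace };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();              // land on the root element
            XmlNode root = doc.ReadNode(reader); // builds the whole root subtree
            return CountWhitespaceNodes(root);
        }
    }
}
```

With the sample used in the checks, LoadXml yields no whitespace nodes, ReadNode over a default reader yields four (two inside Root, two inside Countries), and ReadNode over a reader with IgnoreWhitespace = true yields none.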


Answer 2:
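using System;
using System.Xml;
using Newtonsoft.Json;

XmlDocument doc = new XmlDocument();
doc.PreserveWhitespace = false; // already the default; ReadNode still keeps whitespace the reader reports
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Document; // requires a single root element
settings.IgnoreWhitespace = true;                       // the reader no longer reports whitespace nodes

using (XmlReader reader = XmlReader.Create("XMLFile1.xml", settings))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            XmlNode node = doc.ReadNode(reader);
            string json2 = JsonConvert.SerializeXmlNode(node);
            Console.WriteLine(json2.Trim());
        }
    }
}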

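One caveat when combining IgnoreWhitespace with the root-less input mentioned in the comments: XmlDocument.ReadNode already advances the reader to the node after the element it reads, so a while (reader.Read()) loop then skips that following node. While whitespace sits between elements, the skipped node is only whitespace; once whitespace is ignored, it can be the next sibling element. A sketch of a loop that avoids this, assuming ConformanceLevel.Fragment input (class and method names are hypothetical):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: reads each top-level element of a root-less XML fragment.
public static class FragmentReader
{
    public static List<XmlNode> ReadTopLevelElements(TextReader input)
    {
        var doc = new XmlDocument();
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // input has no single root element
            IgnoreWhitespace = true                       // no #whitespace nodes in the result
        };

        var nodes = new List<XmlNode>();
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.Read(); // move to the first node
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // ReadNode consumes the whole element and leaves the reader on the
                    // next node, so Read() must not be called again in this branch.
                    nodes.Add(doc.ReadNode(reader));
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return nodes;
    }
}
```

For the three-element fragment in the checks, this returns all three Countries elements; the plain while (reader.Read()) loop would instead return the first Countries, then the AU Country on its own, then the last Countries.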