Write a huge string in XDocument.SetAttributeValue

Posted: 2021-10-26 12:18:17

Question:

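I have a large StringBuilder (~140MB) that I need to write into an XML attribute. I am using XDocument to handle the XML manipulation.

When I try to write the string to the XAttribute, I get a System.OutOfMemoryException (because I need to call StringBuilder.ToString(), which I assume loads the entire string into memory).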

var length = value.RawArtifact.Content.Length;
StringBuilder b = new StringBuilder();
int pos = 0;
while (pos < length - 1000)
{
    b.Append(BitConverter.ToString(value.RawArtifact.Content, pos, 1000).Replace("-", ""));
    pos += 1000;
}
b.Append(BitConverter.ToString(value.RawArtifact.Content, pos)).Replace("-", "");
var buffer = b.ToString(); // This throws an exception
myAttribute.SetAttributeValue("my-attribute", buffer);

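I couldn't find any overload of SetAttributeValue that takes something like a StreamReader, so I'm feeling a bit stuck.

Any suggestions?

Comments:

This doesn't answer your question, but writing a 140MB string into a single attribute sounds like a bad idea. Even having an XML file of that size doesn't sound good to me.

It is a terrible idea. I know... but unfortunately the decision wasn't mine to make. The end result is still theoretically valid XML, though, so apart from the abominable size I'm not doing anything that unusual.

If you check the reference source for XAttribute, you will see that XAttribute has internal string value;, so there is no way to use a StringBuilder or StreamReader as the value.

This doesn't exactly answer your question, but if you were to write your XML with XmlWriter, you could use XmlWriter.WriteChars() to write the attribute value in chunks. See dotnetfiddle.net/InBShg for a demo.

Thanks, that is actually a good idea! I'll have to get creative, since I need to inject it into my do-everything-related-to-XML class, but unless someone is faster I'll post the answer I come up with.

Answer 1: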

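If you check the reference source for XAttribute, you will see that XAttribute has internal string value;, so there is no way to use a StringBuilder or StreamReader as the value.

Instead, you can consider a streaming approach that injects the required attribute into the XML stream as you write out the XDocument. If you do, you can combine XmlWriter.WriteStartAttribute() with XmlWriter.WriteChars() to write your huge attribute value in chunks. The WriteChars() method:

    can be used to write large amounts of text one buffer at a time.

and so is designed for exactly this situation. There are two basic ways to implement streaming injection of the attribute value:

    1. Use the algorithm from Mark Fussell's Combining the XmlReader and XmlWriter classes for simple streaming transformations, and inject the attribute while streaming from the XmlReader returned by XDocument.CreateReader() to an XmlWriter.

       For some examples, see File size restriction or limitation in C#, Edit a large XML file, and Automating replacing tables from external files.

    2. Subclass XmlWriter itself and inject the attribute when the target element is written.

       For an example, see Custom xmlWriter to skip a certain element?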
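To make the first approach more concrete, here is a minimal sketch of a reader-to-writer copy loop that injects a chunked attribute value when it reaches the target element. It is not Mark Fussell's exact algorithm: the names StreamingInjector, CopyWithAttribute and the chunks parameter are hypothetical, only the common node types are handled, and it assumes that no surrogate pair is split across two chunks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingInjector // hypothetical helper, not part of the original answer
{
    // Copies every node from reader to writer. When an element named target starts,
    // an extra attribute is written whose value is supplied chunk by chunk.
    public static void CopyWithAttribute(XmlReader reader, XmlWriter writer, XName target,
        string attributeName, IEnumerable<char[]> chunks)
    {
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    // Capture element state before WriteAttributes() moves over the attributes.
                    bool isEmpty = reader.IsEmptyElement;
                    bool isTarget = XName.Get(reader.LocalName, reader.NamespaceURI) == target;
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true); // copy the existing attributes
                    if (isTarget)
                    {
                        writer.WriteStartAttribute(attributeName);
                        foreach (var chunk in chunks)
                            writer.WriteChars(chunk, 0, chunk.Length); // one buffer at a time
                        writer.WriteEndAttribute();
                    }
                    if (isEmpty)
                        writer.WriteEndElement(); // no EndElement node follows an empty element
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteFullEndElement();
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(reader.Value);
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;
                case XmlNodeType.ProcessingInstruction:
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;
            }
        }
    }
}
```

A caller would pass xdocument.CreateReader() as the reader and a lazily generated sequence of buffers as chunks, so that the full attribute value never has to exist as a single string.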

Taking the second approach, first create the following extension methods:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static partial class XmlExtensions
{
    public static void WriteAttribute(this XmlWriter writer, string localName, IEnumerable<(char [] Buffer, int Length)> valueSegments) =>
        WriteAttribute(writer, null, localName, null, valueSegments);

    public static void WriteAttribute(this XmlWriter writer, string localName, string namespaceUri, IEnumerable<(char [] Buffer, int Length)> valueSegments) =>
        WriteAttribute(writer, null, localName, namespaceUri, valueSegments);

    public static void WriteAttribute(this XmlWriter writer, string prefix, string localName, string namespaceUri, IEnumerable<(char [] Buffer, int Length)> valueSegments)
    {
        writer.WriteStartAttribute(prefix, localName, namespaceUri);
        char [] surrogateBuffer = null;

        // According to the docs, surrogate pairs cannot be split across calls to WriteChars():
        // https://docs.microsoft.com/en-us/dotnet/api/system.xml.xmlwriter.writechars?view=net-5.0#remarks
        // So if the last character of a segment is a high surrogate, buffer it and write it with the first character of the next buffer.
        foreach (var segment in valueSegments)
        {
            if (segment.Length < 1)
                continue;
            int start = 0;
            if (surrogateBuffer != null && surrogateBuffer[0] != '\0')
            {
                // Complete the pair held over from the previous segment.
                surrogateBuffer[1] = segment.Buffer[start++];
                writer.WriteChars(surrogateBuffer, 0, 2);
                surrogateBuffer[0] = surrogateBuffer[1] = '\0';
            }
            int count = segment.Length - start;
            if (count > 0 && char.IsHighSurrogate(segment.Buffer[segment.Length-1]))
            {
                // Hold back a trailing high surrogate until the next segment arrives.
                (surrogateBuffer = surrogateBuffer ?? new char[2])[0] = segment.Buffer[segment.Length-1];
                count--;
            }
            writer.WriteChars(segment.Buffer, start, count);
        }
        writer.WriteEndAttribute();
        if (surrogateBuffer != null && surrogateBuffer[0] != '\0')
            throw new XmlException(string.Format("Unterminated surrogate pair {0}", surrogateBuffer[0]));
    }
}

public static class ByteExtensions
{
    // Copied from this answer https://***.com/a/14333437
    // By https://***.com/users/445517/codesinchaos
    // To https://***.com/questions/311165/how-do-you-convert-a-byte-array-to-a-hexadecimal-string-and-vice-versa
    // And modified to populate a char span rather than return a string.
    public static void ByteToHexBitFiddle(ReadOnlySpan<byte> bytes, Span<char> c)
    {
        if (c.Length < 2* bytes.Length)
            throw new ArgumentException("c.Length < 2* bytes.Length");
        int b;
        for (int i = 0; i < bytes.Length; i++)
        {
            b = bytes[i] >> 4;
            c[i * 2] = (char)(55 + b + (((b-10)>>31)&-7));
            b = bytes[i] & 0xF;
            c[i * 2 + 1] = (char)(55 + b + (((b-10)>>31)&-7));
        }
    }

    // Lazily yields the hex encoding of bytes, reusing a single char buffer for every chunk.
    public static IEnumerable<(char [] segment, int length)> GetHexCharSegments(ReadOnlyMemory<byte> bytes, int chunkSize = 1000)
    {
        var buffer = new char[2*chunkSize];
        var length = bytes.Length;
        int pos = 0;
        while (pos < length - chunkSize)
        {
            ByteExtensions.ByteToHexBitFiddle(bytes.Span.Slice(pos, chunkSize), buffer);
            yield return (buffer, buffer.Length);
            pos += chunkSize;
        }
        // The final chunk may be shorter than chunkSize.
        ByteExtensions.ByteToHexBitFiddle(bytes.Span.Slice(pos), buffer);
        yield return (buffer, 2*(length - pos));
    }
}

Next, subclass XmlWriter as follows:

public class ElementEventArgs : EventArgs
{
    public XName Element { get; init; }
    public Stack<XName> ElementStack { get; init; }
}

public class NotifyingXmlWriter : XmlWriterProxy
{
    readonly Stack<XName> elements = new Stack<XName>();

    public NotifyingXmlWriter(XmlWriter baseWriter) : base(baseWriter) { }

    public event EventHandler<ElementEventArgs> OnElementStarted;
    public event EventHandler<ElementEventArgs> OnElementEnded;

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        base.WriteStartElement(prefix, localName, ns);
        var name = XName.Get(localName, ns);
        elements.Push(name);
        OnElementStarted?.Invoke(this, new ElementEventArgs { Element = name, ElementStack = elements });
    }

    public override void WriteEndElement()
    {
        base.WriteEndElement();
        var name = elements.Pop(); // Pop after base.WriteEndElement() lets the base class throw an exception on a stack error.
        OnElementEnded?.Invoke(this, new ElementEventArgs { Element = name, ElementStack = elements });
    }

    // Elements closed with a full end tag go through WriteFullEndElement() rather than
    // WriteEndElement(), so keep the stack and the event in sync here as well.
    public override void WriteFullEndElement()
    {
        base.WriteFullEndElement();
        var name = elements.Pop();
        OnElementEnded?.Invoke(this, new ElementEventArgs { Element = name, ElementStack = elements });
    }
}

public class XmlWriterProxy : XmlWriter
{
    // Taken from this answer https://***.com/a/32150990/3744182
    // by https://***.com/users/3744182/dbc
    // To https://***.com/questions/32149676/custom-xmlwriter-to-skip-a-certain-element
    // NOTE: async methods not implemented
    readonly XmlWriter baseWriter;

    public XmlWriterProxy(XmlWriter baseWriter) => this.baseWriter = baseWriter ?? throw new ArgumentNullException();

    protected virtual bool IsSuspended { get { return false; } }

    public override void Close() => baseWriter.Close();

    public override void Flush() => baseWriter.Flush();

    public override string LookupPrefix(string ns) => baseWriter.LookupPrefix(ns);

    public override void WriteBase64(byte[] buffer, int index, int count)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteBase64(buffer, index, count);
    }

    public override void WriteCData(string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteCData(text);
    }

    public override void WriteCharEntity(char ch)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteCharEntity(ch);
    }

    public override void WriteChars(char[] buffer, int index, int count)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteChars(buffer, index, count);
    }

    public override void WriteComment(string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteComment(text);
    }

    public override void WriteDocType(string name, string pubid, string sysid, string subset)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteDocType(name, pubid, sysid, subset);
    }

    public override void WriteEndAttribute()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEndAttribute();
    }

    public override void WriteEndDocument()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEndDocument();
    }

    public override void WriteEndElement()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEndElement();
    }

    public override void WriteEntityRef(string name)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEntityRef(name);
    }

    public override void WriteFullEndElement()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteFullEndElement();
    }

    public override void WriteProcessingInstruction(string name, string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteProcessingInstruction(name, text);
    }

    public override void WriteRaw(string data)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteRaw(data);
    }

    public override void WriteRaw(char[] buffer, int index, int count)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteRaw(buffer, index, count);
    }

    public override void WriteStartAttribute(string prefix, string localName, string ns)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteStartAttribute(prefix, localName, ns);
    }

    public override void WriteStartDocument(bool standalone) => baseWriter.WriteStartDocument(standalone);

    public override void WriteStartDocument() => baseWriter.WriteStartDocument();

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteStartElement(prefix, localName, ns);
    }

    public override WriteState WriteState => baseWriter.WriteState;

    public override void WriteString(string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteString(text);
    }

    public override void WriteSurrogateCharEntity(char lowChar, char highChar)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteSurrogateCharEntity(lowChar, highChar);
    }

    public override void WriteWhitespace(string ws)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteWhitespace(ws);
    }
}

Now you will be able to do the following:

string fileName = @"Question68941254.xml"; // or whatever

XNamespace targetNamespace = "";
XName targetName = targetNamespace + "TheNode";

using (var textWriter = new StreamWriter(fileName))
using (var innerXmlWriter = XmlWriter.Create(textWriter, new XmlWriterSettings { Indent = true }))
using (var xmlWriter = new NotifyingXmlWriter(innerXmlWriter))
{
    xmlWriter.OnElementStarted += (o, e) =>
    {
        if (e.Element == targetName)
        {
            // Add the attribute with the byte hex value to the target element.
            ((XmlWriter)o).WriteAttribute("TheAttribute", ByteExtensions.GetHexCharSegments(value.RawArtifact.Content.AsMemory()));
        }
    };
    xdocument.WriteTo(xmlWriter);
}

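Here xdocument is, of course, the XDocument you are trying to populate, and the attribute TheAttribute is added to the node TheNode.

Notes:

Since your code shows that you are populating the StringBuilder by converting a large byte array into a large hex string buffer, I eliminated the intermediate StringBuilder and write the byte array directly in chunks.

If you really do need to write the contents of some StringBuilder b in chunks, use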

public static partial class StringBuilderExtensions
{
    public static IEnumerable<(char [] segment, int length)> GetSegments(this StringBuilder sb, int bufferSize = 1024)
    {
        var buffer = new char[bufferSize];
        for (int i = 0; i < sb.Length; i += buffer.Length)
        {
            int length = Math.Min(buffer.Length, sb.Length - i);
            sb.CopyTo(i, buffer, length);
            yield return (buffer, length);
        }
    }
}
and pass b.GetSegments() to XmlExtensions.WriteAttribute().
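As a self-contained illustration of the same idea, the following sketch copies a StringBuilder into an attribute one buffer at a time, without ever calling ToString() on it. The class and method names (StringBuilderAttributeDemo, WriteElementWithAttribute) are hypothetical, the element and attribute names are borrowed from the example above, and the sketch assumes the text contains no surrogate pair that straddles a buffer boundary (true for hex digits).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class StringBuilderAttributeDemo // hypothetical name, for illustration only
{
    // Writes <TheNode TheAttribute="..."/> where the attribute value is the content of sb,
    // copied through a small reusable buffer instead of one huge string.
    public static string WriteElementWithAttribute(StringBuilder sb, int bufferSize = 1024)
    {
        var output = new StringWriter();
        using (var writer = XmlWriter.Create(output, new XmlWriterSettings { OmitXmlDeclaration = true }))
        {
            writer.WriteStartElement("TheNode");
            writer.WriteStartAttribute("TheAttribute");
            var buffer = new char[bufferSize];
            for (int i = 0; i < sb.Length; i += buffer.Length)
            {
                int length = Math.Min(buffer.Length, sb.Length - i);
                sb.CopyTo(i, buffer, 0, length);      // copy the next slice into the buffer
                writer.WriteChars(buffer, 0, length); // write only the filled part
            }
            writer.WriteEndAttribute();
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

In a real scenario the writer would target a file stream rather than a StringWriter, since collecting the output in memory would reintroduce the original problem.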

The demo fiddle for this answer produces the following result:

<?xml version="1.0" encoding="utf-8"?>
<Root>
  <SomeOtherNode>some value</SomeOtherNode>
  <TheNode TheAttribute="000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B">
    <foo></foo>the node value</TheNode>
  <AnotherNode>another value</AnotherNode>
</Root>
