How to read a text file reversely with iterator in C#

Posted: 2010-10-01 23:03:28

【Question】:

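I need to process a large file, around 400K lines and 200 MB. Sometimes I have to process it from the bottom up. How can I use an iterator (yield return) here? Basically I don't want to load everything into memory. I know it is more efficient to use an iterator in .NET.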

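【Comments】:

See also: Get last 10 lines of very large text file > 10GB c#

One possibility is to read a sufficiently large chunk from the end, then use String.LastIndexOf to search backwards for "\r\n".

See my comment on the duplicate ***.com/questions/398378/…

【Answer 1】: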

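Reading text files backwards is really tricky unless you're using a fixed-size encoding (e.g. ASCII). With a variable-size encoding (such as UTF-8) you have to keep checking, every time you fetch data, whether you have landed in the middle of a character.

There's nothing built into the framework, and I suspect you'd have to write separate hard-coded handling for each variable-width encoding.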
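To make the "middle of a character" check concrete for UTF-8, here is a minimal sketch. The names `Utf8Boundary`, `IsCharacterStart` and `AlignToCharacterStart` are hypothetical and not part of any library; the sketch assumes well-formed UTF-8, where every continuation byte has the bit pattern 10xxxxxx.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Utf8Boundary
{
    // A byte starts a UTF-8 character unless it is a continuation byte (10xxxxxx).
    public static bool IsCharacterStart(byte b)
    {
        return (b & 0xC0) != 0x80;
    }

    // Moves an arbitrary offset back to the first byte of the character that contains it.
    // Assumes 0 <= offset < data.Length and well-formed UTF-8.
    public static int AlignToCharacterStart(byte[] data, int offset)
    {
        while (offset > 0 && !IsCharacterStart(data[offset]))
        {
            offset--;
        }
        return offset;
    }
}
```

A reader that seeks to an arbitrary byte offset could call `AlignToCharacterStart` before decoding, so that it never hands half a character to the decoder. Other variable-width encodings would need their own rule.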

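EDIT: This has been somewhat tested, but that's not to say it doesn't still have some subtle bugs. It uses StreamUtil from MiscUtil, but I've included just the necessary (new) method from there at the bottom. Oh, and it needs refactoring: there's one pretty hefty method, as you'll see: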

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;

namespace MiscUtil.IO
{
    /// <summary>
    /// Takes an encoding (defaulting to UTF-8) and a function which produces a seekable stream
    /// (or a filename for convenience) and yields lines from the end of the stream backwards.
    /// Only single byte encodings, and UTF-8 and Unicode, are supported. The stream
    /// returned by the function must be seekable.
    /// </summary>
    public sealed class ReverseLineReader : IEnumerable<string>
    {
        /// <summary>
        /// Buffer size to use by default. Classes with internal access can specify
        /// a different buffer size - this is useful for testing.
        /// </summary>
        private const int DefaultBufferSize = 4096;

        /// <summary>
        /// Means of creating a Stream to read from.
        /// </summary>
        private readonly Func<Stream> streamSource;

        /// <summary>
        /// Encoding to use when converting bytes to text
        /// </summary>
        private readonly Encoding encoding;

        /// <summary>
        /// Size of buffer (in bytes) to read each time we read from the
        /// stream. This must be at least as big as the maximum number of
        /// bytes for a single character.
        /// </summary>
        private readonly int bufferSize;

        /// <summary>
        /// Function which, when given a position within a file and a byte, states whether
        /// or not the byte represents the start of a character.
        /// </summary>
        private Func<long,byte,bool> characterStartDetector;

        /// <summary>
        /// Creates a LineReader from a stream source. The delegate is only
        /// called when the enumerator is fetched. UTF-8 is used to decode
        /// the stream into text.
        /// </summary>
        /// <param name="streamSource">Data source</param>
        public ReverseLineReader(Func<Stream> streamSource)
            : this(streamSource, Encoding.UTF8)
        {
        }

        /// <summary>
        /// Creates a LineReader from a filename. The file is only opened
        /// (or even checked for existence) when the enumerator is fetched.
        /// UTF8 is used to decode the file into text.
        /// </summary>
        /// <param name="filename">File to read from</param>
        public ReverseLineReader(string filename)
            : this(filename, Encoding.UTF8)
        {
        }

        /// <summary>
        /// Creates a LineReader from a filename. The file is only opened
        /// (or even checked for existence) when the enumerator is fetched.
        /// </summary>
        /// <param name="filename">File to read from</param>
        /// <param name="encoding">Encoding to use to decode the file into text</param>
        public ReverseLineReader(string filename, Encoding encoding)
            : this(() => File.OpenRead(filename), encoding)
        {
        }

        /// <summary>
        /// Creates a LineReader from a stream source. The delegate is only
        /// called when the enumerator is fetched.
        /// </summary>
        /// <param name="streamSource">Data source</param>
        /// <param name="encoding">Encoding to use to decode the stream into text</param>
        public ReverseLineReader(Func<Stream> streamSource, Encoding encoding)
            : this(streamSource, encoding, DefaultBufferSize)
        {
        }

        internal ReverseLineReader(Func<Stream> streamSource, Encoding encoding, int bufferSize)
        {
            this.streamSource = streamSource;
            this.encoding = encoding;
            this.bufferSize = bufferSize;
            if (encoding.IsSingleByte)
            {
                // For a single byte encoding, every byte is the start (and end) of a character
                characterStartDetector = (pos, data) => true;
            }
            else if (encoding is UnicodeEncoding)
            {
                // For UTF-16, even-numbered positions are the start of a character.
                // TODO: This assumes no surrogate pairs. More work required
                // to handle that.
                characterStartDetector = (pos, data) => (pos & 1) == 0;
            }
            else if (encoding is UTF8Encoding)
            {
                // For UTF-8, bytes with the top bit clear or the second bit set are the start of a character
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html
                characterStartDetector = (pos, data) => (data & 0x80) == 0 || (data & 0x40) != 0;
            }
            else
            {
                throw new ArgumentException("Only single byte, UTF-8 and Unicode encodings are permitted");
            }
        }

        /// <summary>
        /// Returns the enumerator reading strings backwards. If this method discovers that
        /// the returned stream is either unreadable or unseekable, a NotSupportedException is thrown.
        /// </summary>
        public IEnumerator<string> GetEnumerator()
        {
            Stream stream = streamSource();
            if (!stream.CanSeek)
            {
                stream.Dispose();
                throw new NotSupportedException("Unable to seek within stream");
            }
            if (!stream.CanRead)
            {
                stream.Dispose();
                throw new NotSupportedException("Unable to read within stream");
            }
            return GetEnumeratorImpl(stream);
        }

        private IEnumerator<string> GetEnumeratorImpl(Stream stream)
        {
            try
            {
                long position = stream.Length;

                if (encoding is UnicodeEncoding && (position & 1) != 0)
                {
                    throw new InvalidDataException("UTF-16 encoding provided, but stream has odd length.");
                }

                // Allow up to two bytes for data from the start of the previous
                // read which didn't quite make it as full characters
                byte[] buffer = new byte[bufferSize + 2];
                char[] charBuffer = new char[encoding.GetMaxCharCount(buffer.Length)];
                int leftOverData = 0;
                String previousEnd = null;
                // TextReader doesn't return an empty string if there's line break at the end
                // of the data. Therefore we don't return an empty string if it's our *first*
                // return.
                bool firstYield = true;

                // A line-feed at the start of the previous buffer means we need to swallow
                // the carriage-return at the end of this buffer - hence this needs declaring
                // way up here!
                bool swallowCarriageReturn = false;

                while (position > 0)
                {
                    int bytesToRead = Math.Min(position > int.MaxValue ? bufferSize : (int)position, bufferSize);

                    position -= bytesToRead;
                    stream.Position = position;
                    StreamUtil.ReadExactly(stream, buffer, bytesToRead);
                    // If we haven't read a full buffer, but we had bytes left
                    // over from before, copy them to the end of the buffer
                    if (leftOverData > 0 && bytesToRead != bufferSize)
                    {
                        // Buffer.BlockCopy doesn't document its behaviour with respect
                        // to overlapping data: we *might* just have read 7 bytes instead of
                        // 8, and have two bytes to copy...
                        Array.Copy(buffer, bufferSize, buffer, bytesToRead, leftOverData);
                    }
                    // We've now *effectively* read this much data.
                    bytesToRead += leftOverData;

                    int firstCharPosition = 0;
                    while (!characterStartDetector(position + firstCharPosition, buffer[firstCharPosition]))
                    {
                        firstCharPosition++;
                        // Bad UTF-8 sequences could trigger this. For UTF-8 we should always
                        // see a valid character start in every 3 bytes, and if this is the start of the file
                        // so we've done a short read, we should have the character start
                        // somewhere in the usable buffer.
                        if (firstCharPosition == 3 || firstCharPosition == bytesToRead)
                        {
                            throw new InvalidDataException("Invalid UTF-8 data");
                        }
                    }
                    leftOverData = firstCharPosition;

                    int charsRead = encoding.GetChars(buffer, firstCharPosition, bytesToRead - firstCharPosition, charBuffer, 0);
                    int endExclusive = charsRead;

                    for (int i = charsRead - 1; i >= 0; i--)
                    {
                        char lookingAt = charBuffer[i];
                        if (swallowCarriageReturn)
                        {
                            swallowCarriageReturn = false;
                            if (lookingAt == '\r')
                            {
                                endExclusive--;
                                continue;
                            }
                        }
                        // Anything non-line-breaking, just keep looking backwards
                        if (lookingAt != '\n' && lookingAt != '\r')
                        {
                            continue;
                        }
                        // End of CRLF? Swallow the preceding CR
                        if (lookingAt == '\n')
                        {
                            swallowCarriageReturn = true;
                        }
                        int start = i + 1;
                        string bufferContents = new string(charBuffer, start, endExclusive - start);
                        endExclusive = i;
                        string stringToYield = previousEnd == null ? bufferContents : bufferContents + previousEnd;
                        if (!firstYield || stringToYield.Length != 0)
                        {
                            yield return stringToYield;
                        }
                        firstYield = false;
                        previousEnd = null;
                    }

                    previousEnd = endExclusive == 0 ? null : (new string(charBuffer, 0, endExclusive) + previousEnd);

                    // If we didn't decode the start of the array, put it at the end for next time
                    if (leftOverData != 0)
                    {
                        Buffer.BlockCopy(buffer, 0, buffer, bufferSize, leftOverData);
                    }
                }
                if (leftOverData != 0)
                {
                    // At the start of the final buffer, we had the end of another character.
                    throw new InvalidDataException("Invalid UTF-8 data at start of stream");
                }
                if (firstYield && string.IsNullOrEmpty(previousEnd))
                {
                    yield break;
                }
                yield return previousEnd ?? "";
            }
            finally
            {
                stream.Dispose();
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}


// StreamUtil.cs:
public static class StreamUtil
{
    public static void ReadExactly(Stream input, byte[] buffer, int bytesToRead)
    {
        int index = 0;
        while (index < bytesToRead)
        {
            int read = input.Read(buffer, index, bytesToRead - index);
            if (read == 0)
            {
                // {1} adds the plural "s" unless exactly one byte is outstanding
                throw new EndOfStreamException
                    (String.Format("End of stream reached with {0} byte{1} left to read.",
                                   bytesToRead - index,
                                   bytesToRead - index == 1 ? "" : "s"));
            }
            index += read;
        }
    }
}

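Feedback very welcome. This was fun :)

【Discussion】:

Wow! I know it's more than three years old, but this piece of code rocks! Thanks!! (P.S. I just changed File.OpenRead(filename) to File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite) to let the iterator read files that are already open.)

@GrimaceofDespair: More "because then I'd have to design for inheritance, which adds a very significant cost both in design time and in future flexibility". Often it's not even clear how inheritance could sensibly be used for a type. Better to prohibit it until that clarity is found, IMO.

@rahularyansharma: Whereas I like splitting problems into orthogonal aspects. Once you've worked out how to open the file in your case, I'd expect my code to just work for you.

If anyone wants to be able to share the file with another process, for example to read a log file that a parent process has open for writing, just replace File.OpenRead(filename) with new FileStream(filename, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite).

@SpiritBob: That's too broad a question, I'm afraid. I'd start by trying to understand all the encodings you're interested in, and the problem you're trying to solve.

【Answer 2】: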

Note: this approach doesn't work (explained in the EDIT)

You can use File.ReadLines to get a line iterator

foreach (var line in File.ReadLines(@"C:\temp\ReverseRead.txt").Reverse())
{
    if (noNeedToReadFurther)
        break;

    // process line here
    Console.WriteLine(line);
}

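EDIT:

After reading applejacks01's comment, I ran some tests, and it does look like .Reverse() actually loads the whole file.

I used File.ReadLines() to print the first line of a 40MB file: memory usage of the console app was 5MB. Then I used File.ReadLines().Reverse() to print the last line of the same file: memory usage was 95MB.

Conclusion

Whatever Reverse() is doing, it is not a good choice for reading the bottom of a big file.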
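The effect can be reproduced without a 40MB file. The following is a small sketch in which a counting iterator stands in for File.ReadLines; `ReverseBufferingDemo`, `CountingLines` and `ItemsConsumedBy` are hypothetical names made up for this illustration. It only measures how many items are pulled from the source, not memory usage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo type, for illustration only.
public static class ReverseBufferingDemo
{
    public static int ItemsProduced;

    // Stand-in for File.ReadLines: a lazy sequence that counts what it yields.
    public static IEnumerable<string> CountingLines(int total)
    {
        for (int i = 0; i < total; i++)
        {
            ItemsProduced++;
            yield return "line " + i;
        }
    }

    // Runs a query over a fresh sequence and reports how many items it pulled.
    public static int ItemsConsumedBy(Func<IEnumerable<string>, string> query, int total)
    {
        ItemsProduced = 0;
        query(CountingLines(total));
        return ItemsProduced;
    }
}
```

With 1000 items, `ItemsConsumedBy(s => s.First(), 1000)` pulls a single item, while `ItemsConsumedBy(s => s.Reverse().First(), 1000)` pulls all 1000: a forward-only sequence has to be read to its end before its last element is known.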

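【Discussion】:

I wonder whether the call to Reverse does in fact load the entire file into memory. Doesn't the end of the enumerable need to be established first? That is, internally the enumerable is fully enumerated to build a temporary array, which is then reversed and enumerated item by item with the yield keyword, producing a new enumerable that iterates in reverse order.

The original answer was wrong, but I'm keeping the edited answer here as it might stop others from using this approach.

【Answer 3】: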

Quick solution for large files: from C#, use PowerShell's Get-Content with the Tail parameter.

using System.Linq;                  // for FirstOrDefault()
using System.Management.Automation;

using (PowerShell powerShell = PowerShell.Create())
{
    string lastLine = powerShell.AddCommand("Get-Content")
        .AddParameter("Path", @"c:\a.txt")
        .AddParameter("Tail", 1)
        .Invoke().FirstOrDefault()?.ToString();
}

Required reference: "System.Management.Automation.dll", which may be somewhere like "C:\Program Files (x86)\Reference Assemblies\Microsoft\WindowsPowerShell\3.0"

Using PowerShell incurs a small overhead, but it is worth it for large files.
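For comparison, the last line can also be fetched without the PowerShell dependency by seeking from the end of the stream. This is only a sketch under stated assumptions, not what Get-Content does internally: `TailSketch` and `ReadLastLine` are hypothetical names, line breaks are assumed to be "\n" or "\r\n", and the encoding is assumed to be one where the byte 0x0A only ever means '\n' (ASCII, ISO-8859-1, UTF-8).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class TailSketch
{
    public static string ReadLastLine(Stream stream, Encoding encoding, int chunkSize = 4096)
    {
        long length = stream.Length;
        long size = chunkSize;
        while (true)
        {
            // Read the last `count` bytes of the stream.
            int count = (int)Math.Min(size, length);
            byte[] buffer = new byte[count];
            stream.Position = length - count;
            int read = 0;
            while (read < count)
            {
                int n = stream.Read(buffer, read, count - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }

            // Ignore one trailing line break so that "a\nb\n" yields "b".
            int end = count;
            if (end > 0 && buffer[end - 1] == (byte)'\n')
            {
                end--;
                if (end > 0 && buffer[end - 1] == (byte)'\r') end--;
            }

            // Search backwards for the line break that precedes the last line.
            int lastBreak = -1;
            for (int i = end - 1; i >= 0; i--)
            {
                if (buffer[i] == (byte)'\n') { lastBreak = i; break; }
            }

            // Done if a break was found or the chunk already covers the whole stream.
            if (lastBreak >= 0 || count == length)
            {
                int start = lastBreak + 1;
                return encoding.GetString(buffer, start, end - start);
            }
            size *= 2; // no line break in this chunk: retry with a larger tail
        }
    }
}
```

A call might look like `TailSketch.ReadLastLine(File.OpenRead(@"c:\a.txt"), Encoding.UTF8)`. Searching for the line break at the byte level, before decoding, avoids decoding a chunk that starts in the middle of a UTF-8 character.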

【Discussion】:

Note that this only gets the last line of the file; it does not iterate.

【Answer 4】:

To create a file iterator you can do this:

EDIT:

This is my fixed version of a fixed-width reverse file reader:

public static IEnumerable<string> readFile()
{
    using (FileStream reader = new FileStream(@"c:\test.txt",FileMode.Open,FileAccess.Read))
    {
        int i=0;
        StringBuilder lineBuffer = new StringBuilder();
        int byteRead;
        while (-i < reader.Length)
        {
            reader.Seek(--i, SeekOrigin.End);
            byteRead = reader.ReadByte();
            if (byteRead == 10 && lineBuffer.Length > 0)
            {
                yield return Reverse(lineBuffer.ToString());
                lineBuffer.Remove(0, lineBuffer.Length);
            }
            lineBuffer.Append((char)byteRead);
        }
        yield return Reverse(lineBuffer.ToString());
        reader.Close();
    }
}

public static string Reverse(string str)
{
    char[] arr = new char[str.Length];
    for (int i = 0; i < str.Length; i++)
        arr[i] = str[str.Length - 1 - i];
    return new string(arr);
}

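【Discussion】:

This is now close to correct for ISO-8859-1, but incorrect for any other encoding. Encodings make this really tricky :(

What do you mean by "close to correct for ISO-8859-1"? What's still missing?

The handling isn't quite right for matching "\r", "\n" and "\r\n", the last of which should count as just one line break.

It also never yields empty lines: "a\n\nb" should yield "a", "", "b".

Hmm... I only yield lineBuffer when I find '\n' (ASCII 10). You're right that I don't take '\r' into account.

【Answer 5】: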

I put the file into a list line by line, then used List.Reverse();

        StreamReader objReader = new StreamReader(filename);
        string sLine = "";
        ArrayList arrText = new ArrayList();

        while (sLine != null)
        {
            sLine = objReader.ReadLine();
            if (sLine != null)
                arrText.Add(sLine);
        }
        objReader.Close();


        arrText.Reverse();

        foreach (string sOutput in arrText)
        {
            ...
        }

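【Discussion】:

Not the best solution for big files, as you need to load the file completely into RAM, and the OP explicitly said he doesn't want to load it completely.

【Answer 6】: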

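I'm adding my solution too. After reading some of the answers, nothing really fit my case. I read byte by byte from the end until I find a LineFeed, then I return the collected bytes as a string, without using buffering.

Usage:

// The constructor takes a stream and an encoding; UTF-8 is assumed here.
var reader = new ReverseTextReader(File.OpenRead(path), Encoding.UTF8);
while (!reader.EndOfStream)
{
    Console.WriteLine(reader.ReadLine());
}

Implementation:

public class ReverseTextReader
{
    private const int LineFeedLf = 10;
    private const int LineFeedCr = 13;
    private readonly Stream _stream;
    private readonly Encoding _encoding;

    public bool EndOfStream => _stream.Position == 0;

    public ReverseTextReader(Stream stream, Encoding encoding)
    {
        _stream = stream;
        _encoding = encoding;
        _stream.Position = _stream.Length;
    }

    public string ReadLine()
    {
        if (_stream.Position == 0) return null;

        var line = new List<byte>();
        var endOfLine = false;
        while (!endOfLine)
        {
            var b = _stream.ReadByteFromBehind();

            if (b == -1 || b == LineFeedLf)
            {
                endOfLine = true;
            }
            else if (b != LineFeedCr)
            {
                // Collect payload bytes only. The -1 sentinel cannot be converted
                // to a byte, and LF/CR bytes are not part of the returned line.
                line.Add((byte)b);
            }
        }

        line.Reverse();
        return _encoding.GetString(line.ToArray());
    }
}

public static class StreamExtensions
{
    // Reads the byte just before the current position and leaves the
    // position in front of that byte. Returns -1 at the start of the stream.
    public static int ReadByteFromBehind(this Stream stream)
    {
        if (stream.Position == 0) return -1;

        stream.Position = stream.Position - 1;
        var value = stream.ReadByte();
        stream.Position = stream.Position - 1;
        return value;
    }
}

【Discussion】:

【Answer 7】: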

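You can read the file backwards one character at a time and cache all the characters until you reach a carriage return and/or line feed.

You then reverse the collected string and yield it as a line.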
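The approach described above can be sketched as follows. This is a minimal illustration, assuming a single-byte encoding (ASCII or ISO-8859-1, so each byte maps directly to one char) and "\n" or "\r\n" line breaks; `BackwardsByByte` and `ReadLinesBackwards` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class BackwardsByByte
{
    public static IEnumerable<string> ReadLinesBackwards(Stream stream)
    {
        var reversed = new StringBuilder(); // characters of the current line, last one first
        long position = stream.Length;
        bool first = true;

        while (position > 0)
        {
            stream.Position = --position;
            int b = stream.ReadByte();
            if (b == '\n')
            {
                // Like TextReader, do not report an empty line after a trailing line break.
                if (!(first && reversed.Length == 0))
                {
                    yield return Reverse(reversed);
                }
                first = false;
                reversed.Clear();

                // Swallow the '\r' of a "\r\n" pair.
                if (position > 0)
                {
                    stream.Position = position - 1;
                    if (stream.ReadByte() == '\r') position--;
                }
            }
            else
            {
                reversed.Append((char)b);
            }
        }

        // Whatever is left is the first line of the file.
        if (stream.Length > 0)
        {
            yield return Reverse(reversed);
        }
    }

    private static string Reverse(StringBuilder builder)
    {
        char[] chars = builder.ToString().ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```

Seeking before every single byte keeps the code short but is slow on real files; reading larger blocks and scanning them in memory is the usual refinement.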

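【Discussion】:

Reading a file backwards one character at a time is difficult, because you have to be able to recognise the start of a character. How simple that is depends on the encoding.

【Answer 8】: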

There are good answers here already; here is another LINQ-compatible class you can use, which focuses on performance and on support for large files. It assumes a "\r\n" line terminator.

Usage

var reader = new ReverseTextReader(@"C:\Temp\ReverseTest.txt");
while (!reader.EndOfStream)
    Console.WriteLine(reader.ReadLine());

ReverseTextReader class

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;

/// <summary>
/// Reads a text file backwards, line-by-line.
/// </summary>
/// <remarks>This class uses file seeking to read a text file of any size in reverse order.  This
/// is useful for needs such as reading a log file newest-entries first.</remarks>
public sealed class ReverseTextReader : IEnumerable<string>
{
    private const int BufferSize = 16384;   // The number of bytes read from the underlying stream.
    private readonly Stream _stream;        // Stores the stream feeding data into this reader
    private readonly Encoding _encoding;    // Stores the encoding used to process the file
    private byte[] _leftoverBuffer;         // Stores the leftover partial line after processing a buffer
    private readonly Queue<string> _lines;  // Stores the lines parsed from the buffer

    #region Constructors

    /// <summary>
    /// Creates a reader for the specified file.
    /// </summary>
    /// <param name="filePath"></param>
    public ReverseTextReader(string filePath)
        : this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), Encoding.Default)
    { }

    /// <summary>
    /// Creates a reader using the specified stream.
    /// </summary>
    /// <param name="stream"></param>
    public ReverseTextReader(Stream stream)
        : this(stream, Encoding.Default)
    { }

    /// <summary>
    /// Creates a reader using the specified path and encoding.
    /// </summary>
    /// <param name="filePath"></param>
    /// <param name="encoding"></param>
    public ReverseTextReader(string filePath, Encoding encoding)
        : this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), encoding)
    { }

    /// <summary>
    /// Creates a reader using the specified stream and encoding.
    /// </summary>
    /// <param name="stream"></param>
    /// <param name="encoding"></param>
    public ReverseTextReader(Stream stream, Encoding encoding)
    {
        _stream = stream;
        _encoding = encoding;
        _lines = new Queue<string>(128);
        // The stream needs to support seeking for this to work
        if(!_stream.CanSeek)
            throw new InvalidOperationException("The specified stream needs to support seeking to be read backwards.");
        if (!_stream.CanRead)
            throw new InvalidOperationException("The specified stream needs to support reading to be read backwards.");
        // Set the current position to the end of the file
        _stream.Position = _stream.Length;
        _leftoverBuffer = new byte[0];
    }

    #endregion

    #region Overrides

    /// <summary>
    /// Reads the previous line from the underlying stream.
    /// </summary>
    /// <returns></returns>
    public string ReadLine()
    {
        // Are there lines left to read? If so, return the next one
        if (_lines.Count != 0) return _lines.Dequeue();
        // Are we at the beginning of the stream? If so, we're done
        if (_stream.Position == 0) return null;

        #region Read and Process the Next Chunk

        // Remember the current position
        var currentPosition = _stream.Position;
        var newPosition = currentPosition - BufferSize;
        // Are we before the beginning of the stream?
        if (newPosition < 0) newPosition = 0;
        // Calculate the buffer size to read
        var count = (int)(currentPosition - newPosition);
        // Set the new position
        _stream.Position = newPosition;
        // Make a new buffer but append the previous leftovers
        var buffer = new byte[count + _leftoverBuffer.Length];
        // Read the next buffer
        _stream.Read(buffer, 0, count);
        // Move the position of the stream back
        _stream.Position = newPosition;
        // And copy in the leftovers from the last buffer
        if (_leftoverBuffer.Length != 0)
            Array.Copy(_leftoverBuffer, 0, buffer, count, _leftoverBuffer.Length);
        // Look for CrLf delimiters
        var end = buffer.Length - 1;
        var start = buffer.Length - 2;
        // Search backwards for a line feed
        while (start >= 0)
        {
            // Is it a line feed?
            if (buffer[start] == 10)
            {
                // Yes.  Extract a line and queue it (but exclude the \r\n)
                _lines.Enqueue(_encoding.GetString(buffer, start + 1, end - start - 2));
                // And reset the end
                end = start;
            }
            // Move to the previous character
            start--;
        }
        // What's left over is a portion of a line. Save it for later.
        _leftoverBuffer = new byte[end + 1];
        Array.Copy(buffer, 0, _leftoverBuffer, 0, end + 1);
        // Are we at the beginning of the stream?
        if (_stream.Position == 0)
            // Yes.  Add the last line.
            _lines.Enqueue(_encoding.GetString(_leftoverBuffer, 0, end - 1));

        #endregion

        // If we have something in the queue, return it
        return _lines.Count == 0 ? null : _lines.Dequeue();
    }

    #endregion

    #region IEnumerable<string> Interface

    public IEnumerator<string> GetEnumerator()
    {
        string line;
        // So long as the next line isn't null...
        while ((line = ReadLine()) != null)
            // Read and return it.
            yield return line;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        // Delegate to the generic enumerator so non-generic callers work too
        return GetEnumerator();
    }

    #endregion
}
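【Discussion】:

Old post, but a backwards reader is hard to come by. This one really works and is fast. One small change I made was to implement IDisposable so it can be used more safely.

【Answer 9】: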

I know this post is very old, but since I couldn't work out how to use the most-voted solution, I kept looking and finally found this. It is the best answer I have found with a low memory cost, in VB and C#:

http://www.blakepell.com/2010-11-29-backward-file-reader-vb-csharp-source

I hope this helps others, because it took me hours to finally find it!

[EDIT]

Here is the C# code:

//*********************************************************************************************************************************
//
//             Class:  BackwardReader
//      Initial Date:  11/29/2010
//     Last Modified:  11/29/2010
//     Programmer(s):  Original C# Source - the_real_herminator
//                     http://social.msdn.microsoft.com/forums/en-US/csharpgeneral/thread/9acdde1a-03cd-4018-9f87-6e201d8f5d09
//                     VB Conversion - Blake Pell
//
//*********************************************************************************************************************************

using System.Text;
using System.IO;
public class BackwardReader
{
    private string path;
    private FileStream fs = null;
    public BackwardReader(string path)
    {
        this.path = path;
        fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        fs.Seek(0, SeekOrigin.End);
    }
    public string Readline()
    {
        byte[] line;
        byte[] text = new byte[1];
        long position = 0;
        int count;
        fs.Seek(0, SeekOrigin.Current);
        position = fs.Position;
        //do we have trailing \r\n?
        if (fs.Length > 1)
        {
            byte[] vagnretur = new byte[2];
            fs.Seek(-2, SeekOrigin.Current);
            fs.Read(vagnretur, 0, 2);
            if (ASCIIEncoding.ASCII.GetString(vagnretur).Equals("\r\n"))
            {
                //move it back
                fs.Seek(-2, SeekOrigin.Current);
                position = fs.Position;
            }
        }
        while (fs.Position > 0)
        {
            text.Initialize();
            //read one char
            fs.Read(text, 0, 1);
            string asciiText = ASCIIEncoding.ASCII.GetString(text);
            //move back to the character before
            fs.Seek(-2, SeekOrigin.Current);
            if (asciiText.Equals("\n"))
            {
                fs.Read(text, 0, 1);
                asciiText = ASCIIEncoding.ASCII.GetString(text);
                if (asciiText.Equals("\r"))
                {
                    fs.Seek(1, SeekOrigin.Current);
                    break;
                }
            }
        }
        count = int.Parse((position - fs.Position).ToString());
        line = new byte[count];
        fs.Read(line, 0, count);
        fs.Seek(-count, SeekOrigin.Current);
        return ASCIIEncoding.ASCII.GetString(line);
    }
    public bool SOF
    {
        get
        {
            return fs.Position == 0;
        }
    }
    public void Close()
    {
        fs.Close();
    }
}
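【Discussion】:

You should include the relevant parts of the link in your answer and keep the link only as a reference, so that your answer still adds value even if the link changes.

If you have private IDisposable fields, you should implement IDisposable as well and dispose of those fields properly.

For this code to work, "n" and "r" should be replaced with "\n" and "\r". Unfortunately, although the code works once fixed, it is slow even for small files; have a look at Jon Person's solution.

【Answer 10】: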

I wanted to do something similar. Here is my code. This class creates temporary files containing chunks of the big file, which avoids memory bloat. The user can specify whether the file should be reversed; if so, the content is returned in reverse order.

This class can also be used to write big data into a single file without bloating memory.

Please provide feedback.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace BigFileService
{
    public class BigFileDumper
    {
        /// <summary>
        /// Buffer that will store the lines until it is full.
        /// Then it will dump it to temp files.
        /// </summary>
        public int CHUNK_SIZE = 1000;
        public bool ReverseIt { get; set; }
        public long TotalLineCount { get { return totalLineCount; } }
        private long totalLineCount;
        private int BufferCount = 0;
        private StreamWriter Writer;
        /// <summary>
        /// List of files that would store the chunks.
        /// </summary>
        private List<string> LstTempFiles;
        private string ParentDirectory;
        private char[] trimchars = { '/', '\\' };


        public BigFileDumper(string FolderPathToWrite)
        {
            this.LstTempFiles = new List<string>();
            this.ParentDirectory = FolderPathToWrite.TrimEnd(trimchars) + "\\" + "BIG_FILE_DUMP";
            this.totalLineCount = 0;
            this.BufferCount = 0;
            this.Initialize();
        }

        private void Initialize()
        {
            // Delete existing directory.
            if (Directory.Exists(this.ParentDirectory))
            {
                Directory.Delete(this.ParentDirectory, true);
            }

            // Create a new directory.
            Directory.CreateDirectory(this.ParentDirectory);
        }

        public void WriteLine(string line)
        {
            if (this.BufferCount == 0)
            {
                string newFile = "DumpFile_" + LstTempFiles.Count();
                LstTempFiles.Add(newFile);
                Writer = new StreamWriter(this.ParentDirectory + "\\" + newFile);
            }
            // Keep on adding in the buffer as long as size is okay.
            if (this.BufferCount < this.CHUNK_SIZE)
            {
                this.totalLineCount++; // main count
                this.BufferCount++; // Chunk count.
                Writer.WriteLine(line);
            }
            else
            {
                // Buffer is full, time to create a new file.
                // Close the existing file first.
                Writer.Close();
                // Make buffer count 0 again.
                this.BufferCount = 0;
                this.WriteLine(line);
            }
        }

        public void Close()
        {
            if (Writer != null)
                Writer.Close();
        }

        public string GetFullFile()
        {
            if (LstTempFiles.Count <= 0)
            {
                Debug.Assert(false, "There are no files created.");
                return "";
            }
            string returnFilename = this.ParentDirectory + "\\" + "FullFile";
            if (File.Exists(returnFilename) == false)
            {
                // Create a consolidated file from the existing small dump files.
                // We open the small dump files one by one. If the user asked for a
                // reversed file, we read them in descending order and reverse each one;
                // otherwise we read them in ascending order as they are.

                if (this.ReverseIt)
                    this.LstTempFiles.Reverse();

                foreach (var fileName in LstTempFiles)
                {
                    string fullFileName = this.ParentDirectory + "\\" + fileName;
                    // fileLines only holds one chunk, so memory use is governed by CHUNK_SIZE.
                    var fileLines = File.ReadAllLines(fullFileName);

                    // Reverse the lines of this chunk if requested.
                    if (this.ReverseIt)
                        fileLines = fileLines.Reverse().ToArray();

                    // Write the lines
                    File.AppendAllLines(returnFilename, fileLines);
                }
            }

            return returnFilename;
        }
    }
}

The service can be used as follows:

void TestBigFileDump_File(string BIG_FILE, string FOLDER_PATH_FOR_CHUNK_FILES)
{
    // Start processing the input Big file.
    StreamReader reader = new StreamReader(BIG_FILE);
    // Create a dump file class object to handle efficient memory management.
    var bigFileDumper = new BigFileDumper(FOLDER_PATH_FOR_CHUNK_FILES);
    // Set to reverse the output file.
    bigFileDumper.ReverseIt = true;
    bigFileDumper.CHUNK_SIZE = 100; // How much at a time to keep in RAM before dumping to local file.

    while (reader.EndOfStream == false)
    {
        string line = reader.ReadLine();
        bigFileDumper.WriteLine(line);
    }
    bigFileDumper.Close();
    reader.Close();

    // Get back full reversed file.
    var reversedFilename = bigFileDumper.GetFullFile();
    Console.WriteLine("Check output file - " + reversedFilename);
}

【Discussion】:

【Answer 11】:

In case anyone else comes across this, I solved it with the following PowerShell script, which can easily be turned into a C# script with a small amount of work.

[System.IO.FileStream]$fileStream = [System.IO.File]::Open("C:\Name_of_very_large_file.log", [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
[System.IO.BufferedStream]$bs = New-Object System.IO.BufferedStream $fileStream;
[System.IO.StreamReader]$sr = New-Object System.IO.StreamReader $bs;


$buff = New-Object char[] 20;
$seek = $bs.Seek($fileStream.Length - 10000, [System.IO.SeekOrigin]::Begin);

while(($line = $sr.ReadLine()) -ne $null)
{
     $line;
}

This basically starts reading at the last 10,000 characters of the file and outputs each line.
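A rough C# equivalent of the script might look like the sketch below. `TailForward` and `ReadTailLines` are hypothetical names, and the tail size is measured in bytes, not characters. As in the script, the lines come out oldest first, and the first one may be incomplete because the seek position is arbitrary (it can even land inside a multi-byte character).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
public static class TailForward
{
    // Reads the lines found in the last `tailBytes` bytes of a seekable stream, oldest first.
    public static List<string> ReadTailLines(Stream stream, long tailBytes)
    {
        // Clamp so that short files are simply read from the beginning.
        stream.Seek(-Math.Min(tailBytes, stream.Length), SeekOrigin.End);

        var lines = new List<string>();
        // The reader is deliberately not disposed: the caller owns the stream.
        var reader = new StreamReader(stream);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```

To mirror the script, the stream could be opened with `File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)` and passed in with `tailBytes` set to 10000.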

【Discussion】:

This reads forward from the last 10,000 bytes; it does not read backwards from the end to the beginning. Also, why not just .Seek(-10000, [System.IO.SeekOrigin]::End);
