How to read a text file reversely with iterator in C#
Posted: 2010-10-01 23:03:28
Question:
I need to process a large file, around 400K lines and 200 MB. Sometimes I have to process it from the bottom up. How can I use an iterator (yield return) here? Basically, I don't want to load everything into memory. I understand that using an iterator is more efficient in .NET.
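Comments:
See also: Get last 10 lines of very large text file > 10GB c#
One possibility is to read enough from the end and then use String.LastIndexOf to search backwards for "\r\n".
See my comment on the duplicate ***.com/questions/398378/…
Answer 1:
Reading text files backwards is really tricky unless you're using a fixed-size encoding (e.g. ASCII). When you've got a variable-size encoding (such as UTF-8), you keep having to check whether you're in the middle of a character when you fetch data.
There's nothing built into the framework, and I suspect you'd have to do separate hard coding for each variable-width encoding.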
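To make the "middle of a character" check concrete, here is a minimal sketch for UTF-8 only. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx. The class and method names (Utf8Boundary, IsCharacterStart, AlignToCharacterStart) are made up for illustration and are not part of any library.

```csharp
using System;

public static class Utf8Boundary
{
    // A UTF-8 continuation byte looks like 10xxxxxx; anything else starts a character.
    public static bool IsCharacterStart(byte b) => (b & 0xC0) != 0x80;

    // Walks backwards from 'index' until it reaches a byte that starts a character.
    // A character has at most three continuation bytes, so seeing a fourth one
    // in a row means the data is not valid UTF-8.
    public static int AlignToCharacterStart(byte[] data, int index)
    {
        int i = index;
        while (i > 0 && !IsCharacterStart(data[i]))
        {
            if (index - i == 3)
            {
                throw new FormatException("Invalid UTF-8 data");
            }
            i--;
        }
        // Note: if i reaches 0, the byte at position 0 is not validated here.
        return i;
    }
}
```

A reader that seeks to an arbitrary byte offset could call AlignToCharacterStart on its buffer before decoding, so that decoding never starts halfway through a multi-byte character. Other variable-width encodings would need their own rule.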
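EDIT: This has been somewhat tested, but that's not to say it doesn't still have some subtle bugs. It uses StreamUtil from MiscUtil, but I've included just the necessary (new) method from there at the bottom. Oh, and it needs refactoring: there's one pretty hefty method, as you'll see: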
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;

namespace MiscUtil.IO
{
    /// <summary>
    /// Takes an encoding (defaulting to UTF-8) and a function which produces a seekable stream
    /// (or a filename for convenience) and yields lines from the end of the stream backwards.
    /// Only single byte encodings, and UTF-8 and Unicode, are supported. The stream
    /// returned by the function must be seekable.
    /// </summary>
    public sealed class ReverseLineReader : IEnumerable<string>
    {
        /// <summary>
        /// Buffer size to use by default. Classes with internal access can specify
        /// a different buffer size - this is useful for testing.
        /// </summary>
        private const int DefaultBufferSize = 4096;

        /// <summary>
        /// Means of creating a Stream to read from.
        /// </summary>
        private readonly Func<Stream> streamSource;

        /// <summary>
        /// Encoding to use when converting bytes to text
        /// </summary>
        private readonly Encoding encoding;

        /// <summary>
        /// Size of buffer (in bytes) to read each time we read from the
        /// stream. This must be at least as big as the maximum number of
        /// bytes for a single character.
        /// </summary>
        private readonly int bufferSize;

        /// <summary>
        /// Function which, when given a position within a file and a byte, states whether
        /// or not the byte represents the start of a character.
        /// </summary>
        private Func<long,byte,bool> characterStartDetector;

        /// <summary>
        /// Creates a LineReader from a stream source. The delegate is only
        /// called when the enumerator is fetched. UTF-8 is used to decode
        /// the stream into text.
        /// </summary>
        /// <param name="streamSource">Data source</param>
        public ReverseLineReader(Func<Stream> streamSource)
            : this(streamSource, Encoding.UTF8)
        {
        }

        /// <summary>
        /// Creates a LineReader from a filename. The file is only opened
        /// (or even checked for existence) when the enumerator is fetched.
        /// UTF8 is used to decode the file into text.
        /// </summary>
        /// <param name="filename">File to read from</param>
        public ReverseLineReader(string filename)
            : this(filename, Encoding.UTF8)
        {
        }

        /// <summary>
        /// Creates a LineReader from a filename. The file is only opened
        /// (or even checked for existence) when the enumerator is fetched.
        /// </summary>
        /// <param name="filename">File to read from</param>
        /// <param name="encoding">Encoding to use to decode the file into text</param>
        public ReverseLineReader(string filename, Encoding encoding)
            : this(() => File.OpenRead(filename), encoding)
        {
        }

        /// <summary>
        /// Creates a LineReader from a stream source. The delegate is only
        /// called when the enumerator is fetched.
        /// </summary>
        /// <param name="streamSource">Data source</param>
        /// <param name="encoding">Encoding to use to decode the stream into text</param>
        public ReverseLineReader(Func<Stream> streamSource, Encoding encoding)
            : this(streamSource, encoding, DefaultBufferSize)
        {
        }

        internal ReverseLineReader(Func<Stream> streamSource, Encoding encoding, int bufferSize)
        {
            this.streamSource = streamSource;
            this.encoding = encoding;
            this.bufferSize = bufferSize;
            if (encoding.IsSingleByte)
            {
                // For a single byte encoding, every byte is the start (and end) of a character
                characterStartDetector = (pos, data) => true;
            }
            else if (encoding is UnicodeEncoding)
            {
                // For UTF-16, even-numbered positions are the start of a character.
                // TODO: This assumes no surrogate pairs. More work required
                // to handle that.
                characterStartDetector = (pos, data) => (pos & 1) == 0;
            }
            else if (encoding is UTF8Encoding)
            {
                // For UTF-8, bytes with the top bit clear or the second bit set are the start of a character
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html
                characterStartDetector = (pos, data) => (data & 0x80) == 0 || (data & 0x40) != 0;
            }
            else
            {
                throw new ArgumentException("Only single byte, UTF-8 and Unicode encodings are permitted");
            }
        }

        /// <summary>
        /// Returns the enumerator reading strings backwards. If this method discovers that
        /// the returned stream is either unreadable or unseekable, a NotSupportedException is thrown.
        /// </summary>
        public IEnumerator<string> GetEnumerator()
        {
            Stream stream = streamSource();
            if (!stream.CanSeek)
            {
                stream.Dispose();
                throw new NotSupportedException("Unable to seek within stream");
            }
            if (!stream.CanRead)
            {
                stream.Dispose();
                throw new NotSupportedException("Unable to read within stream");
            }
            return GetEnumeratorImpl(stream);
        }

        private IEnumerator<string> GetEnumeratorImpl(Stream stream)
        {
            try
            {
                long position = stream.Length;

                if (encoding is UnicodeEncoding && (position & 1) != 0)
                {
                    throw new InvalidDataException("UTF-16 encoding provided, but stream has odd length.");
                }

                // Allow up to two bytes for data from the start of the previous
                // read which didn't quite make it as full characters
                byte[] buffer = new byte[bufferSize + 2];
                char[] charBuffer = new char[encoding.GetMaxCharCount(buffer.Length)];
                int leftOverData = 0;
                String previousEnd = null;
                // TextReader doesn't return an empty string if there's line break at the end
                // of the data. Therefore we don't return an empty string if it's our *first*
                // return.
                bool firstYield = true;

                // A line-feed at the start of the previous buffer means we need to swallow
                // the carriage-return at the end of this buffer - hence this needs declaring
                // way up here!
                bool swallowCarriageReturn = false;

                while (position > 0)
                {
                    int bytesToRead = Math.Min(position > int.MaxValue ? bufferSize : (int)position, bufferSize);

                    position -= bytesToRead;
                    stream.Position = position;
                    StreamUtil.ReadExactly(stream, buffer, bytesToRead);
                    // If we haven't read a full buffer, but we had bytes left
                    // over from before, copy them to the end of the buffer
                    if (leftOverData > 0 && bytesToRead != bufferSize)
                    {
                        // Buffer.BlockCopy doesn't document its behaviour with respect
                        // to overlapping data: we *might* just have read 7 bytes instead of
                        // 8, and have two bytes to copy...
                        Array.Copy(buffer, bufferSize, buffer, bytesToRead, leftOverData);
                    }
                    // We've now *effectively* read this much data.
                    bytesToRead += leftOverData;

                    int firstCharPosition = 0;
                    while (!characterStartDetector(position + firstCharPosition, buffer[firstCharPosition]))
                    {
                        firstCharPosition++;
                        // Bad UTF-8 sequences could trigger this. For UTF-8 we should always
                        // see a valid character start in every 3 bytes, and if this is the start of the file
                        // so we've done a short read, we should have the character start
                        // somewhere in the usable buffer.
                        if (firstCharPosition == 3 || firstCharPosition == bytesToRead)
                        {
                            throw new InvalidDataException("Invalid UTF-8 data");
                        }
                    }
                    leftOverData = firstCharPosition;

                    int charsRead = encoding.GetChars(buffer, firstCharPosition, bytesToRead - firstCharPosition, charBuffer, 0);
                    int endExclusive = charsRead;

                    for (int i = charsRead - 1; i >= 0; i--)
                    {
                        char lookingAt = charBuffer[i];
                        if (swallowCarriageReturn)
                        {
                            swallowCarriageReturn = false;
                            if (lookingAt == '\r')
                            {
                                endExclusive--;
                                continue;
                            }
                        }
                        // Anything non-line-breaking, just keep looking backwards
                        if (lookingAt != '\n' && lookingAt != '\r')
                        {
                            continue;
                        }
                        // End of CRLF? Swallow the preceding CR
                        if (lookingAt == '\n')
                        {
                            swallowCarriageReturn = true;
                        }
                        int start = i + 1;
                        string bufferContents = new string(charBuffer, start, endExclusive - start);
                        endExclusive = i;
                        string stringToYield = previousEnd == null ? bufferContents : bufferContents + previousEnd;
                        if (!firstYield || stringToYield.Length != 0)
                        {
                            yield return stringToYield;
                        }
                        firstYield = false;
                        previousEnd = null;
                    }

                    previousEnd = endExclusive == 0 ? null : (new string(charBuffer, 0, endExclusive) + previousEnd);

                    // If we didn't decode the start of the array, put it at the end for next time
                    if (leftOverData != 0)
                    {
                        Buffer.BlockCopy(buffer, 0, buffer, bufferSize, leftOverData);
                    }
                }
                if (leftOverData != 0)
                {
                    // At the start of the final buffer, we had the end of another character.
                    throw new InvalidDataException("Invalid UTF-8 data at start of stream");
                }
                if (firstYield && string.IsNullOrEmpty(previousEnd))
                {
                    yield break;
                }
                yield return previousEnd ?? "";
            }
            finally
            {
                stream.Dispose();
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}

// StreamUtil.cs:
public static class StreamUtil
{
    public static void ReadExactly(Stream input, byte[] buffer, int bytesToRead)
    {
        int index = 0;
        while (index < bytesToRead)
        {
            int read = input.Read(buffer, index, bytesToRead - index);
            if (read == 0)
            {
                throw new EndOfStreamException
                    (String.Format("End of stream reached with {0} byte{1} left to read.",
                                   bytesToRead - index,
                                   bytesToRead - index == 1 ? "s" : ""));
            }
            index += read;
        }
    }
}
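Feedback very welcome. This was fun :)
Comments:
Wow! I know it's more than three years old, but this piece of code rocks! Thanks!! (P.S. I just changed File.OpenRead(filename) to File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite) to let the iterator read files that are already open.)
@GrimaceofDespair: More "because then I'd have to design for inheritance, which adds a very significant cost in terms of both design time and future flexibility". It's often not even clear how inheritance could sensibly be used for a type; it's better to prohibit it until that clarity is found, IMO.
@rahularyansharma: Whereas I like splitting problems into orthogonal aspects. Once you've worked out how to open the file in your situation, I'd expect my code to just work for you.
If anyone wants to be able to share the file with another process, for example when you want to read a log file that a parent process has open for writing, just replace File.OpenRead(filename) with new FileStream(filename, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite).
@SpiritBob: That's too broad a question, I'm afraid. I'd start by understanding all the encodings you're interested in, and the problem you're trying to solve.
Answer 2:
Note: this approach doesn't work (explained in the EDIT).
You can use File.ReadLines to get a line iterator:
foreach (var line in File.ReadLines(@"C:\temp\ReverseRead.txt").Reverse())
{
    if (noNeedToReadFurther)
        break;

    // process line here
    Console.WriteLine(line);
}
EDIT:
After reading applejacks01's comment, I ran some tests, and it does look like .Reverse() actually loads the whole file.
I used File.ReadLines() to print the first line of a 40MB file: memory usage of the console app was 5MB. Then I used File.ReadLines().Reverse() to print the last line of the same file: memory usage was 95MB.
Conclusion
Whatever `Reverse()` is doing, it is not a good choice for reading the bottom of a big file.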
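The memory measurement above can be reproduced without a file. The sketch below uses a hypothetical CountingLines iterator as a stand-in for File.ReadLines and counts how many items are pulled from it. It suggests that Enumerable.Reverse has to consume the whole source before it can hand back its first element, whereas plain enumeration stops as soon as the loop exits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReverseBufferingDemo
{
    // Number of items the lazy source has produced so far.
    public static int ItemsPulled;

    // Hypothetical stand-in for File.ReadLines: lazy, and it counts what is consumed.
    public static IEnumerable<string> CountingLines(int total)
    {
        for (int i = 0; i < total; i++)
        {
            ItemsPulled++;
            yield return "line " + i;
        }
    }

    // Takes only the first element of the reversed sequence, then stops.
    public static string LastViaReverse(int total)
    {
        ItemsPulled = 0;
        foreach (var line in CountingLines(total).Reverse())
        {
            return line;
        }
        return null;
    }

    // Takes only the first element of the original sequence, then stops.
    public static string FirstDirect(int total)
    {
        ItemsPulled = 0;
        foreach (var line in CountingLines(total))
        {
            return line;
        }
        return null;
    }
}
```

After LastViaReverse(1000) the counter equals the full length of the source, while after FirstDirect(1000) it is 1. For a 40MB file that means every line is read, and with the foreach form every line is held in memory, just to reach the last one.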
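Comments:
I wonder whether the call to Reverse does in fact load the whole file into memory. Doesn't the end of the Enumerable have to be established first? That is, internally the enumerable fully enumerates the file to build a temporary array, which is then reversed and enumerated one item at a time with the yield keyword, producing a new Enumerable that iterates in reverse order.
The original answer was wrong, but I'm keeping the edited answer here because it may stop others from using this approach.
Answer 3:
A quick solution for large files: from C#, use PowerShell's Get-Content with the Tail parameter.
using System.Linq;
using System.Management.Automation;

using (PowerShell powerShell = PowerShell.Create())
{
    string lastLine = powerShell.AddCommand("Get-Content")
        .AddParameter("Path", @"c:\a.txt")
        .AddParameter("Tail", 1)
        .Invoke().FirstOrDefault()?.ToString();
}
Required reference: "System.Management.Automation.dll" - may be somewhere like "C:\Program Files (x86)\Reference Assemblies\Microsoft\WindowsPowerShell\3.0"
Using PowerShell incurs a small overhead, but it is worth it for large files.
Comments:
Note that this only gets the last line(s) of the file; it does not iterate.
Answer 4:
To create a file iterator, you can do this:
EDIT:
This is the fixed version of my fixed-width reverse file reader: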
public static IEnumerable<string> readFile()
{
    using (FileStream reader = new FileStream(@"c:\test.txt",FileMode.Open,FileAccess.Read))
    {
        int i=0;
        StringBuilder lineBuffer = new StringBuilder();
        int byteRead;
        while (-i < reader.Length)
        {
            reader.Seek(--i, SeekOrigin.End);
            byteRead = reader.ReadByte();
            if (byteRead == 10 && lineBuffer.Length > 0)
            {
                yield return Reverse(lineBuffer.ToString());
                lineBuffer.Remove(0, lineBuffer.Length);
            }
            lineBuffer.Append((char)byteRead);
        }
        yield return Reverse(lineBuffer.ToString());
        reader.Close();
    }
}

public static string Reverse(string str)
{
    char[] arr = new char[str.Length];
    for (int i = 0; i < str.Length; i++)
        arr[i] = str[str.Length - 1 - i];
    return new string(arr);
}
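The reader above splits only on byte 10 and skips empty lines. As a rough illustration of what fuller line-break handling could look like, the sketch below walks backwards over an in-memory string and treats "\r", "\n" and "\r\n" each as one line break, keeping empty lines. It deliberately ignores file I/O and encodings, and the names ReverseLineSplitter and LinesFromEnd are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class ReverseLineSplitter
{
    // Yields the lines of 'text' from last to first.
    // "\r", "\n" and "\r\n" each count as a single line break.
    public static IEnumerable<string> LinesFromEnd(string text)
    {
        int end = text.Length; // exclusive end of the line being collected
        if (end == 0) yield break;

        // A line break at the very end closes the last line; it does not
        // start an extra empty one (the same convention TextReader uses).
        if (text[end - 1] == '\n')
        {
            end--;
            if (end > 0 && text[end - 1] == '\r') end--;
        }
        else if (text[end - 1] == '\r')
        {
            end--;
        }

        while (true)
        {
            // Scan backwards to the nearest line-break character.
            int i = end - 1;
            while (i >= 0 && text[i] != '\n' && text[i] != '\r') i--;

            yield return text.Substring(i + 1, end - i - 1);
            if (i < 0) yield break;

            // Step over the break; "\r\n" is consumed as one unit.
            end = (text[i] == '\n' && i > 0 && text[i - 1] == '\r') ? i - 1 : i;
        }
    }
}
```

For example, LinesFromEnd("a\n\nb") yields "b", "" and "a" in that order. A file-based reader would need the same decisions at buffer boundaries, where a "\r\n" pair can be split across two reads.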
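Comments:
That's now close to correct for ISO-8859-1, but not for any other encoding. Encodings make this really tricky :(
What do you mean by "close to correct for ISO-8859-1"? What's still missing?
It doesn't handle line breaks quite correctly: it should match "\r", "\n" and "\r\n", with the last of these counting as only a single line break.
It also never yields empty lines: "a\n\nb" should yield "a", "", "b".
Hmm... I only yield lineBuffer when I find '\n' (ASCII 10). You're right, I don't take '\r' into account.
Answer 5:
I read the file line by line into a list and then use List.Reverse();
StreamReader objReader = new StreamReader(filename);
string sLine = "";
ArrayList arrText = new ArrayList();

while (sLine != null)
{
    sLine = objReader.ReadLine();
    if (sLine != null)
        arrText.Add(sLine);
}
objReader.Close();

arrText.Reverse();

foreach (string sOutput in arrText)
{
    ...
}
Comments:
Not the best solution for large files, since you need to load the file completely into RAM. And the OP explicitly stated that he doesn't want to load it completely.
Answer 6:
I'm adding my solution as well. After reading some of the answers, nothing really fit my case. I read byte by byte from the back until I find a LineFeed, then return the collected bytes as a string, without using buffering.
Usage:
var reader = new ReverseTextReader(path);
while (!reader.EndOfStream)
{
    Console.WriteLine(reader.ReadLine());
}
Implementation: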
public class ReverseTextReader
{
    private const int LineFeedLf = 10;
    private const int LineFeedCr = 13;
    private readonly Stream _stream;
    private readonly Encoding _encoding;

    public bool EndOfStream => _stream.Position == 0;

    public ReverseTextReader(Stream stream, Encoding encoding)
    {
        _stream = stream;
        _encoding = encoding;
        _stream.Position = _stream.Length;
    }

    public string ReadLine()
    {
        if (_stream.Position == 0) return null;

        var line = new List<byte>();
        var endOfLine = false;
        while (!endOfLine)
        {
            var b = _stream.ReadByteFromBehind();

            if (b == -1 || b == LineFeedLf)
            {
                endOfLine = true;
            }
            // -1 signals the start of the stream; Convert.ToByte(-1) would throw.
            if (b != -1)
            {
                line.Add(Convert.ToByte(b));
            }
        }

        line.Reverse();
        return _encoding.GetString(line.ToArray());
    }
}

public static class StreamExtensions
{
    public static int ReadByteFromBehind(this Stream stream)
    {
        if (stream.Position == 0) return -1;

        stream.Position = stream.Position - 1;
        var value = stream.ReadByte();
        stream.Position = stream.Position - 1;
        return value;
    }
}
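Comments:
Answer 7:
You can read the file backwards one character at a time and cache all the characters until you reach a carriage return and/or line feed.
You then reverse the collected string and yield it as a line.
Comments:
Reading a file backwards one character at a time is hard, because you have to be able to recognise the start of a character. How simple that is depends on the encoding.
Answer 8:
There are already good answers here; here is another LINQ-compatible class you can use, which focuses on performance and support for large files. It assumes a "\r\n" line terminator.
Usage:
var reader = new ReverseTextReader(@"C:\Temp\ReverseTest.txt");
while (!reader.EndOfStream)
{
    Console.WriteLine(reader.ReadLine());
}
ReverseTextReader class: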
/// <summary>
/// Reads a text file backwards, line-by-line.
/// </summary>
/// <remarks>This class uses file seeking to read a text file of any size in reverse order. This
/// is useful for needs such as reading a log file newest-entries first.</remarks>
public sealed class ReverseTextReader : IEnumerable<string>
{
    private const int BufferSize = 16384;   // The number of bytes read from the underlying stream.
    private readonly Stream _stream;        // Stores the stream feeding data into this reader
    private readonly Encoding _encoding;    // Stores the encoding used to process the file
    private byte[] _leftoverBuffer;         // Stores the leftover partial line after processing a buffer
    private readonly Queue<string> _lines;  // Stores the lines parsed from the buffer

    #region Constructors

    /// <summary>
    /// Creates a reader for the specified file.
    /// </summary>
    /// <param name="filePath"></param>
    public ReverseTextReader(string filePath)
        : this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), Encoding.Default)
    {
    }

    /// <summary>
    /// Creates a reader using the specified stream.
    /// </summary>
    /// <param name="stream"></param>
    public ReverseTextReader(Stream stream)
        : this(stream, Encoding.Default)
    {
    }

    /// <summary>
    /// Creates a reader using the specified path and encoding.
    /// </summary>
    /// <param name="filePath"></param>
    /// <param name="encoding"></param>
    public ReverseTextReader(string filePath, Encoding encoding)
        : this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), encoding)
    {
    }

    /// <summary>
    /// Creates a reader using the specified stream and encoding.
    /// </summary>
    /// <param name="stream"></param>
    /// <param name="encoding"></param>
    public ReverseTextReader(Stream stream, Encoding encoding)
    {
        _stream = stream;
        _encoding = encoding;
        _lines = new Queue<string>(128);
        // The stream needs to support seeking for this to work
        if(!_stream.CanSeek)
            throw new InvalidOperationException("The specified stream needs to support seeking to be read backwards.");
        if (!_stream.CanRead)
            throw new InvalidOperationException("The specified stream needs to support reading to be read backwards.");
        // Set the current position to the end of the file
        _stream.Position = _stream.Length;
        _leftoverBuffer = new byte[0];
    }

    #endregion

    #region Overrides

    /// <summary>
    /// Reads the previous line from the underlying stream.
    /// </summary>
    /// <returns></returns>
    public string ReadLine()
    {
        // Are there lines left to read? If so, return the next one
        if (_lines.Count != 0) return _lines.Dequeue();
        // Are we at the beginning of the stream? If so, we're done
        if (_stream.Position == 0) return null;

        #region Read and Process the Next Chunk

        // Remember the current position
        var currentPosition = _stream.Position;
        var newPosition = currentPosition - BufferSize;
        // Are we before the beginning of the stream?
        if (newPosition < 0) newPosition = 0;
        // Calculate the buffer size to read
        var count = (int)(currentPosition - newPosition);
        // Set the new position
        _stream.Position = newPosition;
        // Make a new buffer but append the previous leftovers
        var buffer = new byte[count + _leftoverBuffer.Length];
        // Read the next buffer
        _stream.Read(buffer, 0, count);
        // Move the position of the stream back
        _stream.Position = newPosition;
        // And copy in the leftovers from the last buffer
        if (_leftoverBuffer.Length != 0)
            Array.Copy(_leftoverBuffer, 0, buffer, count, _leftoverBuffer.Length);
        // Look for CrLf delimiters
        var end = buffer.Length - 1;
        var start = buffer.Length - 2;
        // Search backwards for a line feed
        while (start >= 0)
        {
            // Is it a line feed?
            if (buffer[start] == 10)
            {
                // Yes. Extract a line and queue it (but exclude the \r\n)
                _lines.Enqueue(_encoding.GetString(buffer, start + 1, end - start - 2));
                // And reset the end
                end = start;
            }
            // Move to the previous character
            start--;
        }
        // What's left over is a portion of a line. Save it for later.
        _leftoverBuffer = new byte[end + 1];
        Array.Copy(buffer, 0, _leftoverBuffer, 0, end + 1);
        // Are we at the beginning of the stream?
        if (_stream.Position == 0)
            // Yes. Add the last line.
            _lines.Enqueue(_encoding.GetString(_leftoverBuffer, 0, end - 1));

        #endregion

        // If we have something in the queue, return it
        return _lines.Count == 0 ? null : _lines.Dequeue();
    }

    #endregion

    #region IEnumerator<string> Interface

    public IEnumerator<string> GetEnumerator()
    {
        string line;
        // So long as the next line isn't null...
        while ((line = ReadLine()) != null)
            // Read and return it.
            yield return line;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        // Delegate to the generic enumerator instead of throwing NotImplementedException.
        return GetEnumerator();
    }

    #endregion
}
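Comments:
Old post, but a good backward reader is hard to find. This one really works, and it's fast. One small change I made was to implement IDisposable for safer execution.
Answer 9:
I know this post is very old, but since I couldn't work out how to use the most-voted solution, I eventually found this. It's the best answer I've found, in both VB and C#, with a low memory cost:
http://www.blakepell.com/2010-11-29-backward-file-reader-vb-csharp-source
Hopefully this helps others, because it took me hours to finally find this article!
[EDIT]
Here is the C# code: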
//*********************************************************************************************************************************
//
//             Class:  BackwardReader
//      Initial Date:  11/29/2010
//     Last Modified:  11/29/2010
//     Programmer(s):  Original C# Source - the_real_herminator
//                     http://social.msdn.microsoft.com/forums/en-US/csharpgeneral/thread/9acdde1a-03cd-4018-9f87-6e201d8f5d09
//                     VB Conversion - Blake Pell
//
//*********************************************************************************************************************************
using System.Text;
using System.IO;

public class BackwardReader
{
    private string path;
    private FileStream fs = null;

    public BackwardReader(string path)
    {
        this.path = path;
        fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        fs.Seek(0, SeekOrigin.End);
    }

    public string Readline()
    {
        byte[] line;
        byte[] text = new byte[1];
        long position = 0;
        int count;

        fs.Seek(0, SeekOrigin.Current);
        position = fs.Position;

        //do we have trailing rn?
        if (fs.Length > 1)
        {
            byte[] vagnretur = new byte[2];
            fs.Seek(-2, SeekOrigin.Current);
            fs.Read(vagnretur, 0, 2);
            if (ASCIIEncoding.ASCII.GetString(vagnretur).Equals("rn"))
            {
                //move it back
                fs.Seek(-2, SeekOrigin.Current);
                position = fs.Position;
            }
        }

        while (fs.Position > 0)
        {
            text.Initialize();
            //read one char
            fs.Read(text, 0, 1);
            string asciiText = ASCIIEncoding.ASCII.GetString(text);
            //move back to the character before
            fs.Seek(-2, SeekOrigin.Current);
            if (asciiText.Equals("n"))
            {
                fs.Read(text, 0, 1);
                asciiText = ASCIIEncoding.ASCII.GetString(text);
                if (asciiText.Equals("r"))
                {
                    fs.Seek(1, SeekOrigin.Current);
                    break;
                }
            }
        }

        count = int.Parse((position - fs.Position).ToString());
        line = new byte[count];
        fs.Read(line, 0, count);
        fs.Seek(-count, SeekOrigin.Current);
        return ASCIIEncoding.ASCII.GetString(line);
    }

    public bool SOF
    {
        get
        {
            return fs.Position == 0;
        }
    }

    public void Close()
    {
        fs.Close();
    }
}
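Comments:
You should include the relevant parts of the link in your answer and give the link for reference only, so that your answer still adds value even if the link changes.
If you have private IDisposable fields, you should also implement IDisposable and dispose of those fields properly.
For this code to work, "n" and "r" should be replaced with "\n" and "\r". Unfortunately, although the code works once that is fixed, it is slow even for smaller files; have a look at Jon Person's solution.
Answer 10:
I wanted to do something similar. Here is my code. This class creates temporary files containing chunks of the big file, which avoids bloating memory. The user can specify whether she/he wants the file reversed, in which case the contents are returned in reverse order.
This class can also be used to write big data into a single file without bloating memory.
Please provide feedback.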
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace BigFileService
{
    public class BigFileDumper
    {
        /// <summary>
        /// Buffer that will store the lines until it is full.
        /// Then it will dump it to temp files.
        /// </summary>
        public int CHUNK_SIZE = 1000;
        public bool ReverseIt { get; set; }
        public long TotalLineCount { get { return totalLineCount; } }
        private long totalLineCount;
        private int BufferCount = 0;
        private StreamWriter Writer;
        /// <summary>
        /// List of files that would store the chunks.
        /// </summary>
        private List<string> LstTempFiles;
        private string ParentDirectory;
        private char[] trimchars = { '/', '\\' };

        public BigFileDumper(string FolderPathToWrite)
        {
            this.LstTempFiles = new List<string>();
            this.ParentDirectory = FolderPathToWrite.TrimEnd(trimchars) + "\\" + "BIG_FILE_DUMP";
            this.totalLineCount = 0;
            this.BufferCount = 0;
            this.Initialize();
        }

        private void Initialize()
        {
            // Delete existing directory.
            if (Directory.Exists(this.ParentDirectory))
            {
                Directory.Delete(this.ParentDirectory, true);
            }
            // Create a new directory.
            Directory.CreateDirectory(this.ParentDirectory);
        }

        public void WriteLine(string line)
        {
            if (this.BufferCount == 0)
            {
                string newFile = "DumpFile_" + LstTempFiles.Count();
                LstTempFiles.Add(newFile);
                Writer = new StreamWriter(this.ParentDirectory + "\\" + newFile);
            }
            // Keep on adding in the buffer as long as size is okay.
            if (this.BufferCount < this.CHUNK_SIZE)
            {
                this.totalLineCount++; // main count
                this.BufferCount++; // Chunk count.
                Writer.WriteLine(line);
            }
            else
            {
                // Buffer is full, time to create a new file.
                // Close the existing file first.
                Writer.Close();
                // Make buffer count 0 again.
                this.BufferCount = 0;
                this.WriteLine(line);
            }
        }

        public void Close()
        {
            if (Writer != null)
                Writer.Close();
        }

        public string GetFullFile()
        {
            if (LstTempFiles.Count <= 0)
            {
                Debug.Assert(false, "There are no files created.");
                return "";
            }
            string returnFilename = this.ParentDirectory + "\\" + "FullFile";
            if (File.Exists(returnFilename) == false)
            {
                // Create a consolidated file from the existing small dump files.
                // Now this is interesting. We will open the small dump files one by one.
                // Depending on whether the user require inverted file, we will read them in descending order & reverted,
                // or ascending order in normal way.
                if (this.ReverseIt)
                    this.LstTempFiles.Reverse();
                foreach (var fileName in LstTempFiles)
                {
                    string fullFileName = this.ParentDirectory + "\\" + fileName;
                    // FileLines will use small memory depending on size of CHUNK. User has control.
                    var fileLines = File.ReadAllLines(fullFileName);
                    // Time to write in the writer.
                    if (this.ReverseIt)
                        fileLines = fileLines.Reverse().ToArray();
                    // Write the lines
                    File.AppendAllLines(returnFilename, fileLines);
                }
            }
            return returnFilename;
        }
    }
}
This service can be used as follows:
void TestBigFileDump_File(string BIG_FILE, string FOLDER_PATH_FOR_CHUNK_FILES)
{
    // Start processing the input Big file.
    StreamReader reader = new StreamReader(BIG_FILE);
    // Create a dump file class object to handle efficient memory management.
    var bigFileDumper = new BigFileDumper(FOLDER_PATH_FOR_CHUNK_FILES);
    // Set to reverse the output file.
    bigFileDumper.ReverseIt = true;
    bigFileDumper.CHUNK_SIZE = 100; // How much at a time to keep in RAM before dumping to local file.
    while (reader.EndOfStream == false)
    {
        string line = reader.ReadLine();
        bigFileDumper.WriteLine(line);
    }
    bigFileDumper.Close();
    reader.Close();
    // Get back full reversed file.
    var reversedFilename = bigFileDumper.GetFullFile();
    Console.WriteLine("Check output file - " + reversedFilename);
}
Comments:
Answer 11:
In case anyone else comes across this, I solved it with the following PowerShell script, which can easily be turned into C# with a small amount of work.
[System.IO.FileStream]$fileStream = [System.IO.File]::Open("C:\Name_of_very_large_file.log", [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
[System.IO.BufferedStream]$bs = New-Object System.IO.BufferedStream $fileStream;
[System.IO.StreamReader]$sr = New-Object System.IO.StreamReader $bs;
$buff = New-Object char[] 20;
$seek = $bs.Seek($fileStream.Length - 10000, [System.IO.SeekOrigin]::Begin);
while(($line = $sr.ReadLine()) -ne $null)
{
    $line;
}
This basically starts reading from the last 10,000 characters of the file, outputting each line.
Comments:
This reads forward from the last 10,000 bytes, rather than backwards from the end to the beginning. Also, why not just .Seek(-10000, [System.IO.SeekOrigin]::End);?
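For reference, the seek-from-the-end idea in that last comment might look like this in C#. It is only a sketch of reading the tail of a file forwards, not a reverse iterator. TailReader and ReadTail are hypothetical names, and the decision to discard the first line read (because the seek usually lands in the middle of a line, and possibly in the middle of a multi-byte character) is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TailReader
{
    // Returns the lines found in roughly the last 'maxBytes' bytes of the file,
    // in normal (forward) order.
    public static List<string> ReadTail(string path, long maxBytes, Encoding encoding)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            // Clamp the offset so that files shorter than maxBytes do not make Seek throw.
            long offset = Math.Min(maxBytes, fs.Length);
            fs.Seek(-offset, SeekOrigin.End);

            using (var reader = new StreamReader(fs, encoding))
            {
                var lines = new List<string>();
                // Unless we started at the very beginning, the first line is
                // probably partial, so it is read and thrown away.
                if (offset < fs.Length) reader.ReadLine();

                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    lines.Add(line);
                }
                return lines;
            }
        }
    }
}
```

Calling TailReader.ReadTail(path, 10000, Encoding.UTF8) corresponds to the script above. If the seek happens to land exactly at the start of a line, that complete line is dropped too, which is usually acceptable for log tailing.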