Java: how do you read a huge text file without running out of memory while still keeping good performance?

Posted 2023-03-25

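Preface: This article was compiled by the editors of 小常识网 (cha138.com) and mainly covers knowledge related to "Java: how do you read a huge text file without running out of memory while still keeping good performance?". We hope it is of some reference value to you.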

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ReadBig {
    public static String fff = "C:\\mq\\read\\from.xml";

    // Approach 1: memory-mapped file; reads the second half of the file in 3 MB chunks
    public static void main1(String[] args) throws Exception {
        final int BUFFER_SIZE = 0x300000; // buffer size: 3 MB

        File f = new File(fff);

        /**
         * map(FileChannel.MapMode mode, long position, long size)
         *
         * mode - whether the file is mapped read-only, read/write or private (copy-on-write),
         * i.e. one of READ_ONLY, READ_WRITE or PRIVATE defined in FileChannel.MapMode
         *
         * position - the position in the file at which the mapped region starts; must be non-negative
         *
         * size - the size of the region to map; must be non-negative and no greater than Integer.MAX_VALUE
         *
         * So to read the second half of the file, write it as in this example; to read the last 1/8
         * of the file, write map(FileChannel.MapMode.READ_ONLY, f.length() * 7 / 8, f.length() / 8)
         *
         * To read the whole file, write map(FileChannel.MapMode.READ_ONLY, 0, f.length())
         */
        MappedByteBuffer inputBuffer;
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            // The mapping stays valid after the file is closed.
            // f.length() - f.length() / 2 also covers the last byte when the length is odd.
            inputBuffer = raf.getChannel().map(FileChannel.MapMode.READ_ONLY,
                    f.length() / 2, f.length() - f.length() / 2);
        }

        byte[] dst = new byte[BUFFER_SIZE]; // read 3 MB at a time

        long start = System.currentTimeMillis();

        while (inputBuffer.hasRemaining()) {
            // Every chunk is BUFFER_SIZE bytes except possibly the last one.
            int length = Math.min(BUFFER_SIZE, inputBuffer.remaining());
            inputBuffer.get(dst, 0, length); // bulk copy instead of one get() call per byte

            // new String(dst, 0, length) gives you the buffered content as a string to work with.
            // It uses the platform default charset, and a multi-byte character may be split
            // between two chunks.
            System.out.println(new String(dst, 0, length));
        }

        long end = System.currentTimeMillis();

        System.out.println("Reading half of the file took: " + (end - start) + " ms");
    }

    // Approach 2: FileChannel + ByteBuffer, 1 KB at a time
    public static void main2(String[] args) throws Exception {
        int bufSize = 1024;
        byte[] bs = new byte[bufSize];
        ByteBuffer byteBuf = ByteBuffer.allocate(bufSize);
        try (RandomAccessFile raf = new RandomAccessFile(fff, "r")) {
            FileChannel channel = raf.getChannel();
            while (channel.read(byteBuf) != -1) {
                byteBuf.flip(); // position = 0, limit = number of bytes just read
                int size = byteBuf.remaining();
                byteBuf.get(bs, 0, size);
                // Treat the file as a string and simply print it, as an example.
                System.out.print(new String(bs, 0, size));
                byteBuf.clear();
            }
        }
    }

    // Approach 3: BufferedReader, line by line
    public static void main(String[] args) throws Exception {
        try (BufferedReader br = new BufferedReader(new FileReader(fff))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
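The comment in `main1` notes that the mapped `size` may not exceed `Integer.MAX_VALUE`, so a file larger than about 2 GB cannot be mapped in a single call. A minimal sketch, assuming you only need to scan the bytes sequentially, maps the file window by window instead; the class `MappedWindowReader`, the method `forEachWindow` and the 64 MB window size are hypothetical choices, not part of the original answer.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

/** Hypothetical helper: maps a file of any size one window at a time. */
final class MappedWindowReader {
    // Each window must stay <= Integer.MAX_VALUE bytes (the limit of FileChannel.map).
    static final long DEFAULT_WINDOW = 64L * 1024 * 1024; // 64 MB, an arbitrary choice

    static void forEachWindow(Path path, long windowSize, Consumer<MappedByteBuffer> action)
            throws IOException {
        if (windowSize <= 0 || windowSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("windowSize must be in (0, Integer.MAX_VALUE]");
        }
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            for (long position = 0; position < fileSize; position += windowSize) {
                // The last window may be shorter than windowSize.
                long size = Math.min(windowSize, fileSize - position);
                // Only this region is mapped; the OS pages it in on demand.
                MappedByteBuffer window =
                        channel.map(FileChannel.MapMode.READ_ONLY, position, size);
                action.accept(window);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long[] newlines = {0};
        // Example: count '\n' bytes in a file of any size.
        forEachWindow(Paths.get(args[0]), DEFAULT_WINDOW, buf -> {
            while (buf.hasRemaining()) {
                if (buf.get() == '\n') newlines[0]++;
            }
        });
        System.out.println("newline bytes: " + newlines[0]);
    }
}
```

Each window is released only when its `MappedByteBuffer` is garbage-collected (the JDK has no public unmap method), and any record or line that crosses a window boundary has to be stitched together by the caller.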

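Reference answer A: When handling a large file, the file can be far bigger than the available memory, so it cannot be read into memory and processed in one piece. Consider processing it in segments, or splitting it into several smaller files and reading each one into memory separately.

Reference answer B: The answer on the 2nd floor is right: you can only read it one portion at a time. As I vaguely recall, we also just processed it in batches; something like an Oracle auto-increment column is also read a portion at a time. I have heard there is a much better method, but I don't know what it is.

Reference answer C: Ha, how big is "huge"?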
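Reference answer A suggests splitting the file into smaller files and processing each one separately. A minimal sketch of that idea, assuming a line-oriented text file in a known charset: it streams the source once through a `BufferedReader` and starts a new part every `linesPerPart` lines. `FileSplitter`, `split` and the `part-N.txt` naming are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a text file into parts of at most linesPerPart lines. */
final class FileSplitter {
    static List<Path> split(Path source, Path outDir, int linesPerPart, Charset cs)
            throws IOException {
        if (linesPerPart <= 0) throw new IllegalArgumentException("linesPerPart must be > 0");
        List<Path> parts = new ArrayList<>();
        BufferedWriter writer = null;
        int linesInPart = 0;
        try (BufferedReader reader = Files.newBufferedReader(source, cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Open the first part, or roll over to a new one when the current part is full.
                if (writer == null || linesInPart == linesPerPart) {
                    if (writer != null) writer.close();
                    Path part = outDir.resolve("part-" + parts.size() + ".txt");
                    parts.add(part);
                    writer = Files.newBufferedWriter(part, cs);
                    linesInPart = 0;
                }
                writer.write(line);
                writer.newLine();
                linesInPart++;
            }
        } finally {
            if (writer != null) writer.close(); // closing an already closed writer is a no-op
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // usage: java FileSplitter <source> <outDir> <linesPerPart>
        List<Path> parts = split(Paths.get(args[0]), Paths.get(args[1]),
                Integer.parseInt(args[2]), StandardCharsets.UTF_8);
        System.out.println("wrote " + parts.size() + " parts");
    }
}
```

Only one line plus the reader and writer buffers is in memory at any time, so the part size decides how many files you get, not how much heap is used.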

Several ways to read large and very large files in Java

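// Needs: java.io.RandomAccessFile, java.nio.ByteBuffer, java.nio.channels.FileChannel
public static void main(String[] args) throws Exception {
    int bufSize = 1024;
    byte[] bs = new byte[bufSize];
    ByteBuffer byteBuf = ByteBuffer.allocate(bufSize);
    try (RandomAccessFile raf = new RandomAccessFile("d:\\filename", "r")) {
        FileChannel channel = raf.getChannel();
        while (channel.read(byteBuf) != -1) {
            byteBuf.flip(); // position = 0, limit = number of bytes just read
            int size = byteBuf.remaining();
            byteBuf.get(bs, 0, size);
            // Treat the file as a string and simply print it, as an example.
            System.out.print(new String(bs, 0, size));
            byteBuf.clear();
        }
    }
}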
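`new String(bs, 0, size)` in the loop above decodes every 1 KB chunk on its own, so a multi-byte character (a Chinese character takes 3 bytes in UTF-8) that straddles two chunks comes out garbled. A minimal sketch that keeps the channel-and-buffer approach but decodes with a `CharsetDecoder`, carrying incomplete trailing bytes over to the next read; it assumes the file is UTF-8, and `ChunkedUtf8Reader`/`readChunks` are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

/** Hypothetical helper: reads a UTF-8 file in byte chunks without splitting characters. */
final class ChunkedUtf8Reader {
    static void readChunks(Path path, int bufSize, Consumer<String> sink) throws IOException {
        // A UTF-8 character is at most 4 bytes, so the buffer must be able to hold one.
        if (bufSize < 4) throw new IllegalArgumentException("bufSize must be >= 4");
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(bufSize);
        // UTF-8 never yields more chars than input bytes, so this buffer cannot overflow.
        CharBuffer chars = CharBuffer.allocate(bufSize);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            boolean eof = false;
            while (!eof) {
                eof = channel.read(bytes) == -1;
                bytes.flip();                                   // switch to draining
                CoderResult result = decoder.decode(bytes, chars, eof);
                if (result.isError()) result.throwException();  // malformed input
                emit(chars, sink);
                bytes.compact(); // keep an incomplete trailing character for the next read
            }
            decoder.flush(chars);
            emit(chars, sink);
        }
    }

    private static void emit(CharBuffer chars, Consumer<String> sink) {
        chars.flip();
        if (chars.hasRemaining()) sink.accept(chars.toString());
        chars.clear();
    }

    public static void main(String[] args) throws IOException {
        long[] total = {0};
        readChunks(Paths.get(args[0]), 8192, chunk -> total[0] += chunk.length());
        System.out.println("decoded chars: " + total[0]);
    }
}
```

`compact()` moves the undecoded tail to the front of the byte buffer, which is why the buffer must be at least 4 bytes: the tail is never longer than 3 bytes, so each read still has room to make progress.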

Java runs out of memory when reading a large file? How to read a few lines at a time, in several passes

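import java.io.IOException;
import java.io.RandomAccessFile;

public class TestPrint {

    public static void main(String[] args) throws IOException {
        String path = "path of the file you want to read";

        // "r" is enough if you only read; use "rw" if you also need to write
        try (RandomAccessFile br = new RandomAccessFile(path, "r")) {
            // Note: RandomAccessFile.readLine() is unbuffered and turns each byte into one char,
            // so it is slow and does not decode UTF-8 text correctly.
            String str;
            StringBuilder app = new StringBuilder();
            int i = 0;
            while ((str = br.readLine()) != null) {
                i++;
                app.append(str).append('\n');
                if (i >= 100) { // say we read 100 lines at a time
                    i = 0;
                    // Process these 100 lines (held in app) here, then keep reading
                    app.setLength(0);
                }
            }
            // Process the remaining fewer than 100 lines (if any) here
        }
    }
}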
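The question above asks how to read a fixed number of lines at a time. A minimal sketch of the same batching idea, swapping `RandomAccessFile.readLine()` for a `BufferedReader` (which is buffered and decodes the given charset properly); `BatchLineReader`, `readInBatches` and the batch size are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: hands the file to `handler` batchSize lines at a time. */
final class BatchLineReader {
    /** Returns the number of batches passed to the handler. */
    static long readInBatches(Path path, Charset cs, int batchSize,
                              Consumer<List<String>> handler) throws IOException {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        long batches = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = Files.newBufferedReader(path, cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batches++;
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
        }
        if (!batch.isEmpty()) { // the last, partial batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        long n = readInBatches(Paths.get(args[0]), StandardCharsets.UTF_8, 100,
                batch -> System.out.println("got " + batch.size() + " lines"));
        System.out.println(n + " batches");
    }
}
```

A fresh list is created for each batch, so a handler may keep a batch without it being overwritten; memory use stays around one batch as long as the handler does not retain them all.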

For reading and writing text files larger than 2 GB line by line, the following code is recommended:

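import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

void largeFileIO(String inputFile, String outputFile) {
    try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(new File(inputFile)));
         BufferedReader in = new BufferedReader(new InputStreamReader(bis, StandardCharsets.UTF_8), 10 * 1024 * 1024); // 10 MB buffer
         // write with the same charset as the input instead of the platform default
         BufferedWriter fw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), StandardCharsets.UTF_8))) {
        String line;
        // readLine() returning null is the reliable end-of-file test; ready() only reports
        // whether the next read is guaranteed not to block
        while ((line = in.readLine()) != null) {
            fw.append(line + " ");
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}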

The JDK itself already supports reading and writing very large files.

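Articles online fall broadly into two camps: one uses the BufferedReader class to read and write very large files, the other uses the RandomAccessFile class to read them. After comparing the two, I went with the former for reading very large files. Here is the code; it is actually very simple: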

-------------------------------------------------------------------

File file = new File(filepath);

BufferedInputStream fis = new BufferedInputStream(new FileInputStream(file));

// Read the text file through a 5 MB buffer; try-with-resources closes it when done
try (BufferedReader reader = new BufferedReader(new InputStreamReader(fis, "utf-8"), 5 * 1024 * 1024)) {
    String line;
    while ((line = reader.readLine()) != null) {
        // TODO: write your business logic here
    }
}

---------------------------------------------------------------------

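Note the code: when instantiating the BufferedReader, you only need to pass one extra argument that sets the buffer size (without it, BufferedReader uses a default of 8,192 characters).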
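Besides `BufferedReader` with an explicit buffer size, Java 8 and later also provide `java.nio.file.Files.lines`, which returns a lazily populated `Stream<String>` instead of loading the whole file. This is a different API from the one the author chose; a minimal sketch, with `LinesStreamExample`, `countMatching` and the keyword filter as hypothetical choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical example: counts lines containing a keyword using Files.lines (Java 8+). */
final class LinesStreamExample {
    static long countMatching(Path path, String keyword) throws IOException {
        // Files.lines reads lazily as the stream is consumed, so the whole file is never
        // held in memory; try-with-resources closes the underlying file afterwards.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(keyword)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java LinesStreamExample <file> <keyword>
        System.out.println(countMatching(Paths.get(args[0]), args[1]));
    }
}
```

`Files.lines` does not take a buffer-size argument, and read errors (including malformed input) surface as `UncheckedIOException` while the stream is being consumed.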
