Comparing large CSV files with a HashMap or HashSet
I am trying to compare two huge CSV files. The first file (id.csv) contains user IDs, and the second file (data.csv) contains the raw data. For every id in the first file I want to find all rows in the second file with the same id and write them to a new file. I tried the simple code below, but I estimate it would take more than a month to finish. Please help me with code that can do this faster.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;

public class FilterUser {
    public static String UniqueUser = "D:/test/id.csv";
    public static String Raw = "D:/test/data.csv";
    public static String OutputFile = "D:/test/output.csv";

    public static void main(String[] args) throws IOException {
        Scanner ScanIn1 = null;
        String users = "";
        String[] record;
        ArrayList<String> InArray = new ArrayList<>();
        String line;
        long startTime = System.currentTimeMillis();
        try {
            ScanIn1 = new Scanner(new BufferedReader(new FileReader(UniqueUser)));
            BufferedReader br = new BufferedReader(new FileReader(Raw));
            BufferedWriter bw = new BufferedWriter(new FileWriter(OutputFile));
            bw.write("id,date,time,Use_duration,book1,book2");
            bw.newLine();
            // Load every user id into the list
            while (ScanIn1.hasNext()) {
                users = ScanIn1.nextLine();
                InArray.add(users);
            }
            // For every data row, scan the entire id list looking for a match
            while ((line = br.readLine()) != null) {
                record = line.split(",");
                for (int i = 0; i < InArray.size(); i++) {
                    if (InArray.get(i).equals(record[0])) {
                        String output = record[0] + "," + record[1] + "," + record[2] + ","
                                + record[3] + "," + record[4] + "," + record[5];
                        bw.write(output);
                        bw.newLine();
                    }
                }
            }
            br.close();
            bw.close();
            ScanIn1.close();
        } catch (FileNotFoundException ex) {
            System.out.println(ex);
        } catch (IOException ex) {
            System.out.println(ex);
        }
        long endTime = System.currentTimeMillis();
        long TotalTime = endTime - startTime;
        System.out.println("Total time = " + TotalTime);
    }
}
id.csv — one user id per line.
data.csv — raw data rows with columns id,date,time,Use_duration,book1,book2.
Answer
Your code can be rewritten by replacing the ArrayList with a HashSet. HashSet's contains() method runs in O(1) on average, so instead of scanning the whole id list for every data row (the nested while + for loops, roughly O(N×M) comparisons), each row needs only a single membership check, and the code becomes far more efficient.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashSet;
import java.util.Scanner;

public class FilterUser {
    public static String UniqueUser = "D:/test/id.csv";
    public static String Raw = "D:/test/data.csv";
    public static String OutputFile = "D:/test/output.csv";

    public static void main(String[] args) throws IOException {
        Scanner ScanIn1 = null;
        String users = "";
        String id = "";
        HashSet<String> InArray = new HashSet<String>();
        String line;
        long startTime = System.currentTimeMillis();
        try {
            ScanIn1 = new Scanner(new BufferedReader(new FileReader(UniqueUser)));
            BufferedReader br = new BufferedReader(new FileReader(Raw));
            BufferedWriter bw = new BufferedWriter(new FileWriter(OutputFile));
            bw.write("id,date,time,Use_duration,book1,book2");
            bw.newLine();
            // Load every user id into the set once
            while (ScanIn1.hasNext()) {
                users = ScanIn1.nextLine();
                InArray.add(users);
            }
            // For every data row, take the id (everything before the first comma)
            // and do a single O(1) membership check.
            // (The original answer used line.substring(0, 3), which only works when
            // every id is exactly 3 characters long.)
            while ((line = br.readLine()) != null) {
                int comma = line.indexOf(',');
                id = (comma >= 0) ? line.substring(0, comma) : line;
                if (InArray.contains(id)) {
                    bw.write(line);
                    bw.newLine();
                }
            }
            br.close();
            bw.close();
            ScanIn1.close();
        } catch (FileNotFoundException ex) {
            System.out.println(ex);
        } catch (IOException ex) {
            System.out.println(ex);
        }
        long endTime = System.currentTimeMillis();
        long TotalTime = endTime - startTime;
        System.out.println("Total time = " + TotalTime);
    }
}
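For reference, the same HashSet-based filtering can be written more compactly with try-with-resources and java.nio.file, so the streams are closed even if an exception occurs. This is only a minimal sketch, not part of the original answer; the file paths and output header are taken from the question, and the class name FilterUserNio is made up for illustration.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class FilterUserNio {
    public static void main(String[] args) throws IOException {
        Path ids = Paths.get("D:/test/id.csv");    // one user id per line (path from the question)
        Path data = Paths.get("D:/test/data.csv"); // raw rows: id,date,time,...
        Path out = Paths.get("D:/test/output.csv");

        // Load all ids into a HashSet for O(1) lookups, same idea as the answer above
        Set<String> userIds = new HashSet<>(Files.readAllLines(ids));

        try (BufferedReader br = Files.newBufferedReader(data);
             BufferedWriter bw = Files.newBufferedWriter(out)) {
            bw.write("id,date,time,Use_duration,book1,book2");
            bw.newLine();
            String line;
            while ((line = br.readLine()) != null) {
                int comma = line.indexOf(',');
                String id = (comma >= 0) ? line.substring(0, comma) : line;
                if (userIds.contains(id)) {
                    bw.write(line);
                    bw.newLine();
                }
            }
        }
    }
}

Either version reads both files only once, so the total work is roughly proportional to the number of lines in the two files rather than their product.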