How to remove duplicate lines in a large text file?

Posted by lightwindy



How would you remove duplicate lines from a file that is much too large to fit in memory? The duplicate lines are not necessarily adjacent; assume the file is 10 times bigger than RAM.

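One solution is to use a HashSet to store each line of input.txt. Because a set ignores duplicate values, check while storing each line whether it is already present in the set, and write the line to output.txt only if it was not. Note that the set keeps every distinct line in memory at once, so for a file 10 times bigger than RAM this works only when most of the lines are duplicates.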

Java:

// Java program to remove duplicate lines
// from input.txt and save the result
// to output.txt

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashSet;

public class FileOperation
{
    public static void main(String[] args) throws IOException
    {
        // set that stores each distinct line seen so far
        HashSet<String> hs = new HashSet<String>();

        // try-with-resources flushes and closes both files,
        // even if an exception is thrown while copying
        try (BufferedReader br = new BufferedReader(new FileReader("input.txt"));
             PrintWriter pw = new PrintWriter("output.txt"))
        {
            String line;

            // loop over each line of input.txt
            while ((line = br.readLine()) != null)
            {
                // add() returns false if the line is already in
                // the set, so only first occurrences are written
                if (hs.add(line))
                    pw.println(line);
            }
        }

        System.out.println("File operation performed successfully");
    }
}
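
If the distinct lines themselves do not fit in RAM, the HashSet program above will run out of memory. One common workaround, sketched here as an assumption and not taken from the original post, is to first split the input into bucket files by line hash, so that all copies of a line land in the same bucket, and then deduplicate each bucket separately with a HashSet. The class name PartitionedDedup, the helper dedup, the bucket count of 64 and the UTF-8 encoding are all hypothetical choices for illustration. Unlike the program above, this sketch does not keep the lines in their original order.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class PartitionedDedup
{
    // Hypothetical helper: writes each distinct line of input once to output.
    // Assumes UTF-8 text; output order is NOT the original order.
    public static void dedup(Path input, Path output, int buckets) throws IOException
    {
        Path tmpDir = Files.createTempDirectory("dedup-buckets");
        Path[] bucketFiles = new Path[buckets];
        BufferedWriter[] writers = new BufferedWriter[buckets];

        try
        {
            for (int i = 0; i < buckets; i++)
            {
                bucketFiles[i] = tmpDir.resolve("bucket-" + i + ".txt");
                writers[i] = Files.newBufferedWriter(bucketFiles[i], StandardCharsets.UTF_8);
            }

            // Pass 1: route each line to a bucket chosen by its hash,
            // so identical lines always end up in the same bucket file
            try (BufferedReader br = Files.newBufferedReader(input, StandardCharsets.UTF_8))
            {
                String line;
                while ((line = br.readLine()) != null)
                {
                    int b = Math.floorMod(line.hashCode(), buckets);
                    writers[b].write(line);
                    writers[b].newLine();
                }
            }
        }
        finally
        {
            for (BufferedWriter w : writers)
                if (w != null)
                    w.close();
        }

        // Pass 2: deduplicate one bucket at a time; only the distinct
        // lines of the current bucket are held in memory
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8))
        {
            for (Path bucket : bucketFiles)
            {
                Set<String> seen = new HashSet<>();
                try (BufferedReader br = Files.newBufferedReader(bucket, StandardCharsets.UTF_8))
                {
                    String line;
                    while ((line = br.readLine()) != null)
                    {
                        if (seen.add(line))
                        {
                            out.write(line);
                            out.newLine();
                        }
                    }
                }
                Files.delete(bucket);
            }
        }
        Files.delete(tmpDir);
    }

    public static void main(String[] args) throws IOException
    {
        // 64 buckets is an arbitrary illustrative choice
        dedup(Paths.get("input.txt"), Paths.get("output.txt"), 64);
    }
}
```

The bucket count should be chosen so that the distinct lines of a single bucket fit comfortably in memory; for a file 10 times bigger than RAM, a few dozen buckets is a plausible starting point. The temporary bucket files need about as much free disk space as the input file.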

