Find specific word in text file and count it

Posted: 2012-10-13 08:30:10

Question:

Can someone help me with the code? How can I search a text file for a given word and count how many times it is repeated?

For example, test.txt:

hi
hola
hey
hi
bye
hoola
hi

If I want to know how many times the word "Hi" is repeated in test.txt, the program should say "repeated 3 times".

I hope you understand what I want. Thanks for your answers.
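A minimal sketch of what the question asks for, assuming (as in the sample test.txt) one word per line, UTF-8 text, and a case-insensitive comparison so that "Hi" matches "hi". The class name LineWordCount is illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class LineWordCount {

    // Counts lines whose trimmed content equals the word, ignoring case.
    // Assumes one word per line, as in the sample test.txt.
    static long countLines(Path file, String word) throws IOException {
        // Files.lines reads lazily; try-with-resources closes the underlying file
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> line.trim().equalsIgnoreCase(word)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        long n = countLines(Paths.get("test.txt"), "Hi");
        System.out.println("repeated " + n + " times");
    }
}
```

For the sample file above this should print "repeated 3 times". If a line can hold several words, the line has to be split into words first, as several answers below do.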

Comments:

What have you tried? I'm sure you can at least open the file and read its lines... Take a look at ***.com/questions/5102044/…

Answer 1:
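public class Wordcount {

    public static void main(String[] args) {
        int count = 0;
        String str = "hi this is is is line";

        // Split on single spaces and compare each token with "is" exactly,
        // so "this" is not counted
        String[] s1 = str.split(" ");
        for (int i = 0; i < s1.length; i++) {
            if (s1[i].equals("is")) {
                count++;
            }
        }

        System.out.println(count); // prints 3
    }
}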

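Comments:

Hi, welcome to SO. Posting new, updated solutions to old questions is always good, but please try to make such answers as informative and clear as possible. Try adding a description to your code and make sure it is formatted properly. Also try to avoid useless comments. How does this answer add value to a three-year-old post? There are already other similar answers here.

Answer 2: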
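// Counts occurrences of "Hi" by deleting them and measuring how much shorter the text got.
// Case-sensitive, and it also counts "Hi" inside longer words such as "Hill".
public int occurrencesOfHi(String text) {
    String newText = text.replace("Hi", "");
    return (text.length() - newText.length()) / "Hi".length();
}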
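The subtraction trick above generalizes to any search string by dividing by that string's length. A sketch, where the names SubtractionCount and countBySubtraction are illustrative; because it works on raw substrings, "Hill" contributes a match for "Hi":

```java
class SubtractionCount {

    // Counts non-overlapping occurrences of target by deleting them all
    // and dividing the length difference by target.length().
    static int countBySubtraction(String text, String target) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        String removed = text.replace(target, "");
        return (text.length() - removed.length()) / target.length();
    }

    public static void main(String[] args) {
        // "Hill" also contains "Hi", so this prints 3, not 2
        System.out.println(countBySubtraction("Hi Hill Hi", "Hi"));
    }
}
```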

Comments:

Consider adding some comments to your answer.

Answer 3:
package somePackage;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class WordCounter {

    public static void main(String[] args) {
        String path = ""; // ADD YOUR PATH HERE
        String fileName = "test2.txt";
        String testWord = "Macbeth"; // CHANGE THIS IF YOU WANT
        int tLen = testWord.length();
        int wordCntr = 0;
        String file = path + fileName;

        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file)))) {
            String strLine;
            // Read the file line by line
            while ((strLine = br.readLine()) != null) {
                // Skip lines that do not contain testWord at all (case-insensitive)
                boolean check = strLine.toLowerCase().contains(testWord.toLowerCase());
                if (check) {
                    // Split the line into words on runs of whitespace
                    String[] lineWords = strLine.split("\\s+");
                    for (String w : lineWords) {
                        // Only words at least as long as testWord can match
                        if (w.length() >= tLen) {
                            // Compare the first tLen characters with testWord, ignoring case,
                            // so tokens such as "Macbeth," or "Macbeth's" are counted too
                            String word = w.substring(0, tLen);
                            if (word.equalsIgnoreCase(testWord)) {
                                wordCntr++;
                            }
                        }
                    }
                }
            }
            System.out.println("total is: " + wordCntr);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
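Comparing only the first tLen characters counts "Macbeth's", but it would also count any longer word that merely starts with "Macbeth". A stricter alternative, sketched here with illustrative names, strips non-letter characters from both ends of each token before a case-insensitive comparison. The trade-off is that a possessive such as "Macbeth's" is no longer counted:

```java
class NormalizedCount {

    // Removes leading and trailing characters that are not letters (\P{L} = non-letter).
    static String stripPunctuation(String token) {
        return token.replaceAll("^\\P{L}+|\\P{L}+$", "");
    }

    // Splits on whitespace and counts tokens equal to word after stripping, ignoring case.
    static int countNormalized(String text, String word) {
        int count = 0;
        for (String token : text.split("\\s+")) {
            if (stripPunctuation(token).equalsIgnoreCase(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "Macbeth, MACBETH! (macbeth) Macbethian Macbeth's";
        System.out.println(countNormalized(line, "Macbeth")); // 3
    }
}
```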
            
        

Comments:

I grabbed the text of Macbeth and stored it in a file called text2.txt

Answer 4:
package com.test;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class Test {

    public static void main(String[] args) throws IOException {
        // Read the word to search for from standard input
        Scanner sc = new Scanner(System.in);
        String word = sc.next();
        int count = 0;

        try (BufferedReader bf = new BufferedReader(new FileReader("src/test.txt"))) {
            String line;
            // A while loop (instead of do-while) also handles an empty file safely
            while ((line = bf.readLine()) != null) {
                for (String a : line.split(" ")) {
                    // equals() counts whole words only; contains() would also
                    // count substrings such as "hi" inside "this"
                    if (a.equals(word)) {
                        count++;
                    }
                }
            }
        }
        System.out.println(count);
    }
}
    




Comments:

Answer 5:
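import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public int countWord(String word, File file) throws FileNotFoundException {
    int count = 0;
    // try-with-resources closes the Scanner and the underlying file
    try (Scanner scanner = new Scanner(file)) {
        // hasNext() (not hasNextLine()) avoids NoSuchElementException on trailing whitespace
        while (scanner.hasNext()) {
            String nextToken = scanner.next();
            if (nextToken.equalsIgnoreCase(word)) {
                count++;
            }
        }
    }
    return count;
}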

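Comments:

You will get an exception if the file contains whitespace at the end of a line. It also fails when the word comes right after a period, e.g. "Italy is a country.Italy is a good place.": the period glued to "Italy" makes it part of a different token (".Italy"), so the count for Italy comes out as 1.

Answer 6: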
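import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

class filedemo {

    public static void main(String[] args) throws IOException {
        System.out.println("Enter the string to search for:");
        Scanner ob = new Scanner(System.in);
        String str = ob.next();

        // Join all lines, separated by newlines so words on adjacent lines don't merge
        StringBuilder text = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader("c:/file.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                text.append(line).append('\n');
            }
        }

        // Count substring occurrences; resuming one character after each hit
        // counts overlapping matches, and str is also found inside longer words
        int count = 0;
        int index = text.indexOf(str);
        while (index != -1) {
            count++;
            index = text.indexOf(str, index + 1);
        }
        System.out.println("Number of occurrences = " + count);
    }
}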
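Resuming the search one character after each hit, as above, counts overlapping matches ("aa" is found three times in "aaaa"). Resuming after the whole hit counts non-overlapping matches instead. A sketch of both variants; the class name SubstringCount and the overlapping flag are illustrative:

```java
class SubstringCount {

    // Counts matches of target in text. With overlapping == true the search
    // resumes one character after each hit, otherwise right after the whole hit.
    static int count(String text, String target, boolean overlapping) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        int step = overlapping ? 1 : target.length();
        int count = 0;
        int index = text.indexOf(target);
        while (index != -1) {
            count++;
            index = text.indexOf(target, index + step);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count("aaaa", "aa", true));  // 3
        System.out.println(count("aaaa", "aa", false)); // 2
    }
}
```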

  

Comments:

Answer 7:
package File1;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CountLineWordsDuplicateWords {

    public static void main(String[] args) {
        String[] stringArray;
        int lineCount = 0;
        int arrayLength;
        String s;
        StringBuilder stringLine = new StringBuilder();

        try (BufferedReader br = new BufferedReader(new FileReader("F:/Line.txt"))) {
            // Join all lines into one string, separated by spaces, and count the lines
            while ((s = br.readLine()) != null) {
                stringLine.append(s).append(' ');
                lineCount++;
            }
            System.out.println(stringLine);

            // Split on runs of whitespace so repeated spaces don't produce empty "words"
            stringArray = stringLine.toString().trim().split("\\s+");
            arrayLength = stringArray.length;
            System.out.println("The number of words is " + arrayLength);

            // Duplicate word count: for each word, count later case-insensitive matches
            // and remove them from the array so each word is reported only once
            for (int i = 0; i < arrayLength; i++) {
                int c = 1;
                for (int j = i + 1; j < arrayLength; j++) {
                    if (stringArray[i].equalsIgnoreCase(stringArray[j])) {
                        c++;
                        // Shift the remaining elements one place left to drop the duplicate
                        for (int j2 = j; j2 < arrayLength - 1; j2++) {
                            stringArray[j2] = stringArray[j2 + 1];
                        }
                        arrayLength--;
                        j--; // re-check the element that was just shifted into index j
                    }
                }
                System.out.println("The word \"" + stringArray[i] + "\" appears " + c + " times.");
            }
            System.out.println("The number of lines is " + lineCount);
            System.out.println();
        } catch (IOException e) {
            e.printStackTrace();
        }
    } // End of main() method
} // End of class CountLineWordsDuplicateWords
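The nested loops above compare every word with every later word, which is quadratic in the number of words. A single pass that collects the words into a map gives the same per-word counts. A sketch using the Streams API, with the illustrative names WordFrequency and frequencies; words are lower-cased to mirror the equalsIgnoreCase comparison, and the TreeMap only serves to print the words in sorted order:

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordFrequency {

    // Builds word -> count in one pass; lower-casing merges "Hi" and "hi".
    static Map<String, Long> frequencies(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> w.toLowerCase(Locale.ROOT))
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String text = "hi hola hey hi bye hoola hi";
        frequencies(text).forEach((word, n) -> System.out.println(word + " -> " + n));
    }
}
```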

Comments:

Answer 8:

Try this approach using Pattern and Matcher.

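import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Dem {

    public static void main(String[] args) {
        File f = new File("d://My.txt");
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            // Accumulate every line; a newline keeps words on adjacent lines separate
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                text.append(line).append('\n');
            }

            // \b marks word boundaries, so only the whole word "it" matches.
            // (In a regex, "it*" means "i followed by zero or more t", not a wildcard.)
            Pattern pat = Pattern.compile("\\bit\\b", Pattern.CASE_INSENSITIVE);
            Matcher mat = pat.matcher(text);

            int count = 0;
            // Call find() once per iteration; calling it twice would skip every other match
            while (mat.find()) {
                count++;
            }
            System.out.println(count);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}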
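To search for a word supplied at run time instead of a hard-coded "it", the word can be escaped with Pattern.quote so that characters such as "." or "+" are matched literally. A sketch with the illustrative names RegexWordCount and countWholeWord; note that \b assumes the word starts and ends with a letter, digit, or underscore:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWordCount {

    // Counts whole-word, case-insensitive matches of word in text.
    // Pattern.quote escapes regex metacharacters in the search word;
    // \b requires a word boundary on both sides.
    static int countWholeWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "italics" is not counted because no word boundary follows "it"
        System.out.println(countWholeWord("It is what it is; italics don't count, IT does.", "it"));
    }
}
```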
        
    


Comments:

Answer 9:

Use the Multiset collection from the Google Guava library.

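import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import java.io.File;
import java.util.Scanner;

Multiset<String> wordsMultiset = HashMultiset.create();
// new Scanner(fileName) would scan the name string itself, so wrap it in a File
try (Scanner scanner = new Scanner(new File(fileName))) {
    while (scanner.hasNext()) {
        wordsMultiset.add(scanner.next()); // one whitespace-separated word at a time
    }
}
// entrySet() yields each distinct word once, together with its count
for (Multiset.Entry<String> entry : wordsMultiset.entrySet()) {
    System.out.println("Word : " + entry.getElement() + " count -> " + entry.getCount());
}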

Comments:

Answer 10:

Try using java.util.Scanner.

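import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public int countWords(String w, String fileName) throws FileNotFoundException {
    int count = 0;
    try (Scanner scanner = new Scanner(new File(fileName))) {
        scanner.useDelimiter("[^a-zA-Z]+"); // any run of non-letters acts as a delimiter
        while (scanner.hasNext()) {
            if (scanner.next().equalsIgnoreCase(w)) {
                count++;
            }
        }
    }
    return count;
}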
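Because every run of non-letters acts as a delimiter here, the ".Italy" problem raised in the comments on Answer 5 does not occur. Scanner also accepts a String source, so the idea can be checked without a file. A sketch with the illustrative names DelimiterCount and countInText; only the ASCII letters a-z and A-Z count as word characters, so accented letters would split words:

```java
import java.util.Scanner;

class DelimiterCount {

    // Counts case-insensitive matches of w, treating every run of
    // characters outside a-z/A-Z as a separator.
    static int countInText(String w, String text) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[^a-zA-Z]+");
            while (scanner.hasNext()) {
                if (scanner.next().equalsIgnoreCase(w)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countInText("Italy", "Italy is a country.Italy is a good place.")); // 2
    }
}
```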

Comments:

Answer 11:

You can read the text file line by line. I assume each line can contain several words. For each line, call:

// Split on runs of whitespace and compare each word, ignoring case
String[] words = line.split("\\s+");
for (int i = 0; i < words.length; i++) {
    if (words[i].equalsIgnoreCase(searchedWord)) {
        count++;
    }
}
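Combined with the line-by-line reading the answer describes, a complete version might look like the following sketch; the names LineByLineCount and countInFile, and the UTF-8 charset, are assumptions:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LineByLineCount {

    // Reads the file one line at a time and counts whitespace-separated
    // words equal to searchedWord, ignoring case.
    static int countInFile(Path file, String searchedWord) throws IOException {
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (word.equalsIgnoreCase(searchedWord)) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "test.txt");
        System.out.println(countInFile(file, "hi"));
    }
}
```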

Comments:

Answer 12:

Apache Commons Lang: StringUtils.countMatches()

Comments:

Answer 13:
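import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

// Inside a method declared "throws IOException": count how often each line occurs
// (with one word per line, as in test.txt, this is each word's count)
Map<String, Integer> h = new HashMap<>();
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("d:\\file.txt")))) {
    String n;
    while ((n = br.readLine()) != null) {
        if (h.containsKey(n)) {
            int i = h.get(n);
            h.put(n, i + 1);
        } else {
            h.put(n, 1);
        }
    }
}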

Now iterate over this map, using each word as the key, to get each word's count.
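Iterating over the map could look like the following sketch. The report helper and the MapReport class are hypothetical; a TreeMap copy is used only to print the words in alphabetical order, merge is shown as a shorter equivalent of the containsKey/put pair, and getOrDefault returns 0 for a word that never occurred:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapReport {

    // Builds one "word -> count" line per entry, sorted by word.
    static String report(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            sb.append(e.getKey()).append(" -> ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> h = new HashMap<>();
        for (String word : new String[] {"hi", "hola", "hey", "hi", "bye", "hoola", "hi"}) {
            h.merge(word, 1, Integer::sum); // same effect as the containsKey/put pair
        }
        System.out.print(report(h));
        System.out.println("hi occurs " + h.getOrDefault("hi", 0) + " times");
    }
}
```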

Comments:
