Find a specific word in a text file and count it
【Title】: Find specific word in text file and count it 【Posted】: 2012-10-13 08:30:10 【Question】: Can someone help me with the code? How do I search a text file for a given word and count how many times it occurs?
For example, test.txt:
hi
hola
hey
hi
bye
hoola
hi
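If I want to know how many times the word "Hi" occurs in test.txt, the program must say "repeated 3 times".
I hope you understand what I want. Thanks for your answers.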
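【Question discussion】:
What have you tried? I'm sure you can at least open the file and read its lines... Take a look at ***.com/questions/5102044/…
【Answer 1】:
public class Wordcount {
    public static void main(String[] args) {
        int count = 0;
        String str = "hi this is is is line";
        // Split on single spaces and compare every token with the target word
        String[] s1 = str.split(" ");
        for (int i = 0; i < s1.length; i++) {
            if (s1[i].equals("is")) {
                count++;
            }
        }
        System.out.println(count);
    }
}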
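【Discussion】:
Hi, welcome to SO. Posting new, updated solutions to old questions is always good, but please try to make these answers as informative and clear as possible. Try adding a description to your code and make sure it is formatted correctly. Also, please try to avoid useless comments.
How does this answer add value to a 3-year-old post? There are other similar answers here.
【Answer 2】:
public int occurrencesOfHi() {
    // Text is assumed to be a String field that already holds the file contents.
    // Removing every "Hi" shortens the text by 2 characters per occurrence.
    String newText = Text.replace("Hi", "");
    return (Text.length() - newText.length()) / 2;
}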
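The length-difference trick in this answer is tied to the literal "Hi" and its length of 2. A minimal sketch of the same idea for an arbitrary target string might look like the following; the class name `ReplaceCount` and the method `occurrences` are hypothetical and do not come from the answer. It counts case-sensitive, non-overlapping substrings, so "High" still counts as a hit for "Hi".

```java
final class ReplaceCount {
    // Counts non-overlapping, case-sensitive occurrences of target inside text.
    static int occurrences(String text, String target) {
        if (target.isEmpty()) {
            return 0; // avoid dividing by zero for an empty target
        }
        // Each removed occurrence shortens the text by target.length() characters
        String stripped = text.replace(target, "");
        return (text.length() - stripped.length()) / target.length();
    }

    public static void main(String[] args) {
        // Sample content from the question, joined with line breaks
        String text = "hi\nhola\nhey\nhi\nbye\nhoola\nhi";
        System.out.println("repeated " + occurrences(text, "hi") + " times");
    }
}
```

Because this is a substring count, it is best suited to inputs where the target cannot appear inside a longer word.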
【Discussion】:
Consider adding some comments to your answer.
【Answer 3】:
package somePackage;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

// Class wrapper and imports added so the snippet compiles; the class name is arbitrary.
public class CountWordInFile {
    public static void main(String[] args) {
        String path = ""; // ADD YOUR PATH HERE
        String fileName = "test2.txt";
        String testWord = "Macbeth"; // CHANGE THIS IF YOU WANT
        int tLen = testWord.length();
        int wordCntr = 0;
        String file = path + fileName;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
            String strLine;
            // Read the file line by line
            while ((strLine = br.readLine()) != null) {
                // Skip lines that do not contain testWord at all (ignoring case)
                if (!strLine.toLowerCase().contains(testWord.toLowerCase())) {
                    continue;
                }
                // Parse the line's words into a String array
                for (String w : strLine.split("\\s+")) {
                    // A match is a word at least as long as testWord whose first
                    // tLen characters equal testWord, ignoring case. This also
                    // counts forms such as "Macbeth," and "Macbeth's".
                    if (w.length() >= tLen && w.substring(0, tLen).equalsIgnoreCase(testWord)) {
                        wordCntr++;
                    }
                }
            }
            System.out.println("total is: " + wordCntr);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
【Discussion】:
I grabbed the text of Macbeth and stored it in a file named text2.txt
【Answer 4】:
package com.test;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Scanner;

public class Test {
    public static void main(String[] args) throws Exception {
        // Read the word to search for from standard input
        Scanner sc = new Scanner(System.in);
        String w = sc.next();
        int count = 0;
        try (BufferedReader bf = new BufferedReader(new FileReader("src/test.txt"))) {
            String line;
            // A while loop (rather than do/while) also copes with an empty file
            while ((line = bf.readLine()) != null) {
                for (String a : line.split(" ")) {
                    // contains() also counts tokens that merely include w, e.g. "this" for "hi"
                    if (a.contains(w)) {
                        count++;
                    }
                }
            }
        }
        System.out.println(count);
    }
}
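【Discussion】:
【Answer 5】:
public int countWord(String word, File file) throws FileNotFoundException {
    int count = 0;
    Scanner scanner = new Scanner(file);
    // Caution: hasNextLine() is paired with next() here, so trailing whitespace
    // can make next() throw NoSuchElementException; hasNext() is the matching check.
    while (scanner.hasNextLine()) {
        String nextToken = scanner.next();
        if (nextToken.equalsIgnoreCase(word)) {
            count++;
        }
    }
    scanner.close();
    return count;
}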
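Two weak points of this Scanner loop are that `next()` needs `hasNext()` as its guard and that whitespace-separated tokens keep any punctuation attached to them. A minimal sketch that addresses both, assuming any run of non-letter characters should separate words, is shown here; the class name `ScannerWordCount` and the `Readable` parameter are illustrative choices, not part of the answer.

```java
import java.io.StringReader;
import java.util.Scanner;

final class ScannerWordCount {
    // Counts tokens equal to word (ignoring case) in the given character source.
    static int countWord(Readable source, String word) {
        int count = 0;
        try (Scanner scanner = new Scanner(source)) {
            // Any run of non-letter characters (spaces, dots, line breaks) separates tokens
            scanner.useDelimiter("[^\\p{L}]+");
            // hasNext() is the check that matches next()
            while (scanner.hasNext()) {
                if (scanner.next().equalsIgnoreCase(word)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Trailing whitespace and a word glued to a period are both present here
        String text = "Italy is a country.Italy is a good place. \n";
        System.out.println(countWord(new StringReader(text), "italy"));
    }
}
```

To read from a file instead, a `java.io.FileReader` can be passed as the `Readable`.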
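【Discussion】:
If the file contains whitespace at the end of a line, an exception is thrown.
It fails when the word comes right after a period, e.g. "Italy is a country. Italy is a good place." The period next to Italy makes it one whole token, ".Italy", so the count for Italy comes out as 1.
【Answer 6】:
import java.io.*;
import java.util.*;

class filedemo {
    public static void main(String ar[]) throws Exception {
        System.out.println("enter the string which you search");
        Scanner ob = new Scanner(System.in);
        String str = ob.next();
        StringBuilder text = new StringBuilder();
        String line;
        try (BufferedReader br = new BufferedReader(new FileReader("c:/file.txt"))) {
            while ((line = br.readLine()) != null) {
                // Keep a line break so words on adjacent lines do not run together
                text.append(line).append('\n');
            }
        }
        // Count every occurrence of str as a substring, advancing one character at a time
        int count = 0;
        int index = text.indexOf(str);
        while (index != -1) {
            count++;
            index = text.indexOf(str, index + 1);
        }
        System.out.println("Number of occurrences=" + count);
    }
}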
【Discussion】:
【Answer 7】:
package File1;

import java.io.BufferedReader;
import java.io.FileReader;

public class CountLineWordsDuplicateWords {
    public static void main(String[] args) {
        String[] stringArray;
        int counLine = 0;
        int arrayLength;
        String s = "";
        String stringLine = "";
        // try-with-resources closes the reader (and the underlying FileReader)
        try (BufferedReader br = new BufferedReader(new FileReader("F:/Line.txt"))) {
            while ((s = br.readLine()) != null) {
                stringLine = stringLine + s;
                stringLine = stringLine + " "; /* Add space */
                counLine++;
            }
            System.out.println(stringLine);
            stringArray = stringLine.split(" ");
            arrayLength = stringArray.length;
            System.out.println("The number of Words is " + arrayLength);
            /* Duplicate String count code */
            for (int i = 0; i < arrayLength; i++) {
                int c = 1;
                for (int j = i + 1; j < arrayLength; j++) {
                    if (stringArray[i].equalsIgnoreCase(stringArray[j])) {
                        c++;
                        // Remove the duplicate by shifting the tail left; stop one short
                        // of the end so stringArray[j2 + 1] stays in bounds
                        for (int j2 = j; j2 < arrayLength - 1; j2++) {
                            stringArray[j2] = stringArray[j2 + 1];
                        }
                        arrayLength = arrayLength - 1;
                        j--; // re-check the element that just moved into position j
                    } // End of if block
                } // End of inner for block
                System.out.println("The " + stringArray[i] + " present " + c + " times .");
            } // End of outer for block
            System.out.println("The number of Line is " + counLine);
            System.out.println();
        } catch (Exception e) {
            e.printStackTrace();
        }
    } // End of main() method
} // End of class CountLineWordsDuplicateWords
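【Discussion】:
【Answer 8】: Try this approach with Pattern and Matcher.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Dem {
    public static void main(String[] args) {
        File f = new File("d://My.txt");
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            // Accumulate the whole file; reusing one variable for both the current
            // line and the total would leave it null once readLine() hits end of file
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                text.append(line).append('\n');
            }
            int count = 0;
            // \b anchors the match to whole words; "it*" would instead match
            // an 'i' followed by any number of 't' characters
            Pattern pat = Pattern.compile("\\bit\\b");
            Matcher mat = pat.matcher(text);
            // Call find() once per iteration so that no match is skipped
            while (mat.find()) {
                count++;
            }
            System.out.println(count);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}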
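The Pattern and Matcher approach can be adapted to any search word rather than a hard-coded pattern. The sketch below assumes that a match should be a whole word, compared without regard to case; the class name `RegexWordCount` and the method `count` are hypothetical. `Pattern.quote` keeps regex metacharacters in the search word from being interpreted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexWordCount {
    // Counts whole-word, case-insensitive matches of word inside text.
    static int count(String text, String word) {
        // \b marks a word boundary; Pattern.quote treats the word as a literal
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        // Exactly one find() call per counted match
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample content from the question, joined with line breaks
        String text = "hi\nhola\nhey\nhi\nbye\nhoola\nhi";
        System.out.println("repeated " + count(text, "Hi") + " times");
    }
}
```

Note that `\b` relies on the regex notion of word characters, so a search word that starts or ends with punctuation would need different anchoring.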
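【Discussion】:
【Answer 9】: Use the Multiset collection from the google guava library.
Multiset<String> wordsMultiset = HashMultiset.create();
// Wrap the name in a File; new Scanner(String) would scan the name itself
Scanner scanner = new Scanner(new File(fileName));
while (scanner.hasNext()) {
    wordsMultiset.add(scanner.next()); // next() yields whitespace-separated words
}
scanner.close();
// entrySet() exposes each distinct word together with its count
for (Multiset.Entry<String> entry : wordsMultiset.entrySet()) {
    System.out.println("Word : " + entry.getElement() + " count -> " + entry.getCount());
}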
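【Discussion】:
【Answer 10】: Try using java.util.Scanner.
public int countWords(String w, String fileName) throws FileNotFoundException {
    int count = 0;
    Scanner scanner = new Scanner(new File(fileName));
    scanner.useDelimiter("[^a-zA-Z]+"); // runs of non-letters act as delimiters
    // Check every token, not just the first one
    while (scanner.hasNext()) {
        String word = scanner.next();
        if (word.equalsIgnoreCase(w)) {
            count++;
        }
    }
    scanner.close();
    return count;
}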
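【Discussion】:
【Answer 11】: You can read the text file line by line. I assume each line can contain multiple words. For each line, you call:
String[] words = line.split(" ");
for (int i = 0; i < words.length; i++) {
    if (words[i].equalsIgnoreCase(searchedWord)) {
        count++;
    }
}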
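The per-line loop in this answer leaves out how the lines are obtained and where the count is kept. One way to put the pieces together, sketched with streams, is shown below; the class name `LineWordCount` and the command-line arguments are assumptions made for illustration. Splitting on `\\s+` instead of a single space keeps runs of spaces and tabs from producing empty tokens.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

final class LineWordCount {
    // Splits every line on whitespace and counts tokens equal to searchedWord, ignoring case.
    static long count(Stream<String> lines, String searchedWord) {
        return lines
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> word.equalsIgnoreCase(searchedWord))
                .count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: LineWordCount <file> <word>");
            return;
        }
        // Files.lines reads lazily; try-with-resources closes the underlying file
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            System.out.println("repeated " + count(lines, args[1]) + " times");
        }
    }
}
```

Run against the question's file it would be invoked as `java LineWordCount test.txt Hi`.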
【Discussion】:
【Answer 12】: Apache Commons - StringUtils.countMatches()
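【Discussion】:
【Answer 13】:
// Typed map: each distinct line (word) maps to the number of times it was seen
Map<String, Integer> h = new HashMap<>();
FileInputStream fin = new FileInputStream("d:\\file.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fin));
String n;
while ((n = br.readLine()) != null) {
    if (h.containsKey(n)) {
        int i = h.get(n);
        h.put(n, i + 1);
    } else {
        h.put(n, 1);
    }
}
br.close();
Now iterate over this map to get the count of each word: each word is a key, and its count is the value mapped to it.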
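The iteration step is not shown in the answer. A minimal sketch of building the map and then walking its entries follows, assuming words are separated by whitespace and compared in lower case; the class name `WordFrequency`, the method `countWords` and the use of `TreeMap` (chosen only to get a sorted listing) are illustrative choices.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordFrequency {
    // Builds a word -> count map from the given lines.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // merge() inserts 1 for a new key, otherwise adds 1 to the stored count
                    counts.merge(word.toLowerCase(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Lines of test.txt from the question
        List<String> lines = Arrays.asList("hi", "hola", "hey", "hi", "bye", "hoola", "hi");
        // Each entry pairs a word (key) with its count (value)
        for (Map.Entry<String, Integer> entry : countWords(lines).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

A single lookup such as `countWords(lines).get("hi")` then answers the original question for one word.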
【Discussion】: