Word Frequency Count Update

Posted by 林莉


The code has two branches: (1) enter the path of a text file, or (2) type the article text in directly.

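public static void main(String[] args) {
        // Holds the count for each word; the entries are sorted later
        HashMap<String,Integer> map=new HashMap<String,Integer>();
        // Intended to filter punctuation out of the text. As written it is a
        // sequence pattern, not a character class, so it almost never matches.
        String regex=" ?.!:,\"\"‘‘;\n";
        BufferedReader br;
        try {
            // Scanner reads the menu choice from standard input
            Scanner scan = new Scanner(System.in);
            System.out.println("请输入您的输入格式");   // "Please choose your input format"
            System.out.println("1、文件完整路径");     // "1. Full path to a file"
            System.out.println("2、文章内容");         // "2. Article text"
            int flag = scan.nextInt();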
            

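Depending on the choice, execution enters the corresponding branch.

Feature 1: small-file input, entered from the keyboard at the console.

Enter the path of a text file at the console to run the word frequency count.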

                    System.out.println("请输入文件完整路径");   // "Please enter the full file path"
                    String fileUrl = scan.next();
                    // FileReader opens the file at that path; BufferedReader adds readLine()
                    br = new BufferedReader(new FileReader(fileUrl));
                    String sentence;
                    int wordCount = 0;
                    try {
                        // readLine() returns null once the end of the file is reached
                        while((sentence = br.readLine()) != null){
                            sentence = sentence.replaceAll(regex, "");
                            StringTokenizer token = new StringTokenizer(sentence);
                            while(token.hasMoreTokens()){     // walk through every token on the line
                                wordCount++;
                                String word = token.nextToken();
                                // HashMap keys are unique, so each word maps to one running count
                                if(map.containsKey(word)){
                                    int count = map.get(word);
                                    map.put(word, count+1);     // word already present: add 1 to its count
                                }
                                else{
                                    map.put(word, 1);           // first occurrence: store it with a count of 1
                                }
                            }
                        }
                        System.out.println("总共单词数:"+wordCount);   // "Total number of words:"
                        sort(map);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                    break;

Output:

请输入您的输入格式
1、文件完整路径
2、文章内容
1
请输入文件完整路径
c://english.txt
总共单词数:181
as:7
the:7
not:6
it:6
to:5
are:4
a:4
your:4
in:4
they:3
live:3
and:3
of:2
do:2
may:2
by:2
be:2
clothes:2
that:2
often:2
have:2
from:2
above:2
is:2
you:2
door:1
its:1
suppose.It:1
palace.The:1
contentedly:1
snow:1
friends,Turn:1
yourself:1
means.which:1
or:1
windows:1
life,poor:1
bad:1
quiet:1
like:1
without:1
thoughts.:1
simply:1
abode;the:1
change.Sell:1
will:1
some:1
fault-finder:1
herb,like:1
before:1
most:1
I:1
old,return:1
trouble:1
life:1
change;we:1
supported:1
is.You:1
spring.:1
me:1
mind:1
town;but:1
there,and:1
paradise.Love:1
hardnames.It:1
is,meet:1
should:1
seem:1
independent:1
new:1
alms-house:1
poor-house.The:1
pleasant,thrilling,glorious:1
;do:1
garden:1
happens:1
keep:1
but:1
However:1
reflected:1
being:1
brightly:1
enough:1
Cultivate:1
any.May:1
looks:1
more:1
sage.Do:1
town‘s:1
when:1
faults:1
richest.The:1
disreputable.:1
think:1
get:1
so:1
much:1
lives:1
perhaps:1
early:1
things,whether:1
call:1
dishonest:1
sun:1
shun:1
melts:1
setting:1
them.Things:1
poverty:1
poorest:1
mean:1
receive:1
find:1
hourss,even:1
thoughts,as:1
rich:1
poor:1
man‘s:1
cheering:1
great:1
see:1
supporting:1
themselves:1
misgiving.Most:1
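
The listing above still contains tokens such as suppose.It and pleasant,thrilling,glorious, because punctuation that is not surrounded by spaces never splits a word. One possible fix is sketched below. It assumes that a word is a run of ASCII letters, optionally joined by a hyphen or an apostrophe, and treats everything else as a separator. The names WordSplitter and tokenize are illustrative and do not appear in the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of the original program.
final class WordSplitter {
    // Assumption: a word is a run of ASCII letters, optionally joined by a
    // hyphen or an apostrophe (', \u2018 or \u2019), e.g. "fault-finder".
    private static final Pattern WORD =
            Pattern.compile("[A-Za-z]+(?:['\u2018\u2019-][A-Za-z]+)*");

    // Returns the words of the text in order of appearance.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }
}
```

With a helper like this, the counting loop could iterate over tokenize(sentence) instead of a StringTokenizer, and the total word count would change accordingly.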

Feature 2: accept the file name of an English work on the command line

>wf english.txt

total 181 words
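
The post shows no code for this feature. A minimal sketch follows, assuming that wf is a wrapper that launches a Java class with the file name as its first argument. The class name Wf, the usage message, and the UTF-8 encoding are assumptions. Words are whitespace-separated tokens, as in the StringTokenizer loop of feature 1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.StringTokenizer;

// Hypothetical entry point behind the "wf" command; the name Wf is an assumption.
final class Wf {
    // Counts whitespace-separated tokens in the file, line by line.
    static int countWords(Path file) throws IOException {
        int total = 0;
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                total += new StringTokenizer(line).countTokens();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: wf <file>");
            return;
        }
        System.out.println("total " + countWords(Paths.get(args[0])) + " words");
    }
}
```

The try-with-resources block closes the reader even if reading fails, which the feature 1 code does not do.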

Feature 3: accept on the command line the name of a directory that holds English works, and count them in batch.
>dir folder
gone_with_the_wand
runbinson
janelove
>wf folder
gone_with_the_wand
total 1234567 words
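
No code for the batch mode appears in the post either. The sketch below assumes that only regular files directly inside the directory are counted, in file-name order, and that each file yields a name line followed by a total line, as in the sample session above. The names WfFolder, countWords, and report are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical batch mode; class and method names are assumptions.
final class WfFolder {
    // Total number of whitespace-separated tokens in one file.
    static int countWords(Path file) throws IOException {
        int total = 0;
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            total += new StringTokenizer(line).countTokens();
        }
        return total;
    }

    // Two lines per regular file directly inside dir: its name, then its total.
    static List<String> report(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            lines.add(file.getFileName().toString());
            lines.add("total " + countWords(file) + " words");
        }
        return lines;
    }
}
```

Subdirectories are skipped in this sketch. Recursing into them would be a separate design decision that the post does not address.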

Feature 4: read a single English work from the console

                    System.out.println("请输入文章内容");   // "Please enter the article text"
                    scan.nextLine();                       // discard the rest of the line left over after nextInt()
                    String sentence2 = scan.nextLine();    // the sentence or paragraph to analyse; next() would stop at the first space
                    System.out.println(sentence2);
                    int wordCount2=0;                      // total number of words read
                    HashMap<String,Integer> map2=new HashMap<String,Integer>(); // count per word, sorted later
                    // StringTokenizer breaks the string into tokens, splitting on any of:
                    // comma, space, ? . ! : " ‘ and newline
                    StringTokenizer token=new StringTokenizer(sentence2, ", ?.!:\"‘\n");
                    while(token.hasMoreTokens()){          // walk through every token
                        wordCount2++;
                        String word=token.nextToken();
                        // HashMap keys are unique, so each word maps to one running count
                        if(map2.containsKey(word)){
                            int count=map2.get(word);
                            map2.put(word, count+1);       // word already present: add 1 to its count
                        }
                        else
                            map2.put(word, 1);             // first occurrence: store it with a count of 1
                    }
                    System.out.println("总共单词数:"+wordCount2);   // "Total number of words:"
                    sort(map2);                            // sort the entries and print them

                    break;
            }

Output:

请输入您的输入格式
1、文件完整路径
2、文章内容
2
请输入文章内容
However mean your life is,meet it and live it ;do not shun it and call it hardnames.It is not so bad as you suppose.It looks poorest when you are richest.The fault-finder will find faults in paradise.Love your life,poor as it is.You may perhaps have some pleasant,thrilling,glorious hourss,even in a poor-house.The setting sun is reflected from the windows of the alms-house as brightly as from the rich man‘s abode;the snow melts before its door as early in the spring. I do not see but a quiet mind may live as contentedly there,and have as cheering thoughts,as in a palace.The town‘s poor seem to me often to live the most independent lives of any.May be they are simply great enough to receive without misgiving.Most think that they are above being supported by the town;but it often happens that they are not above supporting themselves by dishonest means.which should be more disreputable.Cultivate poverty like a garden herb,like sage.Do not trouble yourself much to get new things,whether clothes or friends,Turn the old,return to them.Things do not change;we change.Sell your clothes and keep your thoughts.
总共单词数:181
as:7
the:7
not:6
it:6
to:5
are:4
a:4
your:4
in:4
they:3
live:3
and:3
of:2
do:2
may:2
by:2
be:2
clothes:2
that:2
often:2
have:2
from:2
above:2
is:2
you:2
door:1
its:1
suppose.It:1
palace.The:1
contentedly:1
snow:1
friends,Turn:1
yourself:1
means.which:1
or:1
windows:1
life,poor:1
bad:1
quiet:1
like:1
without:1
thoughts.:1
simply:1
abode;the:1
change.Sell:1
will:1
some:1
fault-finder:1
herb,like:1
before:1
most:1
I:1
old,return:1
trouble:1
life:1
change;we:1
supported:1
is.You:1
spring.:1
me:1
mind:1
town;but:1
there,and:1
paradise.Love:1
hardnames.It:1
is,meet:1
should:1
seem:1
independent:1
new:1
alms-house:1
poor-house.The:1
pleasant,thrilling,glorious:1
;do:1
garden:1
happens:1
keep:1
but:1
However:1
reflected:1
being:1
brightly:1
enough:1
Cultivate:1
any.May:1
looks:1
more:1
sage.Do:1
town‘s:1
when:1
faults:1
richest.The:1
disreputable.:1
think:1
get:1
so:1
much:1
lives:1
perhaps:1
early:1
things,whether:1
call:1
dishonest:1
sun:1
shun:1
melts:1
setting:1
them.Things:1
poverty:1
poorest:1
mean:1
receive:1
find:1
hourss,even:1
thoughts,as:1
rich:1
poor:1
man‘s:1
cheering:1
great:1
see:1
supporting:1
themselves:1
misgiving.Most:1
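
Both branches call sort(...), but the post never shows that method. The listings suggest it prints word:count lines with the highest counts first. A sketch of such a step follows, returning the lines rather than printing them so that it is easy to check. The names FrequencySorter and sortedLines are hypothetical, and breaking ties alphabetically is an assumption; the original appears to leave ties in HashMap iteration order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the sort(...) method that the post calls but does not show.
final class FrequencySorter {
    // Returns "word:count" lines, highest count first.
    // Assumption: ties are broken alphabetically by word.
    static List<String> sortedLines(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + ":" + e.getValue());
        }
        return lines;
    }
}
```

A caller could print the result with a simple loop over sortedLines(map). Copying the entries into a list is needed because a HashMap has no defined order of its own.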

 
