Analyzing the frequency of each word in a text file (an English article) and printing the 10 most frequent words

Posted Anyanyamy



Write a program that analyzes the frequency of each word in a text file (an English article) and prints the 10 most frequent words.

  When I saw this problem, I decided to write the program in C and broke the problem down into parts.

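  My original idea was to record the frequency of every word and then sort the result, which costs O(N*log N). I later realized that this was more complicated than necessary: it is enough to keep only the few most frequent words and replace them while traversing, which brings the cost of picking the top words down to O(N).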

  1. First, open the file in C, read the words one by one, and count how often each of them occurs.

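    To keep punctuation from being read into the words and skewing the counts, the file is read one character at a time and only characters in the ranges 'a'-'z' and 'A'-'Z' are accepted, so that punctuation and upper/lower case do not distort the counts.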

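void readfile(struct word *head)
{
    FILE *fp;
    int ch;              /* int rather than char, so that EOF can be detected */
    char temp[30];
    struct word *p;
    if((fp=fopen("in.txt","r"))==NULL)
    {
        printf("Cannot open the file!\n");
        exit(1);
    }
    ch=fgetc(fp);
    while(ch!=EOF)
    {
        int i=0;
        /* skip everything that is not a letter */
        while(ch!=EOF&&!isalpha(ch))
            ch=fgetc(fp);
        /* collect one word in lower case; letters beyond the 29th are dropped */
        while(ch!=EOF&&isalpha(ch))
        {
            if(i<29)
                temp[i++]=(char)tolower(ch);
            ch=fgetc(fp);
        }
        temp[i]='\0';
        if(i==0)         /* only non-letters were left before the end of the file */
            break;
        /* look the word up in the list */
        p=head->next;
        while(p)
        {
            if(!strcmp(temp,p->name))
            {
                p->num++;
                break;
            }
            p=p->next;
        }
        /* not found: insert a new node right after the dummy head */
        if(!p)
        {
            p=(struct word*)malloc(sizeof(struct word));
            if(p==NULL)
            {
                printf("Out of memory!\n");
                exit(1);
            }
            strcpy(p->name,temp);
            p->num=1;
            p->next=head->next;
            head->next=p;
        }
    }
    fclose(fp);
}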

 2. Next, rank the words, pick the ten most frequent ones, and print them.

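      For the ranking, bubble sort was not used, because each word has to stay paired with its frequency. Instead, the list is scanned for the largest frequency, which is stored in a ten-element array; the word with that frequency is printed and its count in the list is set to zero, so that every scan picks out the largest remaining one.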

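void sort(struct word *head)
{
    struct word *q;
    int a[10],i;
    for(i=0;i<10;i++)
        a[i]=0;
    printf("The ten most frequent words in the article are:\n");
    for(i=0;i<10;i++)
    {
        /* first scan: a[i] becomes the largest remaining count;
           start at head->next because head is a dummy node without data */
        for(q=head->next;q!=NULL;q=q->next)
        {
            if(q->num>a[i])
                a[i]=q->num;
        }
        if(a[i]==0)      /* fewer than ten distinct words in the file */
            break;
        /* second scan: print the first word with that count, then zero it */
        for(q=head->next;q!=NULL;q=q->next)
        {
            if(a[i]==q->num)
            {
                q->num=0;
                printf("Frequency: %d\t",a[i]);
                puts(q->name);
                break;
            }
        }
    }
}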


  The full source code of the program follows:

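#include<stdio.h>
#include<stdlib.h>
#include<ctype.h>
#include<string.h>
/*********************** word structure ***********************/
struct word
{
    char name[30];
    int num;
    struct word *next;
};
/********************** read the words and count their frequencies *********************/
void readfile(struct word *head)
{
    FILE *fp;
    int ch;              /* int rather than char, so that EOF can be detected */
    char temp[30];
    struct word *p;
    if((fp=fopen("in.txt","r"))==NULL)
    {
        printf("Cannot open the file!\n");
        exit(1);
    }
    ch=fgetc(fp);
    while(ch!=EOF)
    {
        int i=0;
        /* skip everything that is not a letter */
        while(ch!=EOF&&!isalpha(ch))
            ch=fgetc(fp);
        /* collect one word in lower case; letters beyond the 29th are dropped */
        while(ch!=EOF&&isalpha(ch))
        {
            if(i<29)
                temp[i++]=(char)tolower(ch);
            ch=fgetc(fp);
        }
        temp[i]='\0';
        if(i==0)         /* only non-letters were left before the end of the file */
            break;
        /* look the word up in the list */
        p=head->next;
        while(p)
        {
            if(!strcmp(temp,p->name))
            {
                p->num++;
                break;
            }
            p=p->next;
        }
        /* not found: insert a new node right after the dummy head */
        if(!p)
        {
            p=(struct word*)malloc(sizeof(struct word));
            if(p==NULL)
            {
                printf("Out of memory!\n");
                exit(1);
            }
            strcpy(p->name,temp);
            p->num=1;
            p->next=head->next;
            head->next=p;
        }
    }
    fclose(fp);
}
/**************************** ranking ***********************/
void sort(struct word *head)
{
    struct word *q;
    int a[10],i;
    for(i=0;i<10;i++)
        a[i]=0;
    printf("The ten most frequent words in the article are:\n");
    for(i=0;i<10;i++)
    {
        /* first scan: a[i] becomes the largest remaining count;
           start at head->next because head is a dummy node without data */
        for(q=head->next;q!=NULL;q=q->next)
        {
            if(q->num>a[i])
                a[i]=q->num;
        }
        if(a[i]==0)      /* fewer than ten distinct words in the file */
            break;
        /* second scan: print the first word with that count, then zero it */
        for(q=head->next;q!=NULL;q=q->next)
        {
            if(a[i]==q->num)
            {
                q->num=0;
                printf("Frequency: %d\t",a[i]);
                puts(q->name);
                break;
            }
        }
    }
}
/***************************** main function ****************************/
int main(void)
{
    struct word *head;
    /* dummy head node: it carries no word, the list starts at head->next */
    head=(struct word*)malloc(sizeof(struct word));
    if(head==NULL)
        return 1;
    head->name[0]='\0';
    head->num=0;
    head->next=NULL;
    readfile(head);
    sort(head);
    return 0;
}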




        

The article used for testing and the result are given below:

My father was a self-taught mandolin player. He was one of the best string instrument players in our town. He could not read music, but if he heard a tune a few times, he could play it. When he was younger, he was a member of a small country music band. They would play at local dances and on a few occasions would play for the local radio station. He often told us how he had auditioned and earned a position in a band that featured Patsy Cline as their lead singer. He told the family that after he was hired he never went back. Dad was a very religious man. He stated that there was a lot of drinking and cursing the day of his audition and he did not want to be around that type of environment.

Occasionally, Dad would get out his mandolin and play for the family. We three children: Trisha, Monte and I, George Jr., would often sing along. Songs such as the Tennessee Waltz, Harbor Lights and around Christmas time, the well-known rendition of Silver Bells. "Silver Bells, Silver Bells, its Christmas time in the city" would ring throughout the house. One of Dad's favorite hymns was "The Old Rugged Cross". We learned the words to the hymn when we were very young, and would sing it with Dad when he would play and sing. Another song that was often shared in our house was a song that accompanied the Walt Disney series: Davey Crockett. Dad only had to hear the song twice before he learned it well enough to play it. "Davey, Davey Crockett, King of the Wild Frontier" was a favorite song for the family. He knew we enjoyed the song and the program and would often get out the mandolin after the program was over. I could never get over how he could play the songs so well after only hearing them a few times. I loved to sing, but I never learned how to play the mandolin. This is something I regret to this day.

Dad loved to play the mandolin for his family he knew we enjoyed singing, and hearing him play. He was like that. If he could give pleasure to others, he would, especially his family. He was always there, sacrificing his time and efforts to see that his family had enough in their life. I had to mature into a man and have children of my own before I realized how much he had sacrificed.

I joined the United States Air Force in January of 1962. Whenever I would come home on leave, I would ask Dad to play the mandolin. Nobody played the mandolin like my father. He could touch your soul with the tones that came out of that old mandolin. He seemed to shine when he was playing. You could see his pride in his ability to play so well for his family.

When Dad was younger, he worked for his father on the farm. His father was a farmer and sharecropped a farm for the man who owned the property. In 1950, our family moved from the farm. Dad had gained employment at the local limestone quarry. When the quarry closed in August of 1957, he had to seek other employment. He worked for Owens Yacht Company in Dundalk, Maryland and for Todd Steel in Point of Rocks, Maryland. While working at Todd Steel, he was involved in an accident. His job was to roll angle iron onto a conveyor so that the welders farther up the production line would have it to complete their job. On this particular day Dad got the third index finger of his left hand mashed between two pieces of steel. The doctor who operated on the finger could not save it, and Dad ended up having the tip of the finger amputated. He didn't lose enough of the finger where it would stop him picking up anything, but it did impact his ability to play the mandolin.

After the accident, Dad was reluctant to play the mandolin. He felt that he could not play as well as he had before the accident. When I came home on leave and asked him to play he would make excuses for why he couldn't play. Eventually, we would wear him down and he would say "Okay, but remember, I can't hold down on the strings the way I used to" or "Since the accident to this finger I can't play as good". For the family it didn't make any difference that Dad couldn't play as well. We were just glad that he would play. When he played the old mandolin it would carry us back to a cheerful, happier time in our lives. "Davey, Davey Crockett, King of the Wild Frontier", would again be heard in the little town of Bakerton, West Virginia.

In August of 1993 my father was diagnosed with inoperable lung cancer. He chose not to receive chemotherapy treatments so that he could live out the rest of his life in dignity. About a week before his death, we asked Dad if he would play the mandolin for us. He made excuses but said "okay". He knew it would probably be the last time he would play for us. He tuned up the old mandolin and played a few notes. When I looked around, there was not a dry eye in the family. We saw before us a quiet humble man with an inner strength that comes from knowing God, and living with him in one's life. Dad would never play the mandolin for us again. We felt at the time that he wouldn't have enough strength to play, and that makes the memory of that day even stronger. Dad was doing something he had done all his life, giving. As sick as he was, he was still pleasing others. Dad sure could play that Mandolin!

Test result:
