Chinese and English Word Frequency
# English word frequency
str2 = '''I will run, I will climb, I will soar
I'm undefeated
Jumping out of my skin, pull the chord
Yeah I believe it
The past, is everything we were don't make us who we are
So I'll dream, until I make it real, and all I see is stars
Its not until you fall that you fly
When your dreams come alive you're unstoppable
Take a shot, chase the sun, find the beautiful
We will glow in the dark turning dust to gold
And we'll dream it possible possible
And we'll dream it possible
I will chase, I will reach, I will fly
Until I'm breaking, until I'm breaking
Out of my cage, like a bird in the night
I know I'm changing, I know I'm changing
In, into something big, better than before
And if it takes, takes a thousand lives
Then it's worth fighting for
Its not until you fall that you fly
When your dreams come alive you're unstoppable
Take a shot, chase the sun, find the beautiful
We will glow in the dark turning dust to gold
And we'll dream it possible it possible
From the bottom to the top
We're sparking wild fire's
Never quit and never stop
The rest of our lives
From the bottom to the top
We're sparking wild fire's
Never quit and never stop
Its not until you fall that you fly
When your dreams come alive you're unstoppable
Take a shot, chase the sun, find the beautiful
We will glow in the dark turning dust to gold
And we'll dream it possible possible
And we'll dream it possible'''.lower()

# Optional: strip other punctuation as well (loop over the characters,
# passing the loop variable itself rather than the literal string 'word').
# for ch in '."?!':
#     str2 = str2.replace(ch, ' ')
str2 = str2.replace('\n', ' ')  # turn line breaks into spaces
str2 = str2.replace(',', ' ')   # replace commas with spaces
print(str2)

str2 = str2.strip()  # remove leading and trailing whitespace
str2 = str2.split()  # split the string into a list of words on whitespace
print(str2)

# Print the count of every word; repeated words are printed once per occurrence.
print('Number of occurrences of each word:')
for word in str2:
    print(word, str2.count(word))

strSet = set(str2)
newSet = {'a', 'will', 'it', 'out', 'of', 'my', 'the', 'i', 'in', 'to', 'when', 'and'}
strSet1 = strSet - newSet  # drop prepositions and other common words
print(strSet1)

strdict = {}  # word -> count
for word in strSet1:
    strdict[word] = str2.count(word)
print(len(strdict), strdict)

strList = list(strdict.items())

def takesecond(elem):
    # Sort key: the count in a (word, count) pair.
    return elem[1]

# strList.sort(key=lambda x: x[1], reverse=True)  # same sort with a lambda
strList.sort(key=takesecond, reverse=True)  # sort by count, largest first
print(strList)

# Top twenty words; slicing avoids an IndexError if there are fewer than twenty.
for item in strList[:20]:
    print(item)
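The counting above calls `str2.count(word)` once per word, which rescans the whole list each time. As a rough alternative sketch, the same result can be obtained in a single pass with `collections.Counter`. The function name `word_frequencies`, the `STOP_WORDS` constant, and the short `sample` string are assumptions made for illustration; the stop-word set simply mirrors `newSet` from the code above.

```python
import re
from collections import Counter

# Assumed stop-word set, copied from newSet in the code above.
STOP_WORDS = {'a', 'will', 'it', 'out', 'of', 'my', 'the', 'i', 'in', 'to', 'when', 'and'}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count the words in text, ignoring case and the given stop words."""
    # Keep letters and apostrophes so contractions such as "we'll" stay whole.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# A short sample taken from the lyrics, used here only for demonstration.
sample = "And we'll dream it possible, possible. And we'll dream it possible"
freq = word_frequencies(sample)

# most_common(n) returns the n most frequent (word, count) pairs.
print(freq.most_common(3))
```

Because `Counter.most_common` already sorts by count, the separate `takesecond` sort step is not needed in this variant.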
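# Chinese word frequency (requires the third-party jieba package)
import jieba

# Read the whole novel; the with-statement closes the file automatically.
with open('《活着》.txt', 'r', encoding='utf-8') as f:
    life = f.read()

lifelist = list(jieba.cut(life))  # segment the text into words

lifedict = {}  # word -> count
for word in lifelist:
    if len(word) == 1:
        continue  # skip single characters (mostly punctuation and particles)
    lifedict[word] = lifedict.get(word, 0) + 1

wordlist = list(lifedict.items())
wordlist.sort(key=lambda x: x[1], reverse=True)  # sort by count, largest first

# Top fifteen words; slicing avoids an IndexError on short texts.
for item in wordlist[:15]:
    print(item)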
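The counting step for the Chinese text can also be written with `collections.Counter`. The sketch below is only one possible variant: the helper name `top_words` and its parameters are assumptions, and the token list is typed by hand for demonstration rather than produced by jieba. In practice the tokens would come from `jieba.cut(life)`. Unlike the loop above, this version also drops tokens that consist only of whitespace.

```python
from collections import Counter

def top_words(tokens, n=15, min_len=2):
    """Return the n most frequent tokens that have at least min_len characters."""
    # Stripping first means whitespace-only tokens are filtered out too.
    counts = Counter(t for t in tokens if len(t.strip()) >= min_len)
    return counts.most_common(n)

# Hand-written tokens for demonstration only (not real jieba output).
tokens = ['福贵', '的', '牛', '也', '叫', '福贵', '，', '家珍', '福贵', '家珍']
result = top_words(tokens, n=2)
print(result)
```

With jieba installed, the call would look like `top_words(jieba.cut(life), n=15)`, since `top_words` accepts any iterable of strings.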