English Word Frequency Count

Posted 193杨晓玲


str='''In his speech at the closing session of this year's National People's Congress, Chinese President Xi Jinping reiterated the two centenary goals and emphasized a vision of a "Community of Shared Future for Mankind". The aspects of this vision have global meaning, and observers have taken great interest in this philosophy within the context of China's internal and external circumstances.

With confidence and a pragmatic vision for the nation, Xi called on all efforts to tackle challenges and realize the Chinese Dream of national rejuvenation. He took the historic example of four great Chinese inventions — paper, printing, the compass and gunpowder — and encourage all NPC members to work hard for new achievements in a new era to build a strong, modern, socialist nation and also contribute to the peace, stability and development of the world. In the presence of more than 2,900 lawmakers, Xi asserted the Chinese people are striving for the goal of the building of a moderately prosperous society in all respects by 2020 and to tackle the unprecedented challenges ahead through a huge united effort, hard work and scientific vision.

In this speech Xi, also General Secretary of the CPC Central Committee, outlined his vision for the world community, too. He assured us there is no reason to see China as a threat, as proved by China's efforts since its reform and opening-up policy started in 1978. Some Western scholars still try to use the phrase "China threat" and give pressure to their leadership not to accept China's rise. But history has clearly shown today's global challenges were created by the domineering approach of the western countries. With China's development and relevant stake in the international arena, developing world-seeking contribution of China for more a equal and balanced international world order.

On the issue of building a favorable environment for world peace and development, Xi's speech has actually given a tangible overview for building a community of shared future for mankind. From Asia to Africa and Europe to Latin America, China's partnerships are increasing in new dimensions under Xi's guidance The last five years have shown China can maintain a harmonious relationship with the rest of the world based on mutual trust and win-win cooperation. China's contribution to the United Nations peacekeeping mission is highly admired by UN leadership and the world community.

China's assistance to underdeveloped and developing countries has started to have positive effects for the political stability and economic prosperity of the recipient nations. This is one reason why many countries are attracted to the China-led Belt and Road Initiative and the Asian Infrastructure and Investment Bank. To seek resolution to international conflicts and disputes, Xi has always focused on comprehensive dialogue and negotiations between conflicted parties. China's approach to settling conflicts has demonstrated China wants peaceful resolution of any international conflict and urges all parties to contribute to peace. In his speech, Xi said China will continue to participate in reform and development of world governance as well as contribute more "Chinese wisdom, Chinese solutions and Chinese strength" to promote lasting peace and stability in the world.

China's domestic development will have direct impact on the global sphere. A country with a history of more than 5,000 years has a lot to contribute to the rest of the world when it comes to peace and development. There is no doubt the targeted centenary goals of the Chinese leadership will be fulfilled. A strong and prosperous China will be supportive of world peace and development. Now the rest of the world is watching the newly elected Chinese leadership for more comprehensive efforts in building mutual trust and understanding to create a safe, clean and harmonious world.'''


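# Remove punctuation marks (each one is replaced with an empty string)
str = str.replace(",", "").replace(".", "").replace("?", "").replace("'", "").replace(":", "").replace('"', "")
str = str.lower()               # convert the string to lowercase
str = str.split()               # split on whitespace into words
ls = list(str)                  # word list
word_set = set(ls)              # convert the list to a set to remove duplicates
list1 = list(word_set)          # turn the set back into a list of unique words
list2 = []                      # empty list to hold how often each word occurs
for i in list1:
    list2.append(ls.count(i))   # count the occurrences of each word
freq = dict(zip(list1, list2))  # pair each word with its count in a dictionary
# Drop some words that carry little meaning
list3 = ['for', 'the', 'and', 'to', 'of', 'a', 'in', 'xi', 'on', 'have', 'is', 'by', 'than']
for i in list3:
    freq.pop(i, None)           # pop with a default avoids a KeyError if a word is absent
dict2 = sorted(freq.items(), key=lambda x: x[1], reverse=True)
for i in range(10):             # print the top 10 words by frequency
    print(dict2[i])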

<img src="https://images2018.cnblogs.com/blog/1023985/201803/1023985-20180321235531686-173692214.png" alt="" />

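I feel there are quite a few ways to solve this problem, and I am not sure whether my own approach is more complicated than it needs to be.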
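One of those other ways, as a rough sketch rather than a definitive version: the standard library's `collections.Counter` can do the counting and sorting in one step. The names `top_words`, `PUNCTUATION` and `STOP_WORDS` are made up for this illustration and are not part of the code above; the cleanup deliberately mirrors the same punctuation removal and stop-word list.

```python
from collections import Counter

# Hypothetical names for this sketch; the values mirror the code in the post
PUNCTUATION = ",.?':\""
STOP_WORDS = {"for", "the", "and", "to", "of", "a", "in", "xi",
              "on", "have", "is", "by", "than"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, skipping stop words."""
    # Same cleanup as the post: delete a fixed set of punctuation marks
    for mark in PUNCTUATION:
        text = text.replace(mark, "")
    # Lowercase, then split on whitespace
    words = text.lower().split()
    # Counter tallies every remaining word in a single pass
    counts = Counter(w for w in words if w not in stop_words)
    # most_common sorts by count, highest first
    return counts.most_common(n)


print(top_words("The cat and the dog. Cat, dog, cat.", n=2))
```

Calling `top_words(str)` on the article string above should give the same kind of top-10 list. It also avoids calling `count` once per unique word, which rescans the whole word list each time.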

Also, the most frequent word here should really be china: every china's in the article turns into chinas once the apostrophe is removed, and I have not handled that case here.
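A possible way to handle it, offered only as a sketch: strip a trailing possessive 's before tokenizing, so that china's is counted as china. The function name `count_words` and the regular expressions are assumptions for this example, not part of the original code. Note that this also folds contractions such as it's into it, and splits numbers like 2,900 at the comma.

```python
import re
from collections import Counter


def count_words(text):
    """Count lowercase words, folding possessives like "China's" into "china"."""
    text = text.lower()
    # Drop a possessive 's (straight or curly apostrophe) at the end of a word
    text = re.sub(r"['’]s\b", "", text)
    # Keep runs of letters/digits, allowing internal hyphens (e.g. "win-win")
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text)
    return Counter(words)


counts = count_words("China's rise and China's role in China.")
print(counts["china"], counts["chinas"])
```

The stop-word filtering and top-10 output can then be applied to the returned counter as before.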
