Python word count
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Count Chinese and English words separately, excluding punctuation.
# Author: Pan Junyong from zopen.cn, panjy at zopen dot cn
import re
import sys
from types import StringType
import operator
import urllib2
# See CJKSplitter
rx = re.compile(u"[a-zA-Z0-9_\u0392-\u03c9]+|[\u4E00-\u9FFF\u3400-\u4dbf\uf900-\ufaff\u3040-\u309f\uac00-\ud7af]+", re.UNICODE)
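def caculateWords(s, encoding='utf-8'):
    # Returns (cjk_len, asc_len): CJK characters are counted one by one,
    # while each run of Latin letters, digits, underscores or Greek letters
    # counts as a single word.
    result = []
    if type(s) is StringType:  # not unicode
        s = unicode(s, encoding, 'ignore')
    splitted = rx.findall(s)
    cjk_len = 0
    asc_len = 0
    for w in splitted:
        if ord(w[0]) >= 12352:  # \u3040
            cjk_len += len(w)
            # result.append(w)
        else:
            # result.append(w)
            asc_len += 1
    return (cjk_len, asc_len)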
def main():
    # Count every file named on the command line and print a running total.
    index = 0
    total_words = (0, 0)
    for filename in sys.argv[1:]:
        s = open(filename).read()
        # TODO: check encoding
        words = caculateWords(s)
        index += 1
        total_words = map(operator.add, total_words, words)
        print "%2d" % index, filename.ljust(18), '(Chinese, English):', words
    print "total: %2d files, " % index, '(Chinese, English):', tuple(total_words)
def get_words_count(url):
    # Fetch the page through the infoqhelp proxy and return only the CJK count.
    url = "http://infoqhelp.sinaapp.com/queryit?url=" + url
    print url
    data = urllib2.urlopen(url).read()
    return caculateWords(data)[0]

if __name__ == '__main__':
    print get_words_count("http://www.infoq.com/cn/news/2013/11/yourkit-2013")
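
The script above targets Python 2 (`unicode`, `StringType`, `urllib2`, the `print` statement). For readers on Python 3, the counting step could be sketched roughly as follows. This is an adaptation, not the author's code: the names `count_words` and `TOKEN_RX` are made up for illustration, the character ranges are copied unchanged from the `rx` pattern, and the network and file-handling parts are left out.

```python
import re

# Same character classes as rx in the original script: a run of ASCII
# letters, digits, underscores or Greek letters is one token; a run of
# CJK ideographs, hiragana or Hangul syllables is another.
TOKEN_RX = re.compile(
    "[a-zA-Z0-9_\u0392-\u03c9]+"
    "|[\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff\u3040-\u309f\uac00-\ud7af]+"
)


def count_words(text, encoding="utf-8"):
    """Return (cjk_len, asc_len) for text given as str or bytes."""
    if isinstance(text, bytes):
        # Mirror the original: decode and silently drop undecodable bytes.
        text = text.decode(encoding, "ignore")
    cjk_len = 0
    asc_len = 0
    for token in TOKEN_RX.findall(text):
        if ord(token[0]) >= 0x3040:  # 12352, the threshold used in the original
            cjk_len += len(token)    # every CJK character counts
        else:
            asc_len += 1             # a whole alphanumeric run counts once
    return cjk_len, asc_len


print(count_words("Python 字数统计, hello world!"))  # prints (4, 3)
```

Punctuation and whitespace never match the pattern, so they are skipped rather than filtered explicitly. Passing `bytes` (for example the result of reading a file in binary mode) takes the decoding branch, which plays the role of the `StringType` check in the Python 2 version.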