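In Python, several lines of code can often be condensed into one: a list comprehension puts the for loop and the if test together on a single line. For example: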
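As a minimal illustration of the general pattern (the numbers are arbitrary and not from the original example), a comprehension with a for clause and an if clause builds the same list as the equivalent explicit loop:

```python
# General shape: [expression for item in iterable if condition]
numbers = [1, 2, 3, 4, 5, 6]

# One line: keep the even numbers and square them
squares_of_evens = [n * n for n in numbers if n % 2 == 0]

# The equivalent explicit loop
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

print(squares_of_evens)                  # [4, 16, 36]
print(squares_of_evens == squares_loop)  # True
```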
# Load NLTK's English stopword list (the corpus must be downloaded once with nltk.download('stopwords'))
>>> from nltk.corpus import stopwords
>>> english_stopwords = stopwords.words('english')
# Create a list containing three tokenized documents (each one a list of words)
>>> texts_tokenized = [['writing', 'ii', 'rhetorical', 'composing', 'rhetorical', 'composing'],
...                    ['engages', 'series', 'interactive', 'reading'],
...                    ['research', 'composing', 'activities', 'along', 'assignments', 'designed', 'help']]
# Remove stopwords from texts_tokenized in a single line
>>> text_filtered_stopwords = [[word for word in document if word not in english_stopwords] for document in texts_tokenized]
>>> text_filtered_stopwords
[['writing', 'ii', 'rhetorical', 'composing', 'rhetorical', 'composing'], ['engages', 'series', 'interactive', 'reading'], ['research', 'composing', 'activities', 'along', 'assignments', 'designed', 'help']]
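The sample words above happen to contain no NLTK stopwords, so nothing is removed and the output equals the input. The sketch below runs the same nested comprehension without NLTK, using a small hand-written stopword set (a hypothetical stand-in for stopwords.words('english'), not the real list) and made-up documents, so the filtering is actually visible. The outer comprehension produces one list per document; the inner one keeps only the words that are not stopwords.

```python
# Hypothetical stand-in for NLTK's English stopword list (a tiny subset)
stop_words = {"the", "a", "of", "and", "to", "in"}

# Made-up tokenized documents
texts_tokenized = [
    ["the", "history", "of", "rhetoric"],
    ["reading", "and", "writing", "in", "the", "classroom"],
    ["composing", "to", "persuade"],
]

# Outer comprehension: one result list per document.
# Inner comprehension: keep each word that is not a stopword.
texts_filtered = [[word for word in document if word not in stop_words]
                  for document in texts_tokenized]

print(texts_filtered)
# [['history', 'rhetoric'], ['reading', 'writing', 'classroom'], ['composing', 'persuade']]
```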
Now the same logic written as conventional multi-line loops:
>>> texts_tokenized = [['writing', 'ii', 'rhetorical', 'composing', 'rhetorical', 'composing'],
...                    ['engages', 'series', 'interactive', 'reading'],
...                    ['research', 'composing', 'activities', 'along', 'assignments', 'designed', 'help']]
>>> texts_filtered_stopwords = []
>>> for document in texts_tokenized:
...     filtered_words = []  # words of the current document that are not stopwords
...     for word in document:
...         if word not in english_stopwords:
...             filtered_words.append(word)
...     texts_filtered_stopwords.append(filtered_words)  # append the filtered list, not the original document
...
>>> texts_filtered_stopwords
[['writing', 'ii', 'rhetorical', 'composing', 'rhetorical', 'composing'], ['engages', 'series', 'interactive', 'reading'], ['research', 'composing', 'activities', 'along', 'assignments', 'designed', 'help']]
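Both versions give the same result, but the one-line comprehension is far more concise. In CPython it also tends to run somewhat faster than the equivalent loop with repeated append calls, although the speed-up is usually modest rather than dramatic.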
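To check the speed difference on your own machine, here is a rough sketch using the standard-library timeit module. The stopword set and documents are made-up data, and absolute timings depend on the interpreter and hardware, so no numbers are claimed here. The stopwords are kept in a set: word not in some_list scans the whole list on every check, whereas a set membership test is a hash lookup on average (with NLTK this would be set(stopwords.words('english'))).

```python
import timeit

# Made-up data: a small stopword set and 1,000 short "documents"
stop_words = {"the", "a", "of", "and", "to", "in"}
texts = [["the", "history", "of", "rhetoric", "and", "writing"]] * 1000


def with_comprehension():
    # Nested list comprehension, as in the one-line version
    return [[w for w in doc if w not in stop_words] for doc in texts]


def with_loops():
    # Equivalent explicit loops with append
    result = []
    for doc in texts:
        kept = []
        for w in doc:
            if w not in stop_words:
                kept.append(w)
        result.append(kept)
    return result


# Both forms must produce identical output before comparing speed
assert with_comprehension() == with_loops()

for func in (with_comprehension, with_loops):
    seconds = timeit.timeit(func, number=200)
    print(f"{func.__name__}: {seconds:.3f} s for 200 runs")
```

In typical CPython runs the comprehension comes out ahead by a moderate margin, mainly because it avoids looking up and calling kept.append for every word.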