456 Python string module contents (removing punctuation from text)

Posted by alex_bn_lee


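The string module is mainly useful in NLP preprocessing. It defines a set of constant strings covering digits, letters, uppercase letters, lowercase letters, punctuation, whitespace, and so on.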

Reference: 6.1. string — Common string operations

It can be used to remove punctuation from text by replacing every punctuation character with an empty string.
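A minimal sketch of that punctuation removal, assuming Python 3. The helper name strip_punctuation and the sample sentence are illustrative and not from the original post; str.maketrans and str.translate are standard str methods.

```python
import string

# Translation table whose third argument lists the characters to delete:
# here, every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Return text with every character in string.punctuation removed."""
    return text.translate(_PUNCT_TABLE)


print(strip_punctuation("Hello, world! It's 3:45 p.m."))
# prints: Hello world Its 345 pm
```

Note that string.punctuation only covers ASCII, so full-width punctuation such as "," or "。" passes through unchanged and would need to be added to the table separately.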

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> string.digits
'0123456789'
>>> string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.hexdigits
'0123456789abcdefABCDEF'
>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
>>> string.whitespace
' \t\n\r\x0b\x0c'

6.1.1. String constants

The constants defined in this module are:

string.ascii_letters

The concatenation of the ascii_lowercase and ascii_uppercase constants described below. This value is not locale-dependent.

string.ascii_lowercase

The lowercase letters 'abcdefghijklmnopqrstuvwxyz'. This value is not locale-dependent and will not change.

string.ascii_uppercase

The uppercase letters 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. This value is not locale-dependent and will not change.

string.digits

The string '0123456789'.

string.hexdigits

The string '0123456789abcdefABCDEF'.

string.octdigits

The string '01234567'.

string.punctuation

String of ASCII characters which are considered punctuation characters in the C locale.

string.printable

String of ASCII characters which are considered printable. This is a combination of digits, ascii_letters, punctuation, and whitespace.

string.whitespace

A string containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab.
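The constants above can also serve as lookup sets when profiling text. The following is a rough sketch, assuming Python 3. The names classify and class_counts and the sample input are hypothetical helpers made up for illustration, not part of the string module.

```python
import string
from collections import Counter

# printable is documented as the combination of these four constants.
parts = string.digits + string.ascii_letters + string.punctuation + string.whitespace
print(string.printable == parts)

# Map a label to the set of characters drawn from each constant.
_CLASSES = {
    "digit": set(string.digits),
    "lower": set(string.ascii_lowercase),
    "upper": set(string.ascii_uppercase),
    "punctuation": set(string.punctuation),
    "whitespace": set(string.whitespace),
}


def classify(ch: str) -> str:
    """Return the label of the first constant containing ch, else 'other'."""
    for name, chars in _CLASSES.items():
        if ch in chars:
            return name
    return "other"  # non-ASCII characters end up here


def class_counts(text: str) -> Counter:
    """Count how many characters of text fall into each class."""
    return Counter(classify(ch) for ch in text)


counts = class_counts("Hi, Bob 42!")
print(counts["upper"], counts["lower"], counts["digit"])
# prints: 2 3 2
```

Because every constant is ASCII-only, accented or CJK characters are reported as "other"; for Unicode-aware checks, methods such as str.isalpha or the unicodedata module are likely a better fit.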
