Python 2 encoding
                  unicode: unicode 你好 u'\u4f60\u597d'
              |   |                            |   |
encode('utf8')|   |decode('utf8') encode('gbk')|   |decode('gbk')
              |   |                            |   |
              utf8                             gbk
UTF-8-encoded str '\xe4\xbd\xa0\xe5\xa5\xbd'   GBK-encoded str '\xc4\xe3\xba\xc3'
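The round trip in the diagram can be checked with a short script. This is a minimal sketch, written so that it should run on both Python 2.7 and Python 3; the variable names are illustrative and do not come from the original notes.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u'\u4f60\u597d'                 # 你好 as Unicode code points

utf8_bytes = text.encode('utf8')       # unicode -> UTF-8 bytes
gbk_bytes = text.encode('gbk')         # unicode -> GBK bytes

print(repr(utf8_bytes))                # six bytes, three per character
print(repr(gbk_bytes))                 # four bytes, two per character

# Decoding with the matching codec restores the original text.
assert utf8_bytes.decode('utf8') == text
assert gbk_bytes.decode('gbk') == text
```

The same text yields different byte sequences under each codec, so bytes only make sense together with the name of the encoding that produced them.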
# str: bytes
>>> s = '你好 world'
>>> print repr(s)
'\xe4\xbd\xa0\xe5\xa5\xbd world'
>>> print len(s)
12
>>> print type(s)
<type 'str'>
# unicode: unicode
>>> s = u'你好 world'
>>> print repr(s)
u'\u4f60\u597d world'
>>> print len(s)
8
>>> print type(s)
<type 'unicode'>
# unicode: every character, whatever it is, has a corresponding code point in Unicode.
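The difference between counting characters and counting bytes can be made concrete with ord() and len(). A small sketch, assuming either Python 2.7 or Python 3; the names `text` and `code_points` are illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u'\u4f60\u597d world'           # 你好 world

# ord() returns the Unicode code point of a single character.
code_points = [hex(ord(ch)) for ch in text[:2]]
print(code_points)                     # ['0x4f60', '0x597d']

# len() of a Unicode string counts characters;
# len() of its encoded form counts bytes.
print(len(text), len(text.encode('utf8')))   # 8 12
```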
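Python 2 characteristics
1. In Python 2, print writes the bytes of a str to stdout unchanged and the terminal decodes them, so print s displays 你好 even though repr(s) shows the raw bytes.
2. Python 2 uses ASCII as its default encoding, both for source files and for implicit str/unicode conversions.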
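Point 2 is why mixing str and unicode fails in Python 2 as soon as non-ASCII bytes are involved: the interpreter decodes the bytes as ASCII behind the scenes. The sketch below spells out that implicit step so that it behaves the same on Python 2.7 and Python 3; `raw` and `error_name` are illustrative names.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

raw = b'\xe4\xbd\xa0\xe5\xa5\xbd'      # UTF-8 bytes of 你好

try:
    # Roughly what Python 2 does implicitly when a str meets a unicode object.
    raw.decode('ascii')
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc.reason)

# Naming the real encoding avoids the error.
text = raw.decode('utf8')
```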
$ cat python.py
#coding:utf8  # tell the interpreter that this source file is UTF-8 encoded
print '你好'
Python 3 encoding
In Python 3 the default encoding is UTF-8, both for source files and for str.encode() / bytes.decode().
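The default can be inspected at runtime. A minimal sketch, assuming a Python 3 interpreter; the variable name `default` is illustrative.

```python
import sys

default = sys.getdefaultencoding()
print(default)                         # 'utf-8' on Python 3; Python 2 reports 'ascii'

# With no argument, encode() and decode() use UTF-8 in Python 3.
assert '你好'.encode() == '你好'.encode('utf-8')
assert b'\xe4\xbd\xa0\xe5\xa5\xbd'.decode() == '你好'
```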
                    str: unicode 你好 '\u4f60\u597d'
              |   |                            |   |
encode('utf8')|   |decode('utf8') encode('gbk')|   |decode('gbk')
              |   |                            |   |
              utf8                             gbk
UTF-8-encoded bytes b'\xe4\xbd\xa0\xe5\xa5\xbd'   GBK-encoded bytes b'\xc4\xe3\xba\xc3'
>>> import json
>>> s = '你好 world'
>>> print(json.dumps(s))
"\u4f60\u597d world"
>>> print(len(s))
8
>>> print(type(s))
<class 'str'>
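json.dumps is used above only to display the code points. As an alternative sketch, the Python 3 built-in ascii() gives the same escaped view without an import; `escaped` is an illustrative name.

```python
s = '你好 world'

escaped = ascii(s)          # like repr(), but non-ASCII characters are escaped
print(escaped)              # '\u4f60\u597d world'
print(repr(s))              # '你好 world'
```

ascii() is the closest Python 3 counterpart to what repr() printed for a unicode object in Python 2.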
Encoding and decoding, method 1:
>>> s = '你好 world'
>>> b = s.encode('utf8')
>>> print(b)
b'\xe4\xbd\xa0\xe5\xa5\xbd world'
>>> s = b.decode('utf8')
>>> print(s)
你好 world
>>> s = b.decode('gbk')  # wrong codec: UTF-8 bytes read as GBK give mojibake
>>> print(s)
浣犲ソ world
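Decoding with the wrong codec does not always fail the same way. The sketch below, assuming Python 3, contrasts the silent mojibake shown above with the opposite case, where GBK bytes are rejected by the UTF-8 decoder; all variable names are illustrative.

```python
s = '你好 world'
utf8_bytes = s.encode('utf8')

# These UTF-8 bytes happen to be valid GBK, so the wrong codec
# produces mojibake without raising.
garbled = utf8_bytes.decode('gbk')
print(garbled)                          # 浣犲ソ world

# The damage is reversible here because no bytes were lost.
repaired = garbled.encode('gbk').decode('utf8')

# The other direction fails loudly: these GBK bytes are not valid UTF-8.
gbk_bytes = s.encode('gbk')
try:
    gbk_bytes.decode('utf8')
    failed = False
except UnicodeDecodeError:
    failed = True

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
lossy = gbk_bytes.decode('utf8', errors='replace')
```

Unlike the mojibake case, the result of errors='replace' cannot be repaired afterwards, because the original bytes are gone.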
Encoding and decoding, method 2:
>>> s = '你好 world'
>>> b = bytes(s, 'gbk')
>>> print(b)
b'\xc4\xe3\xba\xc3 world'
>>> s = str(b, 'gbk')
>>> print(s)
你好 world
>>> s = '你好 world'
>>> b = bytes(s, 'utf8')
>>> print(b)
b'\xe4\xbd\xa0\xe5\xa5\xbd world'
>>> s = str(b, 'utf8')
>>> print(s)
你好 world
>>> s = str(b, 'gbk')  # wrong codec: UTF-8 bytes read as GBK give mojibake
>>> print(s)
浣犲ソ world
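The two methods are interchangeable. A short sketch, assuming Python 3, that checks the equivalence and shows one pitfall of the constructor form; the variable names are illustrative.

```python
s = '你好 world'

# bytes(s, enc) and s.encode(enc) produce the same result.
same_encode = bytes(s, 'gbk') == s.encode('gbk')

b = s.encode('utf8')
# str(b, enc) and b.decode(enc) produce the same result.
same_decode = str(b, 'utf8') == b.decode('utf8')

# Pitfall: without an encoding, str() returns the repr of the bytes object
# rather than decoding it.
as_repr = str(b)
print(as_repr)                # b'\xe4\xbd\xa0\xe5\xa5\xbd world'
```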