Converting file encodings with Python
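I previously wrote up a way to convert file encodings with PowerShell, but PowerShell's support is not very complete, so it did not work well. Here is another version in Python.
Approach
Detect the encoding of the source file. If it is not UTF-8, convert it; otherwise skip it. Plain ASCII is already valid UTF-8, so ASCII files are skipped as well.
Code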
    import codecs
    import sys

    import chardet  # third-party package: pip install chardet


    def findEncoding(s):
        # Read the raw bytes so chardet can guess the encoding.
        with open(s, mode='rb') as file:
            buf = file.read()
        result = chardet.detect(buf)
        return result['encoding']


    def convertEncoding(s):
        encoding = findEncoding(s)
        if encoding is None:
            # chardet reports None when it cannot make a guess.
            print("%s: encoding could not be detected, skipped" % s)
        elif encoding.lower() not in ('utf-8', 'ascii'):
            print("convert %s %s to utf-8" % (s, encoding))
            # Read the whole file first, then overwrite it in place as UTF-8.
            with codecs.open(s, "r", encoding) as sourceFile:
                contents = sourceFile.read()
            with codecs.open(s, "w", "utf-8") as targetFile:
                targetFile.write(contents)
        else:
            print("%s encoding is %s, there is no need to convert" % (s, encoding))


    if __name__ == "__main__":
        if len(sys.argv) != 2:
            print("error filename")
        else:
            convertEncoding(sys.argv[1])
In testing, the conversion succeeds.
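One way to reproduce such a test without touching real files is sketched below. It assumes Python 3 and a source encoding that is already known (GBK here), so chardet is not involved. The helper names convert_to_utf8 and round_trip_check are hypothetical, and the built-in open() stands in for codecs.open.

```python
import os
import tempfile


def convert_to_utf8(path, source_encoding):
    """Rewrite the file at `path` as UTF-8, decoding it with `source_encoding`."""
    # Read everything first, because the same file is reopened for writing.
    # newline="" keeps the original line endings untouched in both directions.
    with open(path, "r", encoding=source_encoding, newline="") as source_file:
        contents = source_file.read()
    with open(path, "w", encoding="utf-8", newline="") as target_file:
        target_file.write(contents)


def round_trip_check(text="编码转换测试\n", source_encoding="gbk"):
    """Write `text` in `source_encoding`, convert the file, return its new bytes."""
    # The sample text and the GBK default are assumptions for illustration.
    fd, path = tempfile.mkstemp(suffix=".c")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(text.encode(source_encoding))
        convert_to_utf8(path, source_encoding)
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)  # clean up the temporary file either way


if __name__ == "__main__":
    print(round_trip_check().decode("utf-8"), end="")
```

If the returned bytes equal the UTF-8 encoding of the sample text, the conversion step worked for that encoding.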
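Key points
- chardet: this module detects the encoding of a piece of data. Detection returns a dict whose keys include 'encoding' and 'confidence'; the names are self-explanatory.
- with ... as: this syntax is very handy, especially when opening files. The file is closed automatically when the block exits, which avoids problems such as a file staying locked because you forgot to close it.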
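The 'confidence' value can also be used to avoid rewriting files on a weak guess. The following is a minimal sketch, not part of the original script: needs_conversion is a hypothetical helper, and the 0.8 cutoff is an arbitrary assumption rather than a chardet recommendation. It takes a dict shaped like the detection result, so it runs without chardet installed.

```python
def needs_conversion(detected, min_confidence=0.8):
    """Decide whether a file should be rewritten as UTF-8.

    `detected` is a dict shaped like a chardet detection result,
    for example {"encoding": "GB2312", "confidence": 0.99}.
    """
    encoding = detected.get("encoding")
    if encoding is None:
        # No guess was possible, so leave the file alone.
        return False
    if encoding.lower() in ("utf-8", "ascii"):
        # Already UTF-8 compatible, nothing to do.
        return False
    # min_confidence is an illustrative threshold (assumption).
    return detected.get("confidence", 0.0) >= min_confidence


if __name__ == "__main__":
    print(needs_conversion({"encoding": "GB2312", "confidence": 0.99}))
    print(needs_conversion({"encoding": "GB2312", "confidence": 0.30}))
```

With a check like this, a low-confidence guess leaves the file unchanged, which is safer than overwriting it in place with text decoded under the wrong encoding.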
Batch conversion
    import codecs
    import os
    import sys

    import chardet  # third-party package: pip install chardet


    def findEncoding(s):
        # Read the raw bytes so chardet can guess the encoding.
        with open(s, mode='rb') as file:
            buf = file.read()
        result = chardet.detect(buf)
        return result['encoding']


    def convertEncoding(s):
        # Skip files that cannot be overwritten.
        if not os.access(s, os.W_OK):
            print("%s read only" % s)
            return
        encoding = findEncoding(s)
        if encoding is None:
            # chardet reports None when it cannot make a guess.
            print("%s: encoding could not be detected, skipped" % s)
        elif encoding.lower() not in ('utf-8', 'ascii'):
            print("convert %s %s to utf-8" % (s, encoding))
            with codecs.open(s, "r", encoding) as sourceFile:
                contents = sourceFile.read()
            with codecs.open(s, "w", "utf-8") as targetFile:
                targetFile.write(contents)
        else:
            print("%s encoding is %s, there is no need to convert" % (s, encoding))


    def getAllFile(path, suffix=''):
        """Recursively collect files under path whose names end with suffix.

        The default '' matches every file.
        """
        fpath = []
        for root, dirs, fnames in os.walk(path):
            for name in fnames:
                if name.endswith(suffix):
                    fpath.append(os.path.join(root, name))
        return fpath


    def convertAll(path):
        # Only C sources and headers are converted.
        flist = getAllFile(path, ".c") + getAllFile(path, ".h")
        for fname in flist:
            convertEncoding(fname)


    if __name__ == "__main__":
        if len(sys.argv) == 1:
            path = os.getcwd()
        elif len(sys.argv) == 2:
            path = sys.argv[1]
        else:
            print("error parameter")
            sys.exit(1)
        convertAll(path)
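You can pass a directory as the argument, or run the script with no argument to use the current directory. Either way, the directory tree is traversed recursively.
Key points
- os.walk: traverses a directory tree and yields the files in every directory under it
- os.access: checks file permissions (here, whether the file is writable)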
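As a small variation on the two calls above, the tree can be walked once for several suffixes, because str.endswith accepts a tuple. This is only a sketch: find_sources and writable are hypothetical names, not functions from the script.

```python
import os


def find_sources(root, suffixes=(".c", ".h")):
    """Return sorted paths under `root` whose names end with one of `suffixes`."""
    suffixes = tuple(suffixes)  # str.endswith needs a tuple, not a list
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def writable(paths):
    """Keep only the paths the current process is allowed to write to."""
    return [p for p in paths if os.access(p, os.W_OK)]


if __name__ == "__main__":
    # Lists the writable .c and .h files below the current directory.
    for path in writable(find_sources(os.getcwd())):
        print(path)
```

A single pass avoids walking a large tree twice, and sorting the result makes the processing order predictable from run to run.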