File Handling

Posted 2020-09-30 Dear坏小子


1. Working with Files

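Working with a file covers reading it, writing it, and closing it. Let's start with how a file is opened: when we run file_handle = open('file_path', 'mode'), we pass open an argument that names the mode:

The available modes are: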

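r, read-only mode (the default).
w, write-only mode. [Not readable; the file is created if it does not exist; existing content is erased.]
a, append mode. [Not readable; the file is created if it does not exist; writes are added after the existing content.]

"+" means the file can be both read and written.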

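r+, read and write. [Readable; writable; the file must already exist, its content is kept, and writing starts at the beginning.]
w+, write then read. [Opening a file this way erases everything it contained; new content is written, and what has been written can then be read back.]
a+, the same as a, except that the file can also be read.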
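
As a rough illustration of the r, w, a, and "+" modes above, the sketch below writes to a scratch file under each mode and records what ends up in it. The helper name demo_modes and the file name modes_demo.txt are made up for this example, and the code is written for Python 3 (it should also run on Python 2.7).

```python
import os
import tempfile


def demo_modes():
    """Return the file content left behind by each mode (hypothetical helper)."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "modes_demo.txt")  # scratch file, name is arbitrary
    results = {}

    # 'w' creates the file, or erases it if it already exists.
    with open(path, "w") as f:
        f.write("first\n")
    with open(path, "w") as f:
        f.write("second\n")
    with open(path, "r") as f:
        results["after_w"] = f.read()

    # 'a' keeps the existing content and writes at the end.
    with open(path, "a") as f:
        f.write("third\n")
    with open(path, "r") as f:
        results["after_a"] = f.read()

    # 'r+' reads and writes without erasing; writing starts at offset 0.
    with open(path, "r+") as f:
        f.write("SECOND")
        f.seek(0)
        results["after_r_plus"] = f.read()

    # 'w+' erases the file first, then lets us read back what we wrote.
    with open(path, "w+") as f:
        f.write("fresh\n")
        f.seek(0)
        results["after_w_plus"] = f.read()

    os.remove(path)
    os.rmdir(workdir)
    return results
```

Calling demo_modes() returns a dict; comparing its after_w and after_a entries shows that w threw away the first line while a kept what was already there.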

"U" means that \r, \n, and \r\n are all converted to \n when reading (note: it can only be combined with the r or r+ modes).

rU
r+U
rbU
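rb+U

"b" means the file is handled as binary (for example, sending or uploading an ISO image over FTP; it can be left out on Linux, but must be given on Windows when handling binary files).

Below is an annotated stub of the file class (the Python 2 built-in):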

class file(object):

    def close(self):  # real signature unknown; restored from __doc__
        # Close the file.
        """close() -> None or (perhaps) an integer.  Close the file.

        Sets data attribute .closed to True.  A closed file cannot be used for
        further I/O operations.  close() may be called more than once without
        error.  Some kinds of file objects (for example, opened by popen())
        may return an exit status upon closing.
        """

    def fileno(self):  # real signature unknown; restored from __doc__
        # Return the file descriptor.
        """fileno() -> integer "file descriptor".

        This is needed for lower-level file interfaces, such os.read(). """
        return 0

    def flush(self):  # real signature unknown; restored from __doc__
        # Flush the file's internal buffer.
        """ flush() -> None.  Flush the internal I/O buffer. """
        pass

    def isatty(self):  # real signature unknown; restored from __doc__
        # Check whether the file is connected to a tty device.
        """ isatty() -> true or false.  True if the file is connected to a tty device. """
        return False

    def next(self):  # real signature unknown; restored from __doc__
        # Return the next line; raises StopIteration when there is none.
        """ x.next() -> the next value, or raise StopIteration """
        pass

    def read(self, size=None):  # real signature unknown; restored from __doc__
        # Read up to the given number of bytes.
        """read([size]) -> read at most size bytes, returned as a string.

        If the size argument is negative or omitted, read until EOF is reached.
        Notice that when in non-blocking mode, less data than what was requested
        may be returned, even if no size parameter was given."""
        pass

    def readinto(self):  # real signature unknown; restored from __doc__
        # Read into a buffer; do not use, it is slated for removal.
        """ readinto() -> Undocumented.  Don't use this; it may go away. """
        pass

    def readline(self, size=None):  # real signature unknown; restored from __doc__
        # Read a single line only.
        """readline([size]) -> next line from the file, as a string.

        Retain newline.  A non-negative size argument limits the maximum
        number of bytes to return (an incomplete line may be returned then).
        Return an empty string at EOF. """
        pass

    def readlines(self, size=None):  # real signature unknown; restored from __doc__
        # Read all the data and return it as a list split on line breaks.
        """readlines([size]) -> list of strings, each a line from the file.

        Call readline() repeatedly and return a list of the lines so read.
        The optional size argument, if given, is an approximate bound on the
        total number of bytes in the lines returned. """
        return []

    def seek(self, offset, whence=None):  # real signature unknown; restored from __doc__
        # Move the file pointer to the given position.
        """seek(offset[, whence]) -> None.  Move to new file position.

        Argument offset is a byte count.  Optional argument whence defaults to
        0 (offset from start of file, offset should be >= 0); other values are 1
        (move relative to current position, positive or negative), and 2 (move
        relative to end of file, usually negative, although many platforms allow
        seeking beyond the end of a file).  If the file is opened in text mode,
        only offsets returned by tell() are legal.  Use of other offsets causes
        undefined behavior.
        Note that not all file objects are seekable. """
        pass

    def tell(self):  # real signature unknown; restored from __doc__
        # Return the current position of the file pointer.
        """ tell() -> current file position, an integer (may be a long integer). """
        pass

    def truncate(self, size=None):  # real signature unknown; restored from __doc__
        # Truncate the data, keeping only what lies before the given position.
        """ truncate([size]) -> None.  Truncate the file to at most size bytes.

        Size defaults to the current file position, as returned by tell()."""
        pass

    def write(self, p_str):  # real signature unknown; restored from __doc__
        # Write content.
        """write(str) -> None.  Write string str to file.

        Note that due to buffering, flush() or close() may be needed before
        the file on disk reflects the data written."""
        pass

    def writelines(self, sequence_of_strings):  # real signature unknown; restored from __doc__
        # Write a list of strings to the file.
        """writelines(sequence_of_strings) -> None.  Write the strings to the file.

        Note that newlines are not added.  The sequence can be any iterable object
        producing strings. This is equivalent to calling write() for each string. """
        pass

    def xreadlines(self):  # real signature unknown; restored from __doc__
        # Iterate over the file line by line instead of loading it all at once.
        """xreadlines() -> returns self.

        For backward compatibility. File objects now include the performance
        optimizations previously implemented in the xreadlines module. """
        pass
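
To make the seek, tell, and truncate entries in the stub a little more concrete, here is a small sketch on a scratch file. It opens the file in binary mode so that offsets are plain byte counts, and it is written for Python 3 (it should also run on Python 2.7). The variable names are only for illustration.

```python
import os
import tempfile

# Create an empty scratch file to experiment on.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "wb+") as f:
    f.write(b"0123456789")
    f.seek(4)                   # whence defaults to 0: offset from the start
    pos_after_seek = f.tell()   # position is now 4
    chunk = f.read(3)           # reads bytes 4, 5, 6
    f.seek(-2, 2)               # whence=2: two bytes before the end of the file
    tail = f.read()             # reads the last two bytes
    f.truncate(5)               # keep only the first five bytes
    f.seek(0)
    remaining = f.read()        # what is left after truncating

os.remove(path)
```

After this runs, chunk holds b"456", tail holds b"89", and remaining holds b"01234".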


obj1 = open('filetest.txt', 'w+')

obj1.write('I heard the echo, from the valleys and the heart\n')
obj1.writelines(['Open to the lonely soul of sickle harvesting\n',
                 'Repeat outrightly, but also repeat the well-being of\n',
                 'Eventually swaying in the desert oasis'])
obj1.seek(0)            # move back to the start before reading
print obj1.readline()   # the first line
print obj1.tell()       # the position after that line
print obj1.readlines()  # the remaining lines, as a list
obj1.close()
Taking the 'w+' mode as the example: write writes a single string to the file, while writelines writes a list of strings to it.

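Writing files

write and writelines: compared with the whole assortment of read methods, writing is much simpler, with only two methods, write and writelines. As the example below and the file it produces show, the two are nearly the same; one takes a string and the other takes a list. Watch the newline characters when using them, because neither method adds any.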

# Open (or create) the file for writing and reading
obj1 = open(r'E:\PythonL\11-8\filetest.txt', 'w+')
obj1.write('I heard the echo, from the valleys and the heart\nOpen to the lonely soul of sickle harvesting\n')
obj1.writelines([
                 'Repeat outrightly, but also repeat the well-being of\n',
                 'Eventually swaying in the desert oasis'
                 ])
obj1.close()  # make sure the buffered data reaches the disk
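
The point about newlines is easy to trip over, so here is a minimal sketch of it. The helper write_items and its add_newlines flag are hypothetical names for this example; the code is written for Python 3 (it should also run on Python 2.7) and uses a temporary file rather than the path above.

```python
import os
import tempfile


def write_items(items, add_newlines):
    """Write items with writelines() and return the resulting file content."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w") as f:
            if add_newlines:
                # writelines() never appends "\n", so we add it ourselves.
                f.writelines(item + "\n" for item in items)
            else:
                # Without newlines the items run together on one line.
                f.writelines(items)
        with open(path, "r") as f:
            return f.read()
    finally:
        os.remove(path)
```

For example, write_items(["a", "b"], False) gives back "ab", while write_items(["a", "b"], True) gives back "a\nb\n".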


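Reading files

Using the file written above as the example, let's look at reading:

First, the two methods that read the whole file at once, read and readlines. As the results below show, read returns a string and readlines returns a list, mirroring write and writelines above: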

obj1 = open(r'E:\PythonL\11-8\filetest.txt', 'r')
# readlines method
print 'readlines:', obj1.readlines()
# read method; rewind first, because readlines() left the position at end of file
obj1.seek(0)
print "read:", obj1.read()

readlines: ['I heard the echo, from the valleys and the heart\n', 'Open to the lonely soul of sickle harvesting\n', 'Repeat outrightly, but also repeat the well-being of\n', 'Eventually swaying in the desert oasis']


read: I heard the echo, from the valleys and the heart
Open to the lonely soul of sickle harvesting
Repeat outrightly, but also repeat the well-being of
Eventually swaying in the desert oasis


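readlines and read are simple and convenient, but if the file is very large, loading all of it into memory at once hurts the program's performance. In that case we need to read the file one line at a time to keep memory usage down.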
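
As a minimal sketch of reading one line at a time, the function below loops over the file object directly, so only the current line is held in memory. The function name longest_line_length and the sample text are made up for this example, and the code is written for Python 3 (it should also run on Python 2.7).

```python
import os
import tempfile


def longest_line_length(path):
    """Scan a file line by line and return the length of its longest line."""
    longest = 0
    with open(path, "r") as f:
        for line in f:  # the file object yields one line per iteration
            longest = max(longest, len(line.rstrip("\n")))
    return longest


# Build a small scratch file to try it on.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(sample_path, "w") as f:
    f.write("short\na much longer line\nmid\n")

result = longest_line_length(sample_path)
os.remove(sample_path)
```

The same loop works unchanged on a file far larger than memory, which is the reason to prefer it over readlines for big inputs.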

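readline, next, xreadlines: these read the file line by line. Look closely at how xreadlines is used: it returns an iterator (the file object itself) rather than the content of a line.

Note that although this whole pile of code is shown together, actually running it all in one go raises an error (ValueError: Mixing iteration and read methods would lose data); the reason is explained below.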

obj1 = open(r'E:\PythonL\11-8\filetest.txt', 'r')
# readline method
print "readline:", obj1.readline()
# next method
print "next:", obj1.next()
# xreadlines method
r = obj1.xreadlines()
print 'xreadlines:', r.next()
# readlines method; raises the ValueError here, because next() has filled a read-ahead buffer
print 'readlines:', obj1.readlines()
# read method
print "read:", obj1.read()

