Opening a txt file with pandas read_csv

Posted 2023-02-23

【Title】open txt file using read_csv by pandas 【Posted】2017-11-27 16:58:07 【Question】:

I am trying to process txt files with pandas. However, I get the following error at read_csv:

CParserError Traceback (most recent call last) in ()
     22 Col.append(elm)
     23
---> 24 revised=pd.read_csv(Path+file,skiprows=Header+1,header=None,delim_whitespace=True)
     25
     26 TimeSeries.append(revised)

C:\Users\obakatsu\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    560
    561
--> 562 return _read(filepath_or_buffer, kwds)
    563
    564 parser_f.__name__ = name

C:\Users\obakatsu\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    323 return parser
    324
--> 325 return parser.read()
    326
    327

C:\Users\obakatsu\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
    813 raise ValueError('skip_footer not supported for iteration')
    814
--> 815 ret = self._engine.read(nrows)
    816
    817

C:\Users\obakatsu\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1312 def read(self, nrows=None):
   1313 try:
-> 1314 data = self._reader.read(nrows)
   1315 except StopIteration:
   1316 if self._first_chunk:

pandas\parser.pyx in pandas.parser.TextReader.read (pandas\parser.c:8748)()

pandas\parser.pyx in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9003)()

pandas\parser.pyx in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)()

pandas\parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)()

pandas\parser.pyx in pandas.parser.raise_parser_error (pandas\parser.c:23325)()

CParserError: Error tokenizing data. C error: Expected 4 fields in line 6, saw 8

Does anyone know how I can fix this? The Python script and a sample of the txt file I want to process are shown below.

import os
import pandas as pd

Path='data/NanFung/OCTA_Tower/test/'
files=os.listdir(Path)
TimeSeries=[]
Cols=[]
for file in files:
    new=open(Path+file)
    Supplement=[]
    Col=[]
    data=[]
    Header=0
    #calculate how many rows should be skipped
    for line in new:
        if line.startswith('Timestamp'):
            new1=line.split(" ")
            new1[-1]=str(file)[:-4]
            break
        else:
            Header += 1      

    #clean col name
    for elm in new1:
        if len(elm)>0:
            Col.append(elm)

    revised=pd.read_csv(Path+file,skiprows=Header+1,header=None,delim_whitespace=True)
    TimeSeries.append(revised) 
    Cols.append(Col)

txt file

history:/NIKL6215_ENC_1/CH$2d19$2d1$20$20CHW$20OUTLET$20TEMP
20-Oct-12 8:00 PM CT  to  ?

Timestamp                  Trend Flags  Status  Value (ºC)
-------------------------  -----------  ------  ----------
20-Oct-12 8:00:00 PM HKT   start      ok    15.310 ºC 
21-Oct-12 12:00:00 AM HKT             ok    15.130 ºC

【Comments】:

Show the full traceback.

Hi John. I have edited the question to include the full traceback.

【Answer 1】:

It fails because the part of the file you are reading looks like this:

Timestamp                  Trend Flags  Status  Value (ºC)
-------------------------  -----------  ------  ----------
20-Oct-12 8:00:00 PM HKT   start      ok    15.310 ºC 
21-Oct-12 12:00:00 AM HKT             ok    15.130 ºC

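But there is no consistent delimiter here. read_csv does not understand how to read a fixed-width format like yours. You might consider using a delimited file instead, for example one with tab characters between the columns.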
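
For a fixed-width layout like this one, pandas also provides read_fwf. The following is only a minimal sketch, not the asker's final code: the column ranges in colspecs are assumptions read off the sample as it appears in this post (the real file may be spaced differently), and the column names are hypothetical.

```python
import io

import pandas as pd

# Sample copied from the post; the spacing in the real file may differ.
raw_text = """history:/NIKL6215_ENC_1/CH$2d19$2d1$20$20CHW$20OUTLET$20TEMP
20-Oct-12 8:00 PM CT  to  ?

Timestamp                  Trend Flags  Status  Value (ºC)
-------------------------  -----------  ------  ----------
20-Oct-12 8:00:00 PM HKT   start      ok    15.310 ºC
21-Oct-12 12:00:00 AM HKT             ok    15.130 ºC
"""

# Keep only the rows that follow the dashed ruler line.
lines = raw_text.splitlines()
ruler_idx = next(i for i, line in enumerate(lines) if line.startswith("---"))
data_block = "\n".join(lines[ruler_idx + 1:])

# Assumed half-open character ranges, chosen with some slack around each field.
colspecs = [(0, 26), (26, 36), (36, 42), (42, 60)]
# Hypothetical column names, not taken from the original script.
names = ["timestamp", "trend_flags", "status", "value"]

df = pd.read_fwf(io.StringIO(data_block), colspecs=colspecs,
                 header=None, names=names)

# Drop the unit so that the readings become numeric.
df["value"] = df["value"].str.split().str[0].astype(float)
print(df)
```

Because each field is cut by character position, the empty Trend Flags cell in the second row becomes a missing value instead of shifting the remaining columns to the left.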

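【Comments】:

Thanks, John. In the end I was able to solve this with read_fwf, because this data has no commas or tabs as delimiters.

@KatsuyaObara: Yes, read_fwf is a good choice for your existing input format.

【Answer 2】: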

Include this line first:

file_name = Path+file #change below line to given

revised=pd.read_csv(Path+file,skiprows=Header+1,header=None,delim_whitespace=True)
revised=pd.read_csv(file_name,skiprows=Header+1,header=None,sep="")
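
As a hedged check on any whitespace-based separator, the sketch below counts the whitespace-separated tokens on the rows that remain after skipping the header, then passes the same rows to read_csv with sep=r"\s+" (the regex form of delim_whitespace=True). It uses only the sample rows from this post. The separator string in this answer appears to be incomplete, so the sketch does not try to reproduce it.

```python
import io

import pandas as pd

# Rows left after skipping the header lines, copied from the sample in the post.
rows = [
    "-------------------------  -----------  ------  ----------",
    "20-Oct-12 8:00:00 PM HKT   start      ok    15.310 ºC",
    "21-Oct-12 12:00:00 AM HKT             ok    15.130 ºC",
]

# Number of whitespace-separated tokens on each row.
field_counts = [len(row.split()) for row in rows]
print(field_counts)  # -> [4, 8, 7]

# The first row fixes the expected number of fields, so a later row
# with more tokens makes the parser raise.
try:
    pd.read_csv(io.StringIO("\n".join(rows)), header=None, sep=r"\s+")
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)
print(error_message)
```

The dashed line yields 4 tokens and the first data row yields 8, which matches the "Expected 4 fields ... saw 8" message in the question.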
