Fixing pandas read_csv 'utf-8' codec can't decode bytes in position 1198-1199: invalid continuation byte

Posted by Data Science Insight


Contents

Fixing pandas read_csv 'utf-8' codec can't decode bytes in position 1198-1199: invalid continuation byte

Problem

Solution

Full error log


Problem

# Reading a CSV file with pandas read_csv fails with the error below because of the file's encoding:

import pandas as pd

# df_dict = pd.read_csv('E:\\data.csv', encoding="cp1252")

# encoding="utf-8"
df_dict = pd.read_csv('E:\\data.csv', encoding='utf-8')
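The error means the file contains bytes that are not valid UTF-8, which commonly happens when the file was saved in a legacy single-byte encoding. The minimal sketch below reproduces the same kind of failure without pandas; the sample text and the choice of cp1252 are assumptions made for illustration, not taken from the original data.

```python
# Made-up sample text with non-ASCII characters, encoded as cp1252
# (a common Windows encoding). This stands in for the real file's bytes.
raw = "name,city\nJosé,Málaga\n".encode("cp1252")

try:
    raw.decode("utf-8")
    failure = None
except UnicodeDecodeError as exc:
    # exc.start and exc.end delimit the bytes that could not be decoded;
    # exc.reason is the same text that appears at the end of the pandas error.
    failure = (exc.start, exc.end, exc.reason)

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode("cp1252")

print(failure)
print(text)
```

In cp1252 the letter é is the single byte 0xE9. UTF-8 reads 0xE9 as the start of a three-byte sequence, and the comma that follows is not a valid continuation byte, hence the message.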

Solution

Following an answer on Stack Overflow:

data = pd.read_csv("data.csv", encoding='unicode_escape', engine='python')
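The unicode_escape codec decodes bytes as Latin-1 and also interprets backslash escape sequences, so it silences the error without identifying the file's actual encoding, and text in other encodings may come out garbled. An alternative is to try a short list of explicit encodings. The sketch below is one way to do that; `read_csv_with_fallback` and its default candidate list are hypothetical (not part of pandas), and the right candidates depend on where the file came from.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Hypothetical helper: try each encoding in order.

    Returns a tuple (DataFrame, name of the encoding that worked).
    """
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding), encoding
        except UnicodeDecodeError as exc:
            # This encoding cannot decode the file; keep the error and move on.
            last_error = exc
    raise ValueError(f"none of {encodings} could decode {path!r}") from last_error


# Example call (the path is the one used earlier in this post):
# df_dict, used = read_csv_with_fallback('E:\\data.csv')
```

Because latin-1 maps every possible byte to a character, it never raises. A latin-1 result is therefore a last resort, and the decoded text should be checked by eye.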


Full error log

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-23-0bf69ab7bea1> in <module>
      7 # df_dict = pd.read_csv('E:\\data.csv',encoding = "cp1252")
      8 #encoding = "utf-8"
----> 9 df_dict = pd.read_csv('E:\\data.csv',encoding = "utf-8")

D:\anaconda\lib\site-packages\pandas\io\parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    608     kwds.update(kwds_defaults)
    609 
--> 610     return _read(filepath_or_buffer, kwds)
    611 
    612 

D:\anaconda\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    460 
    461     # Create the parser.
--> 462     parser = TextFileReader(filepath_or_buffer, **kwds)
    463 
    464     if chunksize or iterator:

D:\anaconda\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    817             self.options["has_index_names"] = kwds["has_index_names"]
    818 
--> 819         self._engine = self._make_engine(self.engine)
    820 
    821     def close(self):

D:\anaconda\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
   1048             )
   1049         # error: Too many arguments for "ParserBase"
-> 1050         return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   1051 
   1052     def _failover_to_python(self):

D:\anaconda\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
   1896 
   1897         try:
-> 1898             self._reader = parsers.TextReader(self.handles.handle, **kwds)
   1899         except Exception:
   1900             self.handles.close()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._get_header()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 1198-1199: invalid continuation byte
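To see which bytes trigger the error, it can help to scan the file directly rather than rely on the position printed in the traceback, since that position may be relative to the parser's internal buffer and not to the start of the file. The sketch below is one possible diagnostic; `first_undecodable_span` is a hypothetical helper, and it reads the whole file into memory, which assumes the file is reasonably small.

```python
def first_undecodable_span(path, encoding="utf-8", context=10):
    """Hypothetical helper: locate the first bytes that fail to decode.

    Returns (start, end, surrounding_bytes), or None if the whole file decodes.
    """
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Include a few bytes on either side so the offending value is easy to spot.
        lo = max(exc.start - context, 0)
        return exc.start, exc.end, raw[lo:exc.end + context]
    return None


# Example call (the path is the one used earlier in this post):
# print(first_undecodable_span('E:\\data.csv'))
```

The returned bytes can then be compared against the tables in the Standard Encodings reference below to guess which encoding the file was written in.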

Reference: Pandas: UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte

Reference: How to solve UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte in python

Reference: Standard Encodings, https://docs.python.org/3/library/codecs.html#standard-encodings
