Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n\n<!'
【Posted】: 2021-11-30 07:16:46
【Question】: What am I doing wrong? I am trying to parse an Excel file from my GitHub repository, but I get the error: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n\n<!'. I ran this both in Spyder on my laptop and in Google Colab, with the same unfortunate result. I am new to GitHub; perhaps I did something wrong with my .xlsx file and it is not being read correctly?
import pandas as pd
import requests as rq
import io
from io import BytesIO
put_k_ses = 'https://github.com/valeriigamaley/Kosh-Agach-SPS-MLDS/blob/024eb349c40174edbcd55e09e70f7fbc685c8ca6/GenInsolKoshAgachSPS.xlsx'
data1 = rq.get(put_k_ses).content
dannye_gener = pd.read_excel(io.BytesIO(data1))
print (dannye_gener)
The error looks like this:
XLRDError Traceback (most recent call last)
<ipython-input-8-85891c30428a> in <module>()
3 # путь к файлу с данными по инсоляции и генерации
4 data1 = rq.get(put_k_ses).content
----> 5 dannye_gener = pd.read_excel(io.BytesIO(data1))
6 print (dannye_gener)
9 frames
/usr/local/lib/python3.7/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
294 )
295 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 296 return func(*args, **kwargs)
297
298 return wrapper
/usr/local/lib/python3.7/dist-packages/pandas/io/excel/_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols)
302
303 if not isinstance(io, ExcelFile):
--> 304 io = ExcelFile(io, engine=engine)
305 elif engine and engine != io.engine:
306 raise ValueError(
/usr/local/lib/python3.7/dist-packages/pandas/io/excel/_base.py in __init__(self, path_or_buffer, engine)
865 self._io = stringify_path(path_or_buffer)
866
--> 867 self._reader = self._engines[engine](self._io)
868
869 def __fspath__(self):
/usr/local/lib/python3.7/dist-packages/pandas/io/excel/_xlrd.py in __init__(self, filepath_or_buffer)
20 err_msg = "Install xlrd >= 1.0.0 for Excel support"
21 import_optional_dependency("xlrd", extra=err_msg)
---> 22 super().__init__(filepath_or_buffer)
23
24 @property
/usr/local/lib/python3.7/dist-packages/pandas/io/excel/_base.py in __init__(self, filepath_or_buffer)
349 # N.B. xlrd.Book has a read attribute too
350 filepath_or_buffer.seek(0)
--> 351 self.book = self.load_workbook(filepath_or_buffer)
352 elif isinstance(filepath_or_buffer, str):
353 self.book = self.load_workbook(filepath_or_buffer)
/usr/local/lib/python3.7/dist-packages/pandas/io/excel/_xlrd.py in load_workbook(self, filepath_or_buffer)
33 if hasattr(filepath_or_buffer, "read"):
34 data = filepath_or_buffer.read()
---> 35 return open_workbook(file_contents=data)
36 else:
37 return open_workbook(filepath_or_buffer)
/usr/local/lib/python3.7/dist-packages/xlrd/__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
160 formatting_info=formatting_info,
161 on_demand=on_demand,
--> 162 ragged_rows=ragged_rows,
163 )
164 return bk
/usr/local/lib/python3.7/dist-packages/xlrd/book.py in open_workbook_xls(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
89 t1 = time.clock()
90 bk.load_time_stage_1 = t1 - t0
---> 91 biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
92 if not biff_version:
93 raise XLRDError("Can't determine file's BIFF version")
/usr/local/lib/python3.7/dist-packages/xlrd/book.py in getbof(self, rqd_stream)
1269 bof_error('Expected BOF record; met end of file')
1270 if opcode not in bofcodes:
-> 1271 bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
1272 length = self.get2bytes()
1273 if length == MY_EOF:
/usr/local/lib/python3.7/dist-packages/xlrd/book.py in bof_error(msg)
1263 if DEBUG: print("reqd: 0x%04x" % rqd_stream, file=self.logfile)
1264 def bof_error(msg):
-> 1265 raise XLRDError('Unsupported format, or corrupt file: ' + msg)
1266 savpos = self._position
1267 opcode = self.get2bytes()
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n\n<!'
【Comments on the question】:
You may need to tell read_excel to expect an XLSX file. If you don't pass a filename, it may default to XLS.
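One way to follow up on this comment is to look at the payload itself before picking an engine. The sketch below guesses a format from the leading bytes; `sniff_format` and its return labels are hypothetical names used only for illustration. It relies on two well-known signatures: .xlsx files are ZIP archives that begin with `PK\x03\x04`, and legacy .xls files begin with the OLE2 compound-file header.

```python
def sniff_format(data: bytes) -> str:
    """Guess the container format of a downloaded payload from its first bytes."""
    if data.startswith(b"PK\x03\x04"):
        return "zip"      # .xlsx workbooks are ZIP archives
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"     # legacy .xls workbooks use the OLE2 compound format
    if data.lstrip().startswith(b"<"):
        return "markup"   # HTML or XML text, for example a web page
    return "unknown"


# The eight bytes quoted in the error message from the question.
print(sniff_format(b"\n\n\n\n\n\n<!"))
```

Fed the eight bytes from the error message, this prints `markup`, which hints that the response body is a web page rather than a workbook.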
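【Answer 1】:
Since you are passing in an in-memory stream, you need to pass the engine explicitly:
"If io is not a buffer or path, this must be set to identify io."
(the docs)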
dannye_gener = pd.read_excel(io.BytesIO(data1), engine="openpyxl")
Since you do have a bytes buffer (data1), though, you could simply do
dannye_gener = pd.read_excel(data1)
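As a minimal sketch of the in-memory approach, the helper below passes a `BytesIO` buffer to `pd.read_excel` with `engine="openpyxl"`, but first checks that the bytes form a ZIP container, as every .xlsx file does. The name `read_xlsx_bytes` is hypothetical, and the early check is an addition for illustration, not part of the answer above; it assumes pandas and openpyxl are installed when a real workbook is passed.

```python
import io
import zipfile


def read_xlsx_bytes(data: bytes):
    """Parse .xlsx bytes with pandas, failing early if they are not a ZIP container."""
    buffer = io.BytesIO(data)
    # .xlsx workbooks are ZIP archives; anything else cannot be a valid .xlsx file.
    if not zipfile.is_zipfile(buffer):
        raise ValueError(f"payload is not an .xlsx file; it starts with {data[:8]!r}")
    buffer.seek(0)
    # Imported here so the check above also works where pandas is not installed.
    import pandas as pd
    return pd.read_excel(buffer, engine="openpyxl")
```

With the question's variables this would be called as `dannye_gener = read_xlsx_bytes(data1)`. If the download is not a workbook, the `ValueError` shows the leading bytes directly instead of surfacing an engine-specific error.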
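【Discussion】:
pandas can also handle the raw URL.
Of course it can. For the sake of example, let's assume the OP left out some complex requests authentication code that pandas can't handle :)
Good point.
Of course, I tried the simple approach of reading the raw URL with 'raw=true', but it didn't work and gave the error "http not found" :c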
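For reference, the raw URL mentioned in this discussion can be derived from a GitHub page URL of the form `https://github.com/<owner>/<repo>/blob/<ref>/<path>`, whose raw counterpart lives at `https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>`. The sketch below only rewrites the string; `github_blob_to_raw` is a hypothetical helper, it assumes the ref contains no slashes, and it does not check whether the file is actually reachable.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com 'blob' page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<ref>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, ref, *file_path = segments
    raw_path = "/" + "/".join([owner, repo, ref, *file_path])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


put_k_ses = (
    "https://github.com/valeriigamaley/Kosh-Agach-SPS-MLDS/blob/"
    "024eb349c40174edbcd55e09e70f7fbc685c8ca6/GenInsolKoshAgachSPS.xlsx"
)
print(github_blob_to_raw(put_k_ses))
```

A "not found" response on the rewritten URL would point to the repository, ref, or path rather than to pandas.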