Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n\n<!'

Posted 2023-03-12

【Published】: 2021-11-30 07:16:46

【Question】:

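What am I doing wrong? I am trying to parse an Excel file from my GitHub, but I get the error: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n\n<!'. I tried this in Spyder on my laptop and in Google Colab, with the same unfortunate result. I am a beginner with GitHub, so maybe I did something wrong with my .xlsx and it is not being read correctly?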

import pandas as pd
import requests as rq
import io
from io import BytesIO

put_k_ses = 'https://github.com/valeriigamaley/Kosh-Agach-SPS-MLDS/blob/024eb349c40174edbcd55e09e70f7fbc685c8ca6/GenInsolKoshAgachSPS.xlsx'
data1 = rq.get(put_k_ses).content
dannye_gener = pd.read_excel(io.BytesIO(data1)) 
print (dannye_gener)

The error looks like this:

XLRDError                                 Traceback (most recent call last)
<ipython-input-8-85891c30428a> in <module>()
      3 # путь к файлу с данными по инсоляции и генерации
      4 data1 = rq.get(put_k_ses).content
----> 5 dannye_gener = pd.read_excel(io.BytesIO(data1))
      6 print (dannye_gener)

9 frames
/usr/local/lib/python3.7/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    294                 )
    295                 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 296             return func(*args, **kwargs)
    297 
    298         return wrapper

/usr/local/lib/python3.7/dist-packages/pandas/io/excel/_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols)
    302 
    303     if not isinstance(io, ExcelFile):
--> 304         io = ExcelFile(io, engine=engine)
    305     elif engine and engine != io.engine:
    306         raise ValueError(

/usr/local/lib/python3.7/dist-packages/pandas/io/excel/_base.py in __init__(self, path_or_buffer, engine)
    865         self._io = stringify_path(path_or_buffer)
    866 
--> 867         self._reader = self._engines[engine](self._io)
    868 
    869     def __fspath__(self):

/usr/local/lib/python3.7/dist-packages/pandas/io/excel/_xlrd.py in __init__(self, filepath_or_buffer)
     20         err_msg = "Install xlrd >= 1.0.0 for Excel support"
     21         import_optional_dependency("xlrd", extra=err_msg)
---> 22         super().__init__(filepath_or_buffer)
     23 
     24     @property

/usr/local/lib/python3.7/dist-packages/pandas/io/excel/_base.py in __init__(self, filepath_or_buffer)
    349             # N.B. xlrd.Book has a read attribute too
    350             filepath_or_buffer.seek(0)
--> 351             self.book = self.load_workbook(filepath_or_buffer)
    352         elif isinstance(filepath_or_buffer, str):
    353             self.book = self.load_workbook(filepath_or_buffer)

/usr/local/lib/python3.7/dist-packages/pandas/io/excel/_xlrd.py in load_workbook(self, filepath_or_buffer)
     33         if hasattr(filepath_or_buffer, "read"):
     34             data = filepath_or_buffer.read()
---> 35             return open_workbook(file_contents=data)
     36         else:
     37             return open_workbook(filepath_or_buffer)

/usr/local/lib/python3.7/dist-packages/xlrd/__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
    160         formatting_info=formatting_info,
    161         on_demand=on_demand,
--> 162         ragged_rows=ragged_rows,
    163         )
    164     return bk

/usr/local/lib/python3.7/dist-packages/xlrd/book.py in open_workbook_xls(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
     89         t1 = time.clock()
     90         bk.load_time_stage_1 = t1 - t0
---> 91         biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
     92         if not biff_version:
     93             raise XLRDError("Can't determine file's BIFF version")

/usr/local/lib/python3.7/dist-packages/xlrd/book.py in getbof(self, rqd_stream)
   1269             bof_error('Expected BOF record; met end of file')
   1270         if opcode not in bofcodes:
-> 1271             bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
   1272         length = self.get2bytes()
   1273         if length == MY_EOF:

/usr/local/lib/python3.7/dist-packages/xlrd/book.py in bof_error(msg)
   1263         if DEBUG: print("reqd: 0x%04x" % rqd_stream, file=self.logfile)
   1264         def bof_error(msg):
-> 1265             raise XLRDError('Unsupported format, or corrupt file: ' + msg)
   1266         savpos = self._position
   1267         opcode = self.get2bytes()

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n\n<!'

【Comments】:

You may need to tell read_excel to expect an XLSX file. If you don't pass a file name, it may default to XLS.

【Answer 1】:

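Since you are passing in an in-memory stream, you need to pass the engine as well:

If io is not a buffer or path, this must be set to identify io.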

(the docs)

dannye_gener = pd.read_excel(io.BytesIO(data1), engine="openpyxl")

Since you do already have a bytes buffer (data1), though, you could also just do

dannye_gener = pd.read_excel(data1)
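
One more thing worth checking: the bytes quoted in the error, b'\n\n\n\n\n\n<!', look like the start of an HTML page rather than a spreadsheet, which would suggest that the /blob/ URL returned GitHub's web page and not the file itself. A minimal sketch, assuming that is what happened, of sniffing the downloaded payload before handing it to pandas. The helper name sniff_payload is hypothetical and is not part of pandas or requests:

```python
def sniff_payload(data: bytes) -> str:
    """Guess what a downloaded payload is from its leading bytes."""
    # .xlsx files are ZIP archives, which start with the bytes 'PK\x03\x04'
    if data.startswith(b"PK\x03\x04"):
        return "xlsx"
    # legacy .xls files are OLE2 compound documents
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"
    # an HTML page typically starts (after whitespace) with '<!DOCTYPE' or '<html'
    head = data.lstrip()[:16].lower()
    if head.startswith((b"<!", b"<html")):
        return "html"
    return "unknown"


# The prefix from the error message is classified as HTML:
print(sniff_payload(b"\n\n\n\n\n\n<!DOCTYPE html>"))
```

If this reports html for data1, no engine setting will help, because the bytes are not an Excel file in the first place.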

【Comments】:

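Pandas can also handle raw URLs.

Sure it can. For the sake of the example, let's assume the OP left out some complex requests authentication code that Pandas couldn't handle :)

Well said.

Of course, I tried the simple way of reading the raw URL with 'raw=true', but it doesn't work and gives the error "http not found" :c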
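
On the raw-URL point: a github.com link containing /blob/ points at the HTML viewer page, and swapping /blob/ for /raw/ normally yields a link that serves the file bytes. A small sketch of that rewrite; blob_to_raw is a hypothetical helper name, and this will not help if the repository is private or the file has been removed, which may be what the "not found" error above reflects:

```python
def blob_to_raw(url: str) -> str:
    """Turn a github.com '/blob/' viewer URL into the matching '/raw/' URL."""
    # Replace only the first occurrence so a file path containing '/blob/' is untouched
    return url.replace("/blob/", "/raw/", 1)


put_k_ses = (
    "https://github.com/valeriigamaley/Kosh-Agach-SPS-MLDS/blob/"
    "024eb349c40174edbcd55e09e70f7fbc685c8ca6/GenInsolKoshAgachSPS.xlsx"
)
raw_url = blob_to_raw(put_k_ses)
print(raw_url)
```

The call to try would then be pd.read_excel(io.BytesIO(rq.get(raw_url).content), engine="openpyxl"), using the names from the question.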
