'int' object has no attribute 'find_all' when scraping data from a table

Posted 2021-04-05


I get an AttributeError: 'int' object has no attribute 'find_all' exception, even though table is not empty:

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
data = []
Url = 'http://www.svcengg.com/course_offered.php'
pageHtml = uReq(Url)
soup = soup(pageHtml,"html.parser")
table = soup.find("table", { "width" : "470","height":"212"})
#print(table) 
for x in table:
    table_body = x.find('tbody')
    rows = table_body.find_all('tr')
    for tr in rows:
        cols = tr.find_all('td')
        for td in cols:
            data.append(td.text.strip())
    print(data)

Answer

You are iterating over a single table element:

for x in table:

Iterating over an element yields its text nodes as well as its child elements. For the given URL, the first child of table is a string:

>>> list(table)[0]
'\n'

Calling find() on a string produces an integer, because you are calling the str.find() method, not the BeautifulSoup Element.find() method.

So table_body = x.find('tbody') assigns -1 to table_body, because the string '\n' does not contain the substring tbody. Integers have no find_all() method.
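The failure can be reproduced without fetching the page. This is a minimal sketch that uses a plain Python string in place of the whitespace text node; the variable name text_node is only for illustration and does not come from the question.

```python
# A whitespace-only text node, like the first child of the table.
# A plain str stands in for it here (an assumption for illustration).
text_node = "\n"

# str.find() returns the index of the substring, or -1 if it is absent.
table_body = text_node.find("tbody")
print(table_body)        # -1
print(type(table_body))  # <class 'int'>

# An int has no find_all() method, so the next call fails exactly
# as in the question.
message = ""
try:
    table_body.find_all("tr")
except AttributeError as exc:
    message = str(exc)
print(message)  # 'int' object has no attribute 'find_all'
```

The return value -1 is str.find()'s way of saying "not found"; it is never a parsed element.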

Don't iterate over a single element. You have already located the table; find() returns either the one matching table or None:

if table is not None:
    table_body = table.find('tbody')

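Note, however, that the HTML input contains no <tbody> element. Browsers insert a standard <tbody> element when it is missing, but BeautifulSoup (with the html.parser backend used here) does not. Even if the HTML did contain a <tbody> element, you could still find the <tr> table rows directly from the table element. Skip looking for tbody; it isn't needed.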
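To see that the parser adds nothing to the source, here is a rough sketch using Python's standard html.parser module, which BeautifulSoup's "html.parser" backend is built on. The TagCollector class and the sample markup are hypothetical, made up for this illustration; the real page is not fetched.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Hypothetical markup: rows sit directly under <table>, with no <tbody>.
html = "<table><tr><td>Civil Engineering</td><td>120</td></tr></table>"

collector = TagCollector()
collector.feed(html)
collector.close()

print(collector.tags)             # ['table', 'tr', 'td', 'td']
print("tbody" in collector.tags)  # False
```

Only the tags present in the markup are reported, so a lookup for tbody finds nothing unless the document itself contains one.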

This works:

if table is not None:
    rows = table.find_all('tr')
    for tr in rows:
        cols = tr.find_all('td')
        for td in cols:
            data.append(td.text.strip())

For the given URL, data then contains:

>>> from pprint import pprint
>>> pprint(data)
['Electronics  & Communication Engineering',
 '120',
 'Computer Science & Engineering',
 '120',
 'Information Science & Engineering',
 '60',
 'Mechanical Engineering',
 '120',
 'Electrical & Electronics Engineering',
 '60',
 'Civil Engineering',
 '120']
