Beautifulsoup find_all returns an empty list


[Title]: Beautifulsoup find_all returns an empty list [Posted]: 2021-10-01 05:23:41 [Question]:

I'm trying to scrape a table, but every call to find_all returns an empty list.

Here is the link to the site: link

from bs4 import BeautifulSoup
import requests

html_text = requests.get('some url').text
soup = BeautifulSoup(html_text, 'lxml')

table = soup.find('table', class_ = 'tinytable')
rows = table.find_all('tr')
for row in rows:
    columns = row.find_all('td')
    print(columns) # Prints out empty lists

If I print each row instead, I get:

<td align="right"></td>
<td align="right"><div><a href="http://www.sec.gov/Archives/edgar/data/1841804/000089924321029825/xslF345X03/doc4.xml" target="_blank" title="SEC Form 4">2021-07-23 21:48:35</a></div></td>
<td align="right"><div>2021-07-21</div></td>
<td><b> <a href="/INST" onmouseout="UnTip()" onmouseover="Tip('&lt;img src=\'https://www.profitspi.com/stock/stock-charts.ashx?chart=INST&amp;v=stock-chart&amp;vs=637453390322078326\' alt=\'\' width=\'360px\' height=\'280px\'&gt;', DELAY, 1)">INST</a></b></td> 
<td><a href="/INST">Instructure Holdings, Inc.</a></td>
<td><a href="/insider/Bowen-Dale-E./1862625" title="476,765 direct shares
C/O Instructure Holdings, Inc.
6330 South East, Suite 700
Salt Lake City, UT 84121">Bowen Dale E.</a></td>
<td>CFO</td>
<td>P - Purchase</td>
<td align="right">$20.00</td>
<td align="right">+26,250</td>
<td align="right">476,765</td>
<td align="right">+6%</td>
<td align="right">+$525,000</td>
<td align="right"></td>
<td align="right"></td>
<td align="right"></td>
<td align="right"></td>

So I can see the 'td' tags there, which is what find_all should be returning.
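One way to narrow down a symptom like this is to run the same row and cell lookup on a small, known-good table and compare. The sketch below is only an illustration: `SAMPLE_HTML` and the helper `cells_by_row` are hypothetical names, the markup is a stand-in for the real page, and it uses the built-in `html.parser` rather than `lxml`. It shows one ordinary source of empty lists (a header row that contains `<th>` cells and no `<td>` cells); the thread does not confirm that this is what happens on the page in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for the real page.
SAMPLE_HTML = """
<table class="tinytable">
  <thead><tr><th>Ticker</th><th>Price</th></tr></thead>
  <tbody><tr><td>INST</td><td>$20.00</td></tr></tbody>
</table>
"""


def cells_by_row(html, parser="html.parser"):
    """Return the text of every <td> in each <tr> of the 'tinytable' table."""
    soup = BeautifulSoup(html, parser)
    table = soup.find("table", class_="tinytable")
    if table is None:
        # soup.find returns None when no matching table exists.
        return []
    return [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in table.find_all("tr")
    ]


result = cells_by_row(SAMPLE_HTML)
print(result)  # the header row yields an empty list, the data row does not
```

Passing a different `parser` argument (for example `"lxml"` or `"html5lib"`, if installed) runs the same check with another parser; Beautiful Soup's parsers can build different trees from malformed markup, so comparing them on the real page may be informative.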

[Comments]:

If you can, please share the URL; that would make the problem easy to spot!

Sure. Here it is: link

@hampani Check my answer below.

[Answer 1]:
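import pandas as pd

# Screener URL from the original answer.
url = 'http://openinsider.com/screener?s=&o=&pl=&ph=&ll=&lh=&fd=730&fdr=&td=0&tdr=&fdlyl=&fdlyh=&daysago=&xp=1&xs=1&vl=&vh=&ocl=&och=&sic1=-1&sicl=100&sich=9999&grp=0&nfl=&nfh=&nil=&nih=&nol=&noh=&v2l=&v2h=&oc2l=&oc2h=&sortcol=0&cnt=100&page=1'

# attrs must be a dict; read_html returns a list of DataFrames, so take the first match.
df = pd.read_html(url, attrs={'class': 'tinytable'})[0]
print(df)
# Save the table; utf-8-sig adds a BOM so spreadsheet tools detect the encoding.
df.to_csv('data.csv', index=False, encoding='utf-8-sig')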

Output:

      X          Filing Date  Trade Date Ticker  ...  1d  1w  1m  6m
0   NaN  2021-07-23 21:48:35  2021-07-21   INST  ... NaN NaN NaN NaN
1   NaN  2021-07-23 21:48:13  2021-07-21   INST  ... NaN NaN NaN NaN
2   NaN  2021-07-23 21:46:08  2021-07-23  ROCCU  ... NaN NaN NaN NaN
3   NaN  2021-07-23 21:45:35  2021-07-21   INST  ... NaN NaN NaN NaN
4   NaN  2021-07-23 21:25:19  2021-07-23   DKNG  ... NaN NaN NaN NaN
..  ...                  ...         ...    ...  ...  ..  ..  ..  ..
95   DM  2021-07-23 16:32:14  2021-07-21    CMG  ... NaN NaN NaN NaN
96    D  2021-07-23 16:30:57  2021-07-22   HRMY  ... NaN NaN NaN NaN
97  NaN  2021-07-23 16:30:44  2021-07-21   ABNB  ... NaN NaN NaN NaN
98    D  2021-07-23 16:30:39  2021-07-21   TWST  ... NaN NaN NaN NaN
99    D  2021-07-23 16:30:31  2021-07-21   TWST  ... NaN NaN NaN NaN

[100 rows x 17 columns]
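To try the `attrs` filter without hitting the network, the same call can be run against an inline string. This is a minimal sketch under assumptions: `SAMPLE_HTML` is made-up markup (not the real page), and it needs pandas plus one of its HTML backends installed (`lxml`, or `bs4` with `html5lib`). It shows that `attrs={'class': 'tinytable'}` selects only the table whose class matches, and that `read_html` returns a list of DataFrames.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup with two tables; only the second has class "tinytable".
SAMPLE_HTML = """
<table class="other"><tr><th>A</th></tr><tr><td>1</td></tr></table>
<table class="tinytable">
  <thead><tr><th>Ticker</th><th>Price</th></tr></thead>
  <tbody><tr><td>INST</td><td>$20.00</td></tr></tbody>
</table>
"""

# Wrap the string in StringIO so read_html treats it as file-like input.
tables = pd.read_html(StringIO(SAMPLE_HTML), attrs={"class": "tinytable"})
df = tables[0]
print(df)
```

Indexing with `[0]`, as the answer does, picks the first matching table; if no table matches, `read_html` raises a `ValueError` instead of returning an empty list.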
