BeautifulSoup find_all returns an empty list
Posted: 2021-10-01 05:23:41
Question: I'm trying to scrape a table, but every call to find_all returns an empty list.
Here is the link to the site: link
from bs4 import BeautifulSoup
import requests
html_text = requests.get('some url').text
soup = BeautifulSoup(html_text, 'lxml')
table = soup.find('table', class_='tinytable')
rows = table.find_all('tr')
for row in rows:
    columns = row.find_all('td')
    print(columns)  # Prints out empty lists
If I print row instead, I get:
<td align="right"></td>
<td align="right"><div><a href="http://www.sec.gov/Archives/edgar/data/1841804/000089924321029825/xslF345X03/doc4.xml" target="_blank" title="SEC Form 4">2021-07-23 21:48:35</a></div></td>
<td align="right"><div>2021-07-21</div></td>
<td><b> <a href="/INST" onmouseout="UnTip()" onmouseover="Tip('<img src=\'https://www.profitspi.com/stock/stock-charts.ashx?chart=INST&v=stock-chart&vs=637453390322078326\' alt=\'\' width=\'360px\' height=\'280px\'>', DELAY, 1)">INST</a></b></td>
<td><a href="/INST">Instructure Holdings, Inc.</a></td>
<td><a href="/insider/Bowen-Dale-E./1862625" title="476,765 direct shares
C/O Instructure Holdings, Inc.
6330 South East, Suite 700
Salt Lake City, UT 84121">Bowen Dale E.</a></td>
<td>CFO</td>
<td>P - Purchase</td>
<td align="right">$20.00</td>
<td align="right">+26,250</td>
<td align="right">476,765</td>
<td align="right">+6%</td>
<td align="right">+$525,000</td>
<td align="right"></td>
<td align="right"></td>
<td align="right"></td>
<td align="right"></td>
So I can see the 'td' tags that find_all should be returning.
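Two things are worth ruling out before anything else: a row made only of <th> cells legitimately gives an empty list from find_all('td'), and find returns None when the table is not in the parsed tree at all. The sketch below checks both on a small made-up HTML string. SAMPLE_HTML and extract_rows are illustrative names, not part of the original post, and the sample rows only imitate the real page. It uses Python's built-in 'html.parser' instead of 'lxml'. Whether the parser choice explains the empty lists on the real site is an assumption, but swapping the parser argument is a cheap experiment.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one header row and two data rows.
SAMPLE_HTML = """
<table class="tinytable">
  <thead>
    <tr><th>Filing Date</th><th>Ticker</th><th>Price</th></tr>
  </thead>
  <tbody>
    <tr><td>2021-07-23 21:48:35</td><td>INST</td><td>$20.00</td></tr>
    <tr><td>2021-07-23 21:25:19</td><td>DKNG</td><td></td></tr>
  </tbody>
</table>
"""


def extract_rows(html, parser="html.parser"):
    """Return the text of every <td> cell, one list per data row."""
    soup = BeautifulSoup(html, parser)
    table = soup.find("table", class_="tinytable")
    if table is None:
        # find() gives None when no matching table exists in the parsed tree.
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if not cells:
            # Rows built from <th> cells have no <td>, so skip them.
            continue
        rows.append([td.get_text(strip=True) for td in cells])
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

To compare parsers on the real page, the same function can be called with parser="lxml" or parser="html5lib" (both need to be installed separately) and the row counts compared.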
Comments:
Could you share the URL if possible? That would make it much easier to spot the problem.
Sure, here it is: link
@hampani Check my answer below.
Answer 1:
import pandas as pd
df = pd.read_html('http://openinsider.com/screener?s=&o=&pl=&ph=&ll=&lh=&fd=730&fdr=&td=0&tdr=&fdlyl=&fdlyh=&daysago=&xp=1&xs=1&vl=&vh=&ocl=&och=&sic1=-1&sicl=100&sich=9999&grp=0&nfl=&nfh=&nil=&nih=&nol=&noh=&v2l=&v2h=&oc2l=&oc2h=&sortcol=0&cnt=100&page=1',
                  attrs={'class': 'tinytable'})[0]
print(df)
df.to_csv('data.csv', index=False, encoding='utf-8-sig')
Output:
      X          Filing Date  Trade Date  Ticker  ...   1d   1w   1m   6m
0   NaN  2021-07-23 21:48:35  2021-07-21    INST  ...  NaN  NaN  NaN  NaN
1   NaN  2021-07-23 21:48:13  2021-07-21    INST  ...  NaN  NaN  NaN  NaN
2   NaN  2021-07-23 21:46:08  2021-07-23   ROCCU  ...  NaN  NaN  NaN  NaN
3   NaN  2021-07-23 21:45:35  2021-07-21    INST  ...  NaN  NaN  NaN  NaN
4   NaN  2021-07-23 21:25:19  2021-07-23    DKNG  ...  NaN  NaN  NaN  NaN
..  ...                  ...         ...     ...  ...   ..   ..   ..   ..
95   DM  2021-07-23 16:32:14  2021-07-21     CMG  ...  NaN  NaN  NaN  NaN
96    D  2021-07-23 16:30:57  2021-07-22    HRMY  ...  NaN  NaN  NaN  NaN
97  NaN  2021-07-23 16:30:44  2021-07-21    ABNB  ...  NaN  NaN  NaN  NaN
98    D  2021-07-23 16:30:39  2021-07-21    TWST  ...  NaN  NaN  NaN  NaN
99    D  2021-07-23 16:30:31  2021-07-21    TWST  ...  NaN  NaN  NaN  NaN

[100 rows x 17 columns]
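For anyone who wants to see what the attrs filter and the trailing [0] do without hitting the site, here is a minimal offline sketch. The HTML string is made up for illustration (SAMPLE_HTML is not from the answer, and the rows only imitate the real table), and it assumes an HTML parser backend that pandas can use, such as lxml, is installed. read_html always returns a list of DataFrames, one per matching table, which is why the answer indexes into it.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second has class "tinytable".
SAMPLE_HTML = """
<table class="other">
  <tr><th>a</th></tr>
  <tr><td>1</td></tr>
</table>
<table class="tinytable">
  <tr><th>Filing Date</th><th>Ticker</th><th>Title</th></tr>
  <tr><td>2021-07-23 21:48:35</td><td>INST</td><td>CFO</td></tr>
  <tr><td>2021-07-23 21:25:19</td><td>DKNG</td><td></td></tr>
</table>
"""

# attrs narrows the result to tables whose attributes match exactly.
# Wrapping the string in StringIO avoids passing literal HTML directly,
# which newer pandas versions warn about.
tables = pd.read_html(StringIO(SAMPLE_HTML), attrs={"class": "tinytable"})

# One DataFrame per matching table, so take the first (and here only) match.
df = tables[0]

print(len(tables))
print(df)
```

Empty cells come back as NaN, which matches the NaN columns in the output above.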