Terminal won't show print response using BeautifulSoup


Posted: 2019-05-10 15:57:39

Question:

Here is my code:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ1.htm')
soup = BeautifulSoup(page.text, 'html.parser')

name_list = soup.find(class_='BodyText')
name_list_item = name_list.find_all('a')

for i in name_list_item:
  names = name_list.contents[0]
  print(names)

When I run it, nothing shows up in the terminal except blank lines.

Please help!


Answer 1:

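The problem is in the for loop: you have to take the contents from i (the current <a> tag), not from name_list. name_list.contents[0] is the same first child of the container on every iteration, so the loop prints that one node over and over instead of each link's text.

Your working code should look like this: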

import requests
from bs4 import BeautifulSoup

page = requests.get('https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ1.htm')
soup = BeautifulSoup(page.text, 'html.parser')

name_list = soup.find(class_='BodyText')
name_list_item = name_list.find_all('a')

for i in name_list_item:
  names = i.contents[0]
  print(names)
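
To make the difference visible without fetching the archived page, here is a minimal sketch on a small inline document. The HTML and the artist names are made up for illustration and are only assumed to resemble the real page, where the BodyText container starts with a line break before its first link.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a BodyText container whose first
# child is a newline text node, followed by two links.
SAMPLE_HTML = """<div class="BodyText">
<a href="/artist/1">Artist One</a>
<a href="/artist/2">Artist Two</a>
</div>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
name_list = soup.find(class_="BodyText")
links = name_list.find_all("a")

# Original loop body: the same first child of the container every time.
from_container = [name_list.contents[0] for _ in links]

# Corrected loop body: the first child of each individual <a> tag.
from_links = [a.contents[0] for a in links]

print([repr(str(s)) for s in from_container])  # repr makes the newline visible
print([str(s) for s in from_links])
```

In this sketch the first list holds only newline strings, which is why the original loop appears to print nothing, while the second list holds the link texts.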


Answer 2:

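I would suggest the approach below to get the links. (The actual problem with your approach is that it also picks up invalid data we don't want; you can print the values and check.) There are 32 names of type <class 'bs4.element.NavigableString'> that have no visible content, so the loop prints 32 LF (ASCII value 10) characters.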
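
One way to "print and check" is to look at repr() instead of the raw value, since a bare newline is invisible in the terminal. The sketch below does not use the real page; the names list is a hypothetical stand-in for the 32 whitespace-only strings described above.

```python
# Hypothetical stand-in for the 32 whitespace-only strings described above.
names = ["\n"] * 32

# repr() shows the escape sequence instead of emitting a blank line.
visible = [repr(n) for n in names]

# ord() gives the code point of each single-character string; LF is 10.
codes = {ord(n) for n in names}

# Dropping whitespace-only entries leaves nothing worth printing here.
non_blank = [n for n in names if n.strip()]

print(visible[0], codes, len(non_blank))
```

The same repr() check works on a NavigableString, because it is a subclass of str.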

Useful links:

How to find tags with only certain attributes - BeautifulSoup

How to find children of nodes using Beautiful Soup

Python: BeautifulSoup extract text from anchor tag

>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> page = requests.get('https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ1.htm')
>>>
>>> soup = BeautifulSoup(page.text, 'html.parser')
>>> name_list = soup.find_all("tr", {"valign": "top"})
>>>
>>> for name in name_list:
...     print(name.find("a")["href"])
...
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11630
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=34202
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3475
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=25135
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=2298
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=23988
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=8232
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=34154
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=4910
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3450
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=1986
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3451
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=20099
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3452
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=34309
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=27191
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=5846
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3941
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3941
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3453
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=35173
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11133
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3455
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3454
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=961
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11597
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11597
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11631
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3427
>>>
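
The session above indexes name.find("a")["href"] directly, which raises a TypeError if a matching row has no link. As a hedged variant, the sketch below skips such rows and also collects the link text. The markup is invented for illustration and is not copied from the archived page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not copied from the archived page.
SAMPLE_HTML = """<table>
<tr valign="top"><td><a href="/cgi-bin/tsearch?artistid=1">Artist One</a></td></tr>
<tr valign="top"><td>no link in this row</td></tr>
<tr valign="top"><td><a href="/cgi-bin/tsearch?artistid=2">Artist Two</a></td></tr>
</table>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

artists = []
for row in soup.find_all("tr", {"valign": "top"}):
    link = row.find("a")
    if link is None:
        # Row without an anchor: skip it instead of subscripting None.
        continue
    # get_text(strip=True) returns the link text without surrounding whitespace.
    artists.append((link.get_text(strip=True), link["href"]))

for name, href in artists:
    print(name, href)
```

Collecting (name, href) pairs in a list first keeps the parsing separate from the printing, which makes the result easier to inspect or reuse.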

Thanks.
