Link attribute not getting printed in BeautifulSoup object

Posted 2023-02-23


【Title】: Link attribute not getting printed in BeautifulSoup object 【Posted】: 2019-09-24 05:42:55 【Question】:

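I'm writing a program that fetches the top headlines from Google News. It should print each article's title and link. However, it doesn't print the link.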

import bs4
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen

news_url="https://news.google.com/news/rss"
Client=urlopen(news_url)
xml_page=Client.read()
Client.close()

soup_page=soup(xml_page,"lxml")
news_list=soup_page.findAll("item")
# Print news title, url and publish date
for news in news_list:
  print(news.title.text)
  print(news.link.text)  
  print("-"*10)

Here is an example of the output for one item:

Following Falcon 9 Saturday launch, CRS-17 Dragon arrives at the ISS

----------

It should print both the title and the link, but it only prints the title.

【Comments】:

【Answer 1】:

The structure of this HTML is odd, but if you change the for loop in your code to this:

for news in news_list:
   link = news.select_one('title')    
   print(link.text)
   print(link.next_sibling.next_sibling)
   print("-"*10)

you should get the titles along with their links.
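A minimal sketch of why the sibling lookup works, assuming the odd structure comes from the "lxml" HTML parser: HTML treats `<link>` as an empty (void) element, so the URL ends up as a text node right after the tag instead of inside it. The one-item feed below, including its headline and URL, is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical one-item feed; the headline and URL are placeholders.
sample = (
    '<rss version="2.0"><channel><item>'
    "<title>Example headline</title>"
    "<link>https://example.com/story</link>"
    '<guid isPermaLink="false">123</guid>'
    "</item></channel></rss>"
)

# "lxml" selects the HTML parser, which closes <link> immediately.
item = BeautifulSoup(sample, "lxml").find("item")

# The tag itself has no children, so its text is empty.
link_text = item.link.text
# The URL is the text node that directly follows <link>.
url_after_link = str(item.link.next_sibling)
# The route used in the answer: title -> <link> -> URL text.
url_via_title = str(item.title.next_sibling.next_sibling)

print(repr(link_text))
print(url_after_link)
print(url_via_title)
```

Because the lookup depends on the exact order of siblings, reading `item.link.next_sibling` is probably a little less fragile than stepping two siblings over from the title.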

【Comments】:

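OK, this looks promising, but instead of printing a link it prints this: <guid isPermaLink="false">52780287444614</guid>. I'm not sure whether I'm supposed to pull the link out of that.

@MADScienceBdukester10 - I don't know why you're getting that; I just tried it on a different computer and IDE, and the first result was

WH instructs former counsel not to comply with congressional subpoena - ABC News https://abcnews.go.com/Politics/white-house-instruct-counsel-comply-congressional-subpoena/story?id=62873987

Strange, could it be a module I don't have on my computer?

Figured it out. For some reason I had a second version of the file that was using xml instead of lxml. When I ran it again with lxml, it worked! Thanks a lot!

@MADScienceBdukester10 Glad it worked out in the end!

【Answer 2】: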

You should change this line in your code:

soup_page=soup(xml_page,"lxml")

to:

soup_page=soup(xml_page,"xml")

and you will get the result.
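A rough sketch of the difference this change makes, using a made-up one-item feed (the headline and URL are placeholders). It assumes lxml is installed, since Beautiful Soup's "xml" parser is backed by it.

```python
from bs4 import BeautifulSoup

# Hypothetical one-item feed; the headline and URL are placeholders.
sample = (
    '<rss version="2.0"><channel><item>'
    "<title>Example headline</title>"
    "<link>https://example.com/story</link>"
    '<guid isPermaLink="false">123</guid>'
    "</item></channel></rss>"
)

# "xml" parses the feed as XML, so <link> keeps its text content.
item = BeautifulSoup(sample, "xml").find("item")

title_text = item.title.text
link_text = item.link.text
# Two siblings after the title is now the <guid> tag, not the URL.
after_link = item.title.next_sibling.next_sibling

print(title_text)
print(link_text)
print(after_link.name)
```

With this parser, the two-step sibling lookup from the first answer lands on the `<guid>` element rather than the URL, which is consistent with what was reported in the comments there.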

【Comments】:

What do you mean? For some reason xml doesn't work on my computer.
