Python web scraper with BeautifulSoup: my link does not go to the headline story but redirects to the archives page

Posted 2023-03-22

【Published】: 2021-10-07 20:07:10

【Question】:

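The link I build redirects me to the archives page, which lists other top news: https://www.coindesk.com/news/babel-finance-bets-on-longtime-fintech-hand-to-help-navigate-regulatory-landscape. The `news` segment between `.com` and `babel` should not be in the link; I think it is what sends the headline to a different page.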

```python
from bs4 import BeautifulSoup
import requests

base_url = 'https://www.coindesk.com/news'

source = requests.get(base_url).text

soup = BeautifulSoup(source, "html.parser")

articles = soup.find_all(class_='list-item-card post')

#print(len(articles))
#print(articles)

for article in articles:
    headline = article.h4.text.strip()
    link = base_url + article.find_all("a")[1]["href"]
    text = article.find(class_="card-text").text.strip()
    img_url = base_url + article.picture.img['src']

    print(headline)
    print(link)
    print(text)
    print("Image " + img_url)
```

【Answer 1】:

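The error occurs because you concatenate the base URL, which already ends in `/news`, with an `href` that is itself a path from the site root (it starts with `/`). Plain string concatenation keeps both parts, so `/news` ends up in front of the article path.

To prevent this, you can use `urllib.parse.urljoin()`, which resolves the `href` against the base URL the way a browser would.

In your example, this should do the trick: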

from urllib.parse import urljoin

link = urljoin(base_url, article.find_all("a")[1]["href"])
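To make the difference concrete, here is a minimal sketch comparing the two approaches. The `href` value is an assumption reconstructed from the URL quoted in the question, and the image URL is a made-up placeholder; neither was taken from the live page.

```python
from urllib.parse import urljoin

base_url = "https://www.coindesk.com/news"

# Assumed href, reconstructed from the URL in the question (not scraped).
href = "/babel-finance-bets-on-longtime-fintech-hand-to-help-navigate-regulatory-landscape"

# String concatenation keeps the "/news" segment of the base URL.
concatenated = base_url + href

# urljoin resolves a root-relative path against the host only.
joined = urljoin(base_url, href)

# A reference that is already a full URL is returned unchanged.
# The image URL below is a hypothetical placeholder.
image = urljoin(base_url, "https://example.com/picture.jpg")

print(concatenated)
print(joined)
print(image)
```

The same call is probably worth using for `img_url` as well: if the `src` attribute already holds a full URL, concatenating it with `base_url` would produce an invalid address, whereas `urljoin()` leaves it as it is.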
