How can I scrape a URL from a tag without a class or id inside another tag in bs4 [Python 3]

【Posted】: 2022-01-11 02:55:57 【Question】:

I want to get all the URLs from the a tags nested inside the h2 tags (h2 class="" > a href="").

This is my code:

import requests
from bs4 import BeautifulSoup

header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:77.0) Gecko/20190101 Firefox/77.0"}

Purl = 'https://www.tunisianet.com.tn/301-pc-portable-tunisie'

req = requests.get(Purl, headers=header)
soup = BeautifulSoup(req.content, 'lxml')

ProductUrl = []



#find title of product
showName = soup.select('h2', {'class': 'h3 product-title'})


#find link of product
for i in showName:
    ProductUrl.append(str(i.find('a')))

print(ProductUrl)
for i in ProductUrl:
    print(i[i.find("href"):])

How can I fix this?

【Answer 1】:

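This produces the desired output. The selector h2.h3.product-title a matches every a tag inside an h2 that carries both the h3 and product-title classes, and i.get('href') returns the URL itself rather than the tag's HTML string.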

Code

import requests
from bs4 import BeautifulSoup

header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:77.0) Gecko/20190101 Firefox/77.0"}

Purl = 'https://www.tunisianet.com.tn/301-pc-portable-tunisie'

req = requests.get(Purl, headers=header)
soup = BeautifulSoup(req.content, 'lxml')

ProductUrl = []



# select the <a> tag inside each product-title <h2>
showName = soup.select('h2.h3.product-title a')


# collect the href attribute of each link
for i in showName:
    ProductUrl.append(i.get('href'))

#print(ProductUrl)
for i in ProductUrl:
    print(i)

Output

https://www.tunisianet.com.tn/pc-portable-tunisie/48873-pc-portable-vegabook-plus-14-quad-core-4-go-silver-50-dt-bon-d-achat.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51363-pc-portable-lenovo-v15-iil-i5-10e-gen-4-go-82C500TAFE.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52830-pc-portable-lenovo-ideapad-3-15igl05-dual-core-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/50111-pc-portable-asus-x543ma-gq1012t-dual-core-4-go-gris-antivirus-bitdefender.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51234-pc-portable-asus-x543ma-dual-core-4-go-silver-antivirus-bitdefender.html
https://www.tunisianet.com.tn/pc-portable-tunisie/53434-pc-portable-lenovo-v15-igl-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/53435-pc-portable-lenovo-ideapad-3-15igl05-dual-core-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47112-pc-portable-hp-15-dw1001nk-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47114-pc-portable-hp-15-dw1000nk-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47115-pc-portable-hp-15-dw1000nk-dual-core-8-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47113-pc-portable-hp-15-dw1001nk-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51643-pc-portable-hp-15-dw1000nk-dual-core-16-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51644-pc-portable-hp-15-dw1001nk-dual-core-16-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/53033-pc-portable-asus-vivobook-e410ma-quad-core-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52815-pc-portable-lenovo-ideapad-3-15iil05-i3-10e-gen-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52905-pc-portable-asus-d415da-bv873t-amd-ryzen-3-3250u-4-go-windows-10-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52353-pc-portable-asus-vivobook-max-x543ua-i3-7e-gen-4-go-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52819-pc-portable-lenovo-ideapad-3-15iil05-i3-10e-gen-8-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51255-pc-portable-asus-m509da-amd-ryzen-3-4-go-silver-antivirus-bitdefender.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52354-pc-portable-asus-vivobook-max-x543ua-i3-7e-gen-8-go-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52820-pc-portable-lenovo-ideapad-3-15iil05-i3-10e-gen-12-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52355-pc-portable-asus-vivobook-max-x543ua-i3-7e-gen-12-go-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52447-pc-portable-asus-vivobook-x509fa-i3-10e-gen-4-go-silver.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52906-pc-portable-asus-vivobook-x409fa-i3-10e-gen-4-go-silver.html
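
The same selector can be tried without hitting the site. The sketch below is only an illustration: SAMPLE_HTML, BASE_URL and extract_product_urls are made-up names, and the markup is a guess at the general shape of the listing, not a copy of the real page. It also assumes you may meet anchors with no href or with relative links, so it uses a[href] to skip the former and urljoin to resolve the latter.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup imitating the product listing; not copied from the real site.
SAMPLE_HTML = """
<div class="products">
  <h2 class="h3 product-title"><a href="https://example.com/p/1-laptop-a.html">Laptop A</a></h2>
  <h2 class="h3 product-title"><a href="/p/2-laptop-b.html">Laptop B</a></h2>
  <h2 class="h3 product-title"><a>No link here</a></h2>
  <h2 class="other"><a href="https://example.com/ignored.html">Ignored</a></h2>
</div>
"""

BASE_URL = "https://example.com/"  # assumed base for resolving relative links


def extract_product_urls(html, base_url):
    """Return absolute URLs of links inside h2.h3.product-title elements."""
    soup = BeautifulSoup(html, "html.parser")
    # a[href] skips anchors that have no href attribute at all
    links = soup.select("h2.h3.product-title a[href]")
    # urljoin leaves absolute URLs untouched and resolves relative ones
    return [urljoin(base_url, a["href"]) for a in links]


product_urls = extract_product_urls(SAMPLE_HTML, BASE_URL)
for url in product_urls:
    print(url)
```

With the real page you would pass req.content and the site's own base URL instead of the sample values.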
