How can I scrape a URL from a tag without a class or id inside another tag in bs4 [Python 3]?
Posted: 2022-01-11 02:55:57
Question: I want to get all the URLs from the links nested inside the product headings (h2 class="" > a href="").
This is my code:
import requests
from bs4 import BeautifulSoup

header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:77.0) Gecko/20190101 Firefox/77.0"}
Purl = 'https://www.tunisianet.com.tn/301-pc-portable-tunisie'
req = requests.get(Purl, headers=header)
soup = BeautifulSoup(req.content, 'lxml')
ProductUrl = []
# find title of product
showName = soup.select('h2', {'class': 'h3 product-title'})
# find link of product
for i in showName:
    ProductUrl.append(str(i.find('a')))
print(ProductUrl)
for i in ProductUrl:
    print(i[i.find("href"):])
How can I fix this?
Answer 1: The code below produces the desired output.
Code
import requests
from bs4 import BeautifulSoup

header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:77.0) Gecko/20190101 Firefox/77.0"}
Purl = 'https://www.tunisianet.com.tn/301-pc-portable-tunisie'
req = requests.get(Purl, headers=header)
soup = BeautifulSoup(req.content, 'lxml')
ProductUrl = []
# select every <a> nested inside an <h2> that carries both classes "h3" and "product-title"
showName = soup.select('h2.h3.product-title a')
# collect the href attribute of each product link
for i in showName:
    ProductUrl.append(i.get('href'))
# print(ProductUrl)
for i in ProductUrl:
    print(i)
Output
https://www.tunisianet.com.tn/pc-portable-tunisie/48873-pc-portable-vegabook-plus-14-quad-core-4-go-silver-50-dt-bon-d-achat.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51363-pc-portable-lenovo-v15-iil-i5-10e-gen-4-go-82C500TAFE.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52830-pc-portable-lenovo-ideapad-3-15igl05-dual-core-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/50111-pc-portable-asus-x543ma-gq1012t-dual-core-4-go-gris-antivirus-bitdefender.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51234-pc-portable-asus-x543ma-dual-core-4-go-silver-antivirus-bitdefender.html
https://www.tunisianet.com.tn/pc-portable-tunisie/53434-pc-portable-lenovo-v15-igl-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/53435-pc-portable-lenovo-ideapad-3-15igl05-dual-core-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47112-pc-portable-hp-15-dw1001nk-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47114-pc-portable-hp-15-dw1000nk-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47115-pc-portable-hp-15-dw1000nk-dual-core-8-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/47113-pc-portable-hp-15-dw1001nk-dual-core-4-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51643-pc-portable-hp-15-dw1000nk-dual-core-16-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51644-pc-portable-hp-15-dw1001nk-dual-core-16-go.html
https://www.tunisianet.com.tn/pc-portable-tunisie/53033-pc-portable-asus-vivobook-e410ma-quad-core-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52815-pc-portable-lenovo-ideapad-3-15iil05-i3-10e-gen-4-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52905-pc-portable-asus-d415da-bv873t-amd-ryzen-3-3250u-4-go-windows-10-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52353-pc-portable-asus-vivobook-max-x543ua-i3-7e-gen-4-go-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52819-pc-portable-lenovo-ideapad-3-15iil05-i3-10e-gen-8-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/51255-pc-portable-asus-m509da-amd-ryzen-3-4-go-silver-antivirus-bitdefender.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52354-pc-portable-asus-vivobook-max-x543ua-i3-7e-gen-8-go-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52820-pc-portable-lenovo-ideapad-3-15iil05-i3-10e-gen-12-go-noir.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52355-pc-portable-asus-vivobook-max-x543ua-i3-7e-gen-12-go-gris.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52447-pc-portable-asus-vivobook-x509fa-i3-10e-gen-4-go-silver.html
https://www.tunisianet.com.tn/pc-portable-tunisie/52906-pc-portable-asus-vivobook-x409fa-i3-10e-gen-4-go-silver.html
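The selector in the answer can be tried without hitting the network. The following is a minimal sketch, not the answerer's code: `SAMPLE_HTML` and `extract_product_links` are hypothetical names, the markup is only modelled on the structure described in the question (the real page may differ), and `html.parser` is used instead of `lxml` so that no extra parser needs to be installed. As far as I can tell, the question's call `soup.select('h2', {'class': ...})` does not filter by class, because a dict of attributes is an argument of `find_all`, not of `select`; with `select`, the class has to be written into the CSS selector itself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the structure described in the question.
SAMPLE_HTML = """
<div class="products">
  <h2 class="h3 product-title"><a href="https://example.com/p/1.html">Laptop A</a></h2>
  <h2 class="h3 product-title"><a href="https://example.com/p/2.html">Laptop B</a></h2>
  <h2 class="other"><a href="https://example.com/skip.html">Not a product</a></h2>
  <h2 class="h3 product-title">No link here</h2>
</div>
"""


def extract_product_links(html):
    """Return the href of every <a> nested in <h2 class="h3 product-title">."""
    soup = BeautifulSoup(html, "html.parser")
    # "h2.h3.product-title a" matches an <a> that is a descendant of an <h2>
    # carrying both the "h3" and the "product-title" classes.
    anchors = soup.select("h2.h3.product-title a")
    # Tag.get returns None when the attribute is missing, so skip those.
    return [a.get("href") for a in anchors if a.get("href")]


if __name__ == "__main__":
    for url in extract_product_links(SAMPLE_HTML):
        print(url)
```

The anchor itself needs no class or id: the descendant combinator (the space in the selector) reaches it through its parent heading, and `get('href')` reads the attribute directly instead of slicing the tag's string form.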