How to get a specific part of the HTML body using BS4
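My current solution is to grab the data with soup.text, clean it up very manually with some regular expressions, and then split it. However, I believe there must be a simpler way using BS4 commands.
The desired output is each company's display name (companydisplayname), basic price (basicprice), discounted price (discountedprice) and deductible (selvrisiko).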
URL:
https://forsikringsguiden.dk/signalr/poll?transport=longPolling&messageId=d-D7589F50-A%2C0%7CtK%2C1%7CtL%2C1%7CtM%2C0&clientProtocol=1.4&connectionToken=UEBdftp4dljKH%2Fw56kprFeB7pnlcXiEv6OqR7mKdzYGoT48tuUFIahljyCEjdyaOn%2BqbxkERLSzuO3QA%2Bwh4BrWIKWlE4rzLhzJnDPedGyOo0Yar2KU7QLCWphtOeava&connectionData=%5B%7B%22name%22%3A%22insuranceofferrequesthub%22%7D%5D&tid=4&_=1572591981773
My code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://forsikringsguiden.dk/signalr/poll?transport=longPolling&messageId=d-D7589F50-A%2C0%7CtK%2C1%7CtL%2C1%7CtM%2C0&clientProtocol=1.4&connectionToken=UEBdftp4dljKH%2Fw56kprFeB7pnlcXiEv6OqR7mKdzYGoT48tuUFIahljyCEjdyaOn%2BqbxkERLSzuO3QA%2Bwh4BrWIKWlE4rzLhzJnDPedGyOo0Yar2KU7QLCWphtOeava&connectionData=%5B%7B%22name%22%3A%22insuranceofferrequesthub%22%7D%5D&tid=4&_=1572591981773"
header = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"
def get_total_items(url):
    # headers must be a dict mapping header names to values
    soup = BeautifulSoup(requests.get(url, headers={"User-Agent": header}).text, 'lxml')
    return soup.text

print(get_total_items(url))
Output:
{"C":"d-D7589F50-A,0|tK,12|tL,1|tM,0","M":[
{"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch","A":[{"companyId":41,"companydisplayname":"Sønderjysk Forsikring","message":"Kan ikke matche dine behov"}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch","A":[{"companyId":33,"companydisplayname":"NEM Forsikring A/S","message":"Kan ikke matche dine behov"}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":17,"companydisplayname":"If Skadeforsikring","produktId":236,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":88.03631578947369,"stars":9,"basicprice":4938,"discountedprice":4670,"selvrisiko":5000,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":false,"name":"Fastpris"}]}],"basicprice":4938,"discountedprice":4670}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":15,"companydisplayname":"Topdanmark","produktId":190,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.119210526315811,"stars":10,"basicprice":8360,"discountedprice":7003,"selvrisiko":3927,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":true,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":8360,"discountedprice":7003}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":18,"companydisplayname":"Alm. Brand","produktId":228,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":93.036973684210551,"stars":10,"basicprice":4252,"discountedprice":3633,"selvrisiko":6324,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":false,"name":"Fastpris"}]}],"basicprice":4252,"discountedprice":3633}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":43,"companydisplayname":"OK Forsikring","produktId":345,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.617894736842118,"stars":10,"basicprice":6473,"discountedprice":6473,"selvrisiko":4982,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":6473,"discountedprice":6473}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":20,"companydisplayname":"GF Forsikring","produktId":279,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.617894736842118,"stars":10,"basicprice":6737,"discountedprice":6737,"selvrisiko":4982,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":6737,"discountedprice":6737}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":10,"companydisplayname":"Nykredit Forsikring A/S","produktId":212,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":94.88828947368421,"stars":10,"basicprice":5215,"discountedprice":4707,"selvrisiko":7100,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":5215,"discountedprice":4707}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch","A":[{"companyId":32,"companydisplayname":"NEXT Forsikring A/S","message":"Kan ikke matche dine behov"}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":19,"companydisplayname":"Gjensidige Forsikring A/S","produktId":123,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":94.88828947368421,"stars":10,"basicprice":5215,"discountedprice":4707,"selvrisiko":7100,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":5215,"discountedprice":4707}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":13,"companydisplayname":"Runa Forsikring","produktId":155,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":92.388289473684239,"stars":10,"basicprice":999999,"discountedprice":3877,"selvrisiko":6110,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":true,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":999999,"discountedprice":3877}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":34,"companydisplayname":"PenSam","produktId":308,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.266184210526319,"stars":10,"basicprice":4691,"discountedprice":4691,"selvrisiko":5493,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":true,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":false,"name":"Fastpris"}]}],"basicprice":4691,"discountedprice":4691}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":11,"companydisplayname":"Lærerstandens Brandforsikring","produktId":153,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":92.388289473684239,"stars":10,"basicprice":999999,"discountedprice":3877,"selvrisiko":6110,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":true,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":999999,"discountedprice":3877}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":12,"companydisplayname":"Bauta Forsikring","produktId":154,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":92.388289473684239,"stars":10,"basicprice":999999,"discountedprice":3877,"selvrisiko":6110,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":true,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":999999,"discountedprice":3877}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":25,"companydisplayname":"Alka Forsikring","produktId":130,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":90.658684210526332,"stars":10,"basicprice":6151,"discountedprice":6151,"selvrisiko":6000,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":6151,"discountedprice":6151}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":24,"companydisplayname":"Tryg","produktId":252,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":95.687631578947389,"stars":10,"basicprice":7324,"discountedprice":5884,"selvrisiko":5833,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":7324,"discountedprice":5884}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":44,"companydisplayname":"FDM Forsikring","produktId":365,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":95.687631578947389,"stars":10,"basicprice":4227,"discountedprice":4227,"selvrisiko":5833,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":4227,"discountedprice":4227}]}
]}
Update: I also tried the following function:
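def get_total_items(url):
    soup = BeautifulSoup(requests.get(url, headers={"User-Agent": header}).text, 'lxml')
    blacklist = ["companyId", "basicprice", "discountedprice", "selvrisiko"]
    # Iterating over soup only yields its top-level node (the <html> element lxml
    # wraps around the JSON text); there are no tags with these names, so this is empty.
    text_ele = [t.text for t in soup if t.name in blacklist]
    return text_ele

print(get_total_items(url))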
It didn't work.
Answer
The response body is JSON, not HTML. Parse it into a JSON object instead of keeping it as a plain string:
import json
result = json.loads(get_total_items(url))
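Once the text is parsed, the individual hub messages sit in the top-level `"M"` list. The sketch below runs `json.loads` on a trimmed excerpt of the response (two of the messages, offer details shortened, written out as valid JSON) and tallies the method name of each message. Reading `"H"`, `"M"` and `"A"` as hub name, invoked method and argument list is my interpretation of the SignalR long-polling format, not something the site documents; `raw_text`, `payload` and `methods` are illustrative names.

```python
import json
from collections import Counter

# Trimmed excerpt of the poll response, written as valid JSON
# (only two hub messages kept, offer fields shortened).
raw_text = (
    '{"C":"d-D7589F50-A,0|tK,12|tL,1|tM,0","M":['
    '{"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch",'
    '"A":[{"companyId":41,"companydisplayname":"Sønderjysk Forsikring",'
    '"message":"Kan ikke matche dine behov"}]},'
    '{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer",'
    '"A":[{"offers":[{"companyId":15,"companydisplayname":"Topdanmark",'
    '"basicprice":8360,"discountedprice":7003,"selvrisiko":3927}],'
    '"basicprice":8360,"discountedprice":7003}]}]}'
)

payload = json.loads(raw_text)  # str -> nested dicts and lists

# Assumed layout: each message has "H" (hub), "M" (method) and "A" (arguments).
methods = Counter(message["M"] for message in payload["M"])
print(methods)
```

Only `ReceiveOffer` messages carry prices, so any loop over the full response has to tell the two message kinds apart.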
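To access an element of the JSON object (this is only an example of navigating the parsed JSON; you will need to add loops that match its structure):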
result['M'][0]['A'][0]['companydisplayname']
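A minimal sketch of the loops the answer mentions, assuming (as in the output shown in the question) that prices only appear under `A[i]["offers"]` in messages whose method is `ReceiveOffer`. `extract_offers` and `sample` are hypothetical names; `sample` is a trimmed copy of two messages from that output, with the option lists left out.

```python
def extract_offers(payload):
    """Collect companydisplayname, basicprice, discountedprice and selvrisiko
    from every ReceiveOffer message; other messages are skipped."""
    rows = []
    # Top-level "M" is the list of messages; inside a message, "M" is the method name.
    for message in payload.get("M", []):
        if message.get("M") != "ReceiveOffer":
            continue  # e.g. ReceiveNoMatch only carries a text message, no prices
        for arg in message.get("A", []):
            for offer in arg.get("offers", []):
                rows.append({
                    "companydisplayname": offer["companydisplayname"],
                    "basicprice": offer["basicprice"],
                    "discountedprice": offer["discountedprice"],
                    "selvrisiko": offer["selvrisiko"],
                })
    return rows


# Trimmed sample taken from the output in the question.
sample = {
    "C": "d-D7589F50-A,0|tK,12|tL,1|tM,0",
    "M": [
        {"H": "InsuranceOfferRequestHub", "M": "ReceiveNoMatch",
         "A": [{"companyId": 33, "companydisplayname": "NEM Forsikring A/S",
                "message": "Kan ikke matche dine behov"}]},
        {"H": "InsuranceOfferRequestHub", "M": "ReceiveOffer",
         "A": [{"offers": [{"companyId": 17,
                            "companydisplayname": "If Skadeforsikring",
                            "basicprice": 4938, "discountedprice": 4670,
                            "selvrisiko": 5000}],
                "basicprice": 4938, "discountedprice": 4670}]},
    ],
}

for row in extract_offers(sample):
    print(row)
```

With the real response, `extract_offers(result)` should yield one row per offer; since the question already imports pandas, `pd.DataFrame(extract_offers(result))` would turn those rows into a table.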
I recommend using Python's requests and json modules.
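Since the endpoint returns JSON, BeautifulSoup can probably be dropped altogether: `requests` can decode the body with `Response.json()`. This is a sketch under assumptions: `fetch_payload` and `USER_AGENT` are hypothetical names, and the poll URL in the question embeds a `connectionToken` and `messageId` that look tied to one browser session, so it has likely expired and a fresh URL would need to be captured from the browser's network tab.

```python
import requests

# Same User-Agent string as in the question.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) "
    "Gecko/20100101 Firefox/47.0"
)


def fetch_payload(url, session=None, timeout=10):
    """GET the SignalR poll URL and return the decoded JSON body (no BeautifulSoup)."""
    session = session or requests.Session()
    response = session.get(url, headers={"User-Agent": USER_AGENT}, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()  # decodes the body as JSON, much like json.loads(response.text)
```

The returned dict can be navigated exactly like `result` above; the optional `session` parameter mainly makes it easy to reuse a `requests.Session` or substitute a stub when testing.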