How to get a specific part of an HTML body with BS4

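My current solution is to grab the data with soup.text, clean it up in a very manual way with some regular expressions, and then split it. However, I believe there must be an easier way using BS4 methods.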

The desired output is each company's display name, basic price, discounted price and selvrisiko (deductible).

URL:

https://forsikringsguiden.dk/signalr/poll?transport=longPolling&messageId=d-D7589F50-A%2C0%7CtK%2C1%7CtL%2C1%7CtM%2C0&clientProtocol=1.4&connectionToken=UEBdftp4dljKH%2Fw56kprFeB7pnlcXiEv6OqR7mKdzYGoT48tuUFIahljyCEjdyaOn%2BqbxkERLSzuO3QA%2Bwh4BrWIKWlE4rzLhzJnDPedGyOo0Yar2KU7QLCWphtOeava&connectionData=%5B%7B%22name%22%3A%22insuranceofferrequesthub%22%7D%5D&tid=4&_=1572591981773

My code:

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://forsikringsguiden.dk/signalr/poll?transport=longPolling&messageId=d-D7589F50-A%2C0%7CtK%2C1%7CtL%2C1%7CtM%2C0&clientProtocol=1.4&connectionToken=UEBdftp4dljKH%2Fw56kprFeB7pnlcXiEv6OqR7mKdzYGoT48tuUFIahljyCEjdyaOn%2BqbxkERLSzuO3QA%2Bwh4BrWIKWlE4rzLhzJnDPedGyOo0Yar2KU7QLCWphtOeava&connectionData=%5B%7B%22name%22%3A%22insuranceofferrequesthub%22%7D%5D&tid=4&_=1572591981773" 
header = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"

def get_total_items(url):
    # Fetch the poll response with a browser-like User-Agent and return its plain text
    soup = BeautifulSoup(requests.get(url, headers={"User-Agent": header}).text, 'lxml')
    return soup.text

print(get_total_items(url))

Output:

{"C":"d-D7589F50-A,0|tK,12|tL,1|tM,0","M":[
{"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch","A":[{"companyId":41,"companydisplayname":"Sønderjysk Forsikring","message":"Kan ikke matche dine behov"}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch","A":[{"companyId":33,"companydisplayname":"NEM Forsikring A/S","message":"Kan ikke matche dine behov"}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":17,"companydisplayname":"If Skadeforsikring","produktId":236,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":88.03631578947369,"stars":9,"basicprice":4938,"discountedprice":4670,"selvrisiko":5000,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":false,"name":"Fastpris"}]}],"basicprice":4938,"discountedprice":4670}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":15,"companydisplayname":"Topdanmark","produktId":190,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.119210526315811,"stars":10,"basicprice":8360,"discountedprice":7003,"selvrisiko":3927,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":true,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":8360,"discountedprice":7003}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":18,"companydisplayname":"Alm. Brand","produktId":228,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":93.036973684210551,"stars":10,"basicprice":4252,"discountedprice":3633,"selvrisiko":6324,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":false,"name":"Fastpris"}]}],"basicprice":4252,"discountedprice":3633}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":43,"companydisplayname":"OK Forsikring","produktId":345,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.617894736842118,"stars":10,"basicprice":6473,"discountedprice":6473,"selvrisiko":4982,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":6473,"discountedprice":6473}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":20,"companydisplayname":"GF Forsikring","produktId":279,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.617894736842118,"stars":10,"basicprice":6737,"discountedprice":6737,"selvrisiko":4982,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":6737,"discountedprice":6737}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":10,"companydisplayname":"Nykredit Forsikring A/S","produktId":212,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":94.88828947368421,"stars":10,"basicprice":5215,"discountedprice":4707,"selvrisiko":7100,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":5215,"discountedprice":4707}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch","A":[{"companyId":32,"companydisplayname":"NEXT Forsikring A/S","message":"Kan ikke matche dine behov"}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":19,"companydisplayname":"Gjensidige Forsikring A/S","produktId":123,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":94.88828947368421,"stars":10,"basicprice":5215,"discountedprice":4707,"selvrisiko":7100,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":5215,"discountedprice":4707}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":13,"companydisplayname":"Runa Forsikring","produktId":155,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":92.388289473684239,"stars":10,"basicprice":999999,"discountedprice":3877,"selvrisiko":6110,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":true,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":999999,"discountedprice":3877}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":34,"companydisplayname":"PenSam","produktId":308,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.266184210526319,"stars":10,"basicprice":4691,"discountedprice":4691,"selvrisiko":5493,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":true,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":false,"name":"Fastpris"}]}],"basicprice":4691,"discountedprice":4691}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":11,"companydisplayname":"Lærerstandens Brandforsikring","produktId":153,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":92.388289473684239,"stars":10,"basicprice":999999,"discountedprice":3877,"selvrisiko":6110,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":true,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":999999,"discountedprice":3877}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":12,"companydisplayname":"Bauta Forsikring","produktId":154,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":92.388289473684239,"stars":10,"basicprice":999999,"discountedprice":3877,"selvrisiko":6110,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":true,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":999999,"discountedprice":3877}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":25,"companydisplayname":"Alka Forsikring","produktId":130,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":90.658684210526332,"stars":10,"basicprice":6151,"discountedprice":6151,"selvrisiko":6000,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":6151,"discountedprice":6151}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":24,"companydisplayname":"Tryg","produktId":252,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":95.687631578947389,"stars":10,"basicprice":7324,"discountedprice":5884,"selvrisiko":5833,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":7324,"discountedprice":5884}]},
{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":44,"companydisplayname":"FDM Forsikring","produktId":365,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":95.687631578947389,"stars":10,"basicprice":4227,"discountedprice":4227,"selvrisiko":5833,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":4227,"discountedprice":4227}]}]}
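The output above is not HTML but a JSON document, and its shape appears to follow the ASP.NET SignalR long-polling format: what looks like a message cursor under "C", and a list of hub messages under "M". Each message names the hub ("H"), the client method that was invoked ("M", here either ReceiveOffer or ReceiveNoMatch) and its argument list ("A"). The sketch below uses a hand-trimmed sample with the same shape (an assumption, not the live response) and groups company names by method; companies_by_method is a hypothetical helper name.

```python
import json
from collections import defaultdict

# Hand-trimmed sample shaped like the poll response above
# (assumption: only two messages kept, offer fields reduced).
SAMPLE = """{"C": "d-D7589F50-A,0|tK,12|tL,1|tM,0", "M": [
  {"H": "InsuranceOfferRequestHub", "M": "ReceiveNoMatch",
   "A": [{"companyId": 41, "companydisplayname": "Sønderjysk Forsikring",
          "message": "Kan ikke matche dine behov"}]},
  {"H": "InsuranceOfferRequestHub", "M": "ReceiveOffer",
   "A": [{"offers": [{"companyId": 17, "companydisplayname": "If Skadeforsikring",
                      "basicprice": 4938, "discountedprice": 4670, "selvrisiko": 5000}],
          "basicprice": 4938, "discountedprice": 4670}]}
]}"""


def companies_by_method(payload):
    """Map each hub method name to the company names mentioned in its messages."""
    grouped = defaultdict(list)
    for message in payload.get("M", []):
        for arg in message.get("A", []):
            # ReceiveNoMatch puts the company directly in the argument;
            # ReceiveOffer nests one or more companies under "offers".
            for entry in arg.get("offers", [arg]):
                grouped[message["M"]].append(entry["companydisplayname"])
    return dict(grouped)


print(companies_by_method(json.loads(SAMPLE)))
```

Once the body is parsed this way, every field in the question is a plain dictionary lookup, so neither BS4 nor regular expressions are needed.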

Update: I also tried the following function:

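def get_total_items(url):
    soup = BeautifulSoup(requests.get(url, headers={"User-Agent": header}).text, 'lxml')
    # Names to keep (despite the variable name, this is used as a whitelist)
    blacklist = ["companyId", "basicprice", "discountprice", "selvrisiko"]
    # Keep the text of top-level tags whose tag name is in the list
    text_ele = [t.text for t in soup if t.name in blacklist]
    return text_ele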

print(get_total_items(url))

It didn't work.
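This attempt cannot work because companyId, basicprice and selvrisiko are keys inside a JSON document, not HTML tags, so the soup contains no tags with those names (also, the key in the data is discountedprice, not discountprice). If you do want to search by key name regardless of nesting, a small recursive walk over the parsed JSON is one option. find_values below is a hypothetical helper, not part of BS4 or the json module, and the sample is a trimmed stand-in for the real response.

```python
import json


def find_values(obj, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending: the same key can also appear deeper down.
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)


# Trimmed sample (assumption) with the same nesting as one ReceiveOffer message.
sample = json.loads(
    '{"M": [{"A": [{"offers": [{"companydisplayname": "Tryg", "basicprice": 7324,'
    ' "discountedprice": 5884, "selvrisiko": 5833}],'
    ' "basicprice": 7324, "discountedprice": 5884}]}]}'
)
print(list(find_values(sample, "selvrisiko")))
print(list(find_values(sample, "basicprice")))
```

Note that basicprice is found twice here, because the response repeats the price next to the "offers" list. A blind key search therefore needs de-duplication, which is one reason a loop that follows the known structure is usually the cleaner choice.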

Answer

Parse the result as a JSON object; at the moment you are keeping it as a string.

import json

result = json.loads(get_total_items(url))
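json.loads raises json.JSONDecodeError when its input is not valid JSON, which can happen, for example, if the server answers the poll URL with an error page instead of the data shown above. A minimal defensive wrapper, assuming you would rather get None than an exception; parse_poll_body is a hypothetical name, and the check for an "M" list is an assumption based on the output in the question.

```python
import json


def parse_poll_body(text):
    """Parse a poll response body; return None instead of raising if it is unusable."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Assumption: a usable response is a JSON object with an "M" message list.
    if not isinstance(payload, dict) or not isinstance(payload.get("M"), list):
        return None
    return payload


print(parse_poll_body('{"C": "d-1", "M": []}'))
print(parse_poll_body("<html>Error</html>"))
```

With requests you can also let the library do the parsing by calling Response.json() on the response object instead of passing the body through BeautifulSoup first.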

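To access an element of the parsed object (this only shows how to index into it; you will need to add loops that match the structure of your JSON):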

result['M'][0]['A'][0]['companydisplayname']
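The single index above only reaches the first message. To collect the four fields the question asks for from every offer, loop over the messages and their offers. This is a sketch assuming the structure shown in the output: ReceiveNoMatch messages carry no "offers" and are skipped, and extract_offers and FIELDS are hypothetical names. The sample is a trimmed stand-in for the real response.

```python
import json

# Trimmed sample (assumption) with the same shape as the poll response.
SAMPLE = """{"C": "d-D7589F50-A,0|tK,12|tL,1|tM,0", "M": [
  {"H": "InsuranceOfferRequestHub", "M": "ReceiveNoMatch",
   "A": [{"companyId": 33, "companydisplayname": "NEM Forsikring A/S",
          "message": "Kan ikke matche dine behov"}]},
  {"H": "InsuranceOfferRequestHub", "M": "ReceiveOffer",
   "A": [{"offers": [{"companyId": 24, "companydisplayname": "Tryg",
                      "basicprice": 7324, "discountedprice": 5884, "selvrisiko": 5833}],
          "basicprice": 7324, "discountedprice": 5884}]},
  {"H": "InsuranceOfferRequestHub", "M": "ReceiveOffer",
   "A": [{"offers": [{"companyId": 44, "companydisplayname": "FDM Forsikring",
                      "basicprice": 4227, "discountedprice": 4227, "selvrisiko": 5833}],
          "basicprice": 4227, "discountedprice": 4227}]}
]}"""

FIELDS = ("companydisplayname", "basicprice", "discountedprice", "selvrisiko")


def extract_offers(payload):
    """Return one dict per offer containing only the fields listed in FIELDS."""
    rows = []
    for message in payload.get("M", []):
        if message.get("M") != "ReceiveOffer":
            continue  # e.g. ReceiveNoMatch: no price data
        for arg in message.get("A", []):
            for offer in arg.get("offers", []):
                rows.append({field: offer.get(field) for field in FIELDS})
    return rows


for row in extract_offers(json.loads(SAMPLE)):
    print(row)
```

Since the question already imports pandas, pd.DataFrame(rows) would turn the resulting list of dicts into a table. Some offers in the output report a basicprice of 999999, which looks like a placeholder value, so you may want to filter those out before comparing prices.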

I recommend using the requests and json modules in Python rather than BeautifulSoup here, since the response is JSON, not HTML.