How to handle tables that share the same class differently in HTML with BeautifulSoup

Posted 2023-02-15


Original title: How to handle differently same class in HTML with BeautifulSoup
Asked: 2022-01-22 02:01:18

Question:

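I'm working on web scraping and have written the code below. The web page has several tables (class="acta-table") that I want to dig into further. There are 12 tables on the page, and I'd like some help on how to handle each one differently. I want to process the Gols and Targetes tables differently from Titulars, Suplents, Equip Técnic, ...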

from bs4 import BeautifulSoup
import requests
import openpyxl

excel = openpyxl.Workbook()
# print(excel.sheetnames)
sheet = excel.active
sheet.title = "Acta Partido"
sheet.append(['Equipo Local', '', '', 'Equipo Visitante'])
# print (excel.sheetnames)

try:

    source = requests.get(
        'https://www.fcf.cat/acta/2022/futbol-11/cadet-primera-divisio/grup-2/1c/sant-ignasi-ce-a/1c/lhospitalet-centre-esports-b')

    source.raise_for_status()

    soup = BeautifulSoup(source.text, 'html.parser')

    actaEquipos = soup.find_all('div', class_='acta-equip')
    actaMarcador = soup.find('div', class_='acta-marcador').text.split("-")
    acta = soup.find_all(name='table', class_='acta-table')

    actaTitulo = soup.find('span', class_='apex').text.split("-")
    sheet.append([actaTitulo[0].strip(), actaMarcador[0].strip(),
                 actaMarcador[1].strip(), actaTitulo[1].strip()])

    for titulars in acta:
        print(titulars.getText())

except Exception as e:
    print(e)

excel.save('ActaPartido.xlsx')

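Thanks,

Comments:

Which table do you want to extract from the website?

@Bhavya As I mentioned, I want to handle each table differently before appending it to Excel. For the tables (Titulars, Suplents, Equip Técnic), I want to extract the name from each row and split the rows between home and away, but for (Gols and Targetes) I need to do more processing.

Answer 1: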

I think you can simply check the contents of each table and handle it according to a condition:

for t in soup.select('table.acta-table'):
    if 'Gols' in t.thead.text:
        print('do something special with gols')
    elif 'Targetes' in t.thead.text:
        print('do something special with targetes')
    else:
        print('do almost the same with the rest')
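To see the header check in isolation, here is a minimal sketch that runs against a made-up HTML fragment rather than the live page. The fragment, the player names, and the helper name classify_table are assumptions for illustration; only the class name and the 'Gols' / 'Targetes' labels come from the question. The guard for a missing <thead> is an extra precaution, since t.thead is None when a table has no header.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two tables sharing one class, told apart by header text.
SAMPLE_HTML = """
<table class="acta-table">
  <thead><tr><th>Gols</th></tr></thead>
  <tbody><tr><td>Player A</td><td>12'</td></tr></tbody>
</table>
<table class="acta-table">
  <thead><tr><th>Titulars</th></tr></thead>
  <tbody><tr><td>1</td><td>Player B</td></tr></tbody>
</table>
"""


def classify_table(table):
    """Return a label for a table based on the text of its <thead>."""
    # table.thead is None when the table has no <thead>, so guard against that.
    heading = table.thead.get_text() if table.thead else ""
    if "Gols" in heading:
        return "gols"
    if "Targetes" in heading:
        return "targetes"
    return "other"


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
labels = [classify_table(t) for t in soup.select("table.acta-table")]
print(labels)
```

Returning a label instead of printing inside the branches keeps the decision separate from the processing, so each kind of table can later be passed to its own function.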

Example

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.fcf.cat/acta/2022/futbol-11/cadet-primera-divisio/grup-2/1c/sant-ignasi-ce-a/1c/lhospitalet-centre-esports-b')
source.raise_for_status()

soup = BeautifulSoup(source.text, 'html.parser')
    
for t in soup.select('table.acta-table'):
    if 'Gols' in t.thead.text:
        for x in t.select('tr:not(:has(th))'):
            print(list(x.stripped_strings))
    elif 'Targetes' in t.thead.text:
        for x in t.select('tr:not(:has(th))'):
            print(list(x.stripped_strings))
    else:
        for x in t.select('tr:not(:has(th))'):
            print(list(x.stripped_strings))
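The selector 'tr:not(:has(th))' does the row filtering in the example: it matches every <tr> that has no <th> descendant, which skips the header row and keeps the data rows. The sketch below shows this on a made-up table (the HTML and player names are assumptions, not content from the real page) and collects each row as a list of cleaned strings.

```python
from bs4 import BeautifulSoup

# Hypothetical table: the header row uses <th>, the data rows use <td>.
SAMPLE_HTML = """
<table class="acta-table">
  <thead><tr><th>Titulars</th></tr></thead>
  <tbody>
    <tr><td> 1 </td><td> Player A </td></tr>
    <tr><td> 7 </td><td> Player B </td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.select_one("table.acta-table")

# Keep only rows that contain no <th>, i.e. drop the header row.
data_rows = table.select("tr:not(:has(th))")

# stripped_strings yields each text fragment with surrounding whitespace removed
# and skips fragments that are whitespace only.
rows = [list(tr.stripped_strings) for tr in data_rows]
print(rows)
```

Each inner list has the shape that openpyxl's sheet.append() accepts, so a row could be passed to it directly once it has been split or reordered as needed.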

Comments:

@Hedgehog This is the hint I was looking for! Thanks again. I see you used select, which understands CSS selectors. Is that correct?

for x in t.select('tr:not(:has(th))'):
    print(list(x.stripped_strings))

Why can't I print(x[0])? Isn't x a list?
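A note on the question in the comment above: select() does take CSS selectors, and it returns a list of Tag objects. Each x in the loop is therefore a single Tag, not a list. Square brackets on a Tag look up an HTML attribute, so x[0] looks for an attribute named 0 and raises KeyError. The sketch below uses a made-up row (the HTML is an assumption for illustration) and shows two ways to reach the first cell instead.

```python
from bs4 import BeautifulSoup

# Hypothetical single-row table used only to demonstrate Tag behaviour.
soup = BeautifulSoup(
    '<table><tr class="row"><td>1</td><td>Player A</td></tr></table>',
    "html.parser",
)

row = soup.select("tr")[0]          # select() returns a list; its items are Tags
tag_type = type(row).__name__       # "Tag"
row_classes = row["class"]          # [] on a Tag reads an HTML attribute

try:
    row[0]                          # looks for an attribute named 0
except KeyError:
    index_failed = True
else:
    index_failed = False

# Option 1: turn the text fragments into a list, then index it.
cells = list(row.stripped_strings)
first_text = cells[0]

# Option 2: index the list of <td> tags and read the text of the first one.
first_cell_text = row.find_all("td")[0].get_text(strip=True)

print(tag_type, row_classes, index_failed, first_text, first_cell_text)
```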
