Using Multiple Conditions in BeautifulSoup

Posted 2023-02-23

[Original title] Use Multiple Conditions in BeautifulSoup [Posted] 2014-11-12 12:56:28 [Question]:

We use this code to find an element with class "label" whose text contains "Fiscal":

soup.find(class_="label",text=re.compile("Fiscal"))

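How can we specify multiple conditions here?

For example, a label that contains both "Fiscal" and "year",

or a label that contains "Fiscal" but not "year".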

[Comments on the question]:

[Answer 1]:

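If the conditions vary and are likely to become more complex, you can pass a function as the filter. The function receives the tag's string, which is None when the tag has no single string child, so the examples guard against that case. For example:

A label that contains both "Fiscal" and "year":

t = soup.find(class_="label", text=lambda s: s and "Fiscal" in s and "year" in s)

Or a label that contains "Fiscal" but not "year":

t = soup.find(class_="label", text=lambda s: s and "Fiscal" in s and "year" not in s)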
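The two filters can be tried end to end on a small document. The following is a minimal sketch, not the answerer's own code: the HTML is invented for illustration, the helper names `has_both` and `fiscal_without_year` are hypothetical, and it uses `string=`, the name Beautiful Soup 4.4.0 and later gives to the `text=` argument.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented only to exercise the filters.
html = """
<span class="label">Fiscal year 2014</span>
<span class="label">Fiscal quarter</span>
<span class="label">Calendar year</span>
<span class="label"><b>Fiscal</b> year</span>
<span class="other">Fiscal year 2015</span>
"""
soup = BeautifulSoup(html, "html.parser")


def has_both(s):
    # s may be None when a tag has no single string child (nested markup).
    return s is not None and "Fiscal" in s and "year" in s


def fiscal_without_year(s):
    return s is not None and "Fiscal" in s and "year" not in s


both = soup.find_all(class_="label", string=has_both)
only_fiscal = soup.find_all(class_="label", string=fiscal_without_year)

both_texts = [t.get_text() for t in both]
only_fiscal_texts = [t.get_text() for t in only_fiscal]
print(both_texts)
print(only_fiscal_texts)
```

Note that the fourth label is skipped by both filters: its text is split across a nested tag, so it has no single string to test. Matching such tags would need a filter on the tag itself, for example one that inspects `tag.get_text()`.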

You could also use a regular expression here, but it would probably be less readable.
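For comparison, here is one way the regular-expression route could look, using lookaheads. This is a sketch under the assumption that the words may appear in any order; the sample strings are made up. Beautiful Soup applies a compiled pattern with `search()`, so either pattern could be passed as the `text` argument in place of the lambdas.

```python
import re

# Both words, in any order: two positive lookaheads anchored at the start.
both = re.compile(r"^(?=.*Fiscal)(?=.*year)", re.DOTALL)

# "Fiscal" present and "year" absent: one positive, one negative lookahead.
fiscal_not_year = re.compile(r"^(?=.*Fiscal)(?!.*year)", re.DOTALL)

# Made-up sample strings standing in for label text.
samples = [
    "Fiscal year 2014",
    "year of Fiscal reform",
    "Fiscal quarter",
    "Calendar year",
]

both_hits = [s for s in samples if both.search(s)]
fiscal_only_hits = [s for s in samples if fiscal_not_year.search(s)]
print(both_hits)
print(fiscal_only_hits)
```

The lookahead form works, but the intent is harder to read than the function filter, which supports the point about readability.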

[Comments]:

[Answer 2]:

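You can pass text as a regular expression or as a list (this site is the example from my previous answer :))

import re

import requests
from bs4 import BeautifulSoup

res = requests.get('http://www.snapdeal.com/products/computers-laptops?sort=plrty&')
soup = BeautifulSoup(res.text, 'html.parser')  # name the parser explicitly

# attrs must be a dict; in the pattern, | means 'or'
elements = soup.find_all('div', {'class': 'lfloat'}, text=re.compile(r'(14|4)'))

print(elements)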

This prints: [<div class="lfloat">(14)</div>, <div class="lfloat">(4)</div>, <div class="lfloat">(45)</div>]
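The `(45)` entry in that output is worth a second look: a compiled pattern is applied with `search()`, so `4` matches anywhere in the string. The sketch below reproduces this with plain `re`; the value `(7)` and the anchored pattern are additions for illustration, not part of the answer.

```python
import re

pattern = re.compile(r"(14|4)")
texts = ["(14)", "(4)", "(45)", "(7)"]  # "(7)" is a made-up non-match

# search() succeeds if the pattern occurs anywhere, so "(45)" is included.
partial = [t for t in texts if pattern.search(t)]

# Anchoring the pattern to the whole string, parentheses included, excludes "(45)".
exact = re.compile(r"^\((14|4)\)$")
whole = [t for t in texts if exact.search(t)]

print(partial)
print(whole)
```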

So in your case you can use: soup.find_all(class_="label", text=re.compile(r'(Fiscal|year)')). Note that this matches a label containing either word, not necessarily both.

To find by exact match, you can pass text as a list: soup.find_all(class_="label", text=['Fiscal', 'year'])
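To make the list behaviour concrete, here is a small sketch on invented markup. It uses `string=`, the newer name for `text=` (Beautiful Soup 4.4.0 and later). A list matches a tag whose string equals any one of the items exactly, so a label containing both words is not returned.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = """
<span class="label">Fiscal</span>
<span class="label">year</span>
<span class="label">Fiscal year</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Each list item is compared against the whole string of the tag.
matches = soup.find_all(class_="label", string=["Fiscal", "year"])
matched_texts = [t.get_text() for t in matches]
print(matched_texts)
```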

The "Fiscal and NOT year" logic might be covered by: soup.find_all('div', {'class': 'lfloat'}, text=re.compile(r'(Fiscal|[^year])')) (not sure about this one)

[Comments]:

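The list form only matches the whole text, not partial matches. This answers only part of the question, but yes, a regular expression that matches the specific combination the OP is looking for (or excludes a specific combination) would be the answer. However, a regular expression that ensures both words are present is not that simple. You can replace the {'class': 'lfloat'} here with a plain 'lfloat'. The list finds text equal to any one of the words "Fiscal" and "year", while the OP asked for both. The "not year" regular expression is not correct.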
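The shorthand mentioned in the comment can be checked on a tiny document. This is a sketch on invented markup: a plain string in the second positional argument of `find_all` is treated as a CSS class, so the three calls below should select the same element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = '<div class="lfloat">(14)</div><div class="rfloat">(4)</div>'
soup = BeautifulSoup(html, "html.parser")

by_dict = [str(t) for t in soup.find_all("div", {"class": "lfloat"})]
by_string = [str(t) for t in soup.find_all("div", "lfloat")]
by_keyword = [str(t) for t in soup.find_all("div", class_="lfloat")]

print(by_dict)
print(by_dict == by_string == by_keyword)
```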
