Getting an href by its link text with BeautifulSoup

Question

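I am using requests and BeautifulSoup to find all href links on a web page whose link text matches a specific string. This works, but when the text starts on a new line, BeautifulSoup does not "see" it and does not return the link.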

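from bs4 import BeautifulSoup

# webpageAdress is assumed to hold the page HTML already fetched with requests
soup = BeautifulSoup(webpageAdress, "lxml")

path = soup.findAll('a', href=True, text="Something3")
print(path)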

Example:

With markup like this, it returns the href for the text Something3:

...
<a href="page1/somethingC.aspx">Something3</a>
...

With markup like this, it does not return the href for the text Something3:

...
<a href="page1/somethingC.aspx">
Something3</a>
...

The only difference is that the link text (Something3) starts on a new line. I cannot change the HTML because I am not the webmaster of that page.

Any idea how to solve this?

Note: I have already tried soup.replace('\n', '').replace('\r', ''), but I get the error 'NoneType' object is not callable.

Answer 1
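
The exact match on "Something3" fails in the second case because the link text is really "\nSomething3", with a leading line break. One way around it, as a minimal sketch, is to collect every a tag that has an href and compare its whitespace-stripped text yourself. The helper name hrefs_by_text, the sample markup, and the somethingA link are made up for illustration, and "html.parser" is used here only to avoid the extra lxml dependency; "lxml" as in the question should behave the same for this markup.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question; the first link is a made-up extra.
html = """
<a href="page1/somethingA.aspx">Something1</a>
<a href="page1/somethingC.aspx">
Something3</a>
"""

def hrefs_by_text(markup, wanted):
    """Return the href of every <a> whose whitespace-stripped text equals `wanted`."""
    soup = BeautifulSoup(markup, "html.parser")
    return [
        a["href"]
        for a in soup.find_all("a", href=True)
        # get_text(strip=True) drops the leading newline before comparing
        if a.get_text(strip=True) == wanted
    ]

links = hrefs_by_text(html, "Something3")
print(links)  # ['page1/somethingC.aspx']
```

Because the comparison runs on the stripped text, the same call matches both forms of the link shown in the question.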

Answer 2
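
About the error in the note: as far as I can tell, BeautifulSoup treats an unknown attribute such as soup.replace as a lookup for a child tag named replace, which returns None, so calling it raises 'NoneType' object is not callable. str.replace has to be applied to the raw HTML string before it is parsed. A rough sketch of that, where raw_html is a hypothetical stand-in for the text downloaded with requests:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML text downloaded with requests.
raw_html = '<a href="page1/somethingC.aspx">\r\nSomething3</a>'

# Strip line breaks from the raw markup (a str), not from the soup object.
cleaned = raw_html.replace("\n", "").replace("\r", "")

soup = BeautifulSoup(cleaned, "html.parser")
# `string` is the newer name of the `text` argument used in the question.
matches = soup.find_all("a", href=True, string="Something3")
hrefs = [a["href"] for a in matches]
print(hrefs)  # ['page1/somethingC.aspx']
```

Keep in mind that this is a blunt approach: removing every line break can also glue together words that were separated only by a newline elsewhere on the page.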
