Using find_all in BS4 to get text as a list

Posted 2023-03-14


Asked: 2017-07-27 21:19:18

Question:

Let me start by saying I'm new to Python. I've been building a Discord bot with discord.py and Beautiful Soup 4. Here is where I am so far:

@commands.command(hidden=True)
async def roster(self):
    """Gets a list of CD's members"""
    url = "http://www.clandestine.pw/roster.html"
    async with aiohttp.get(url) as response:
        soupObject = BeautifulSoup(await response.text(), "html.parser")
    try:
        text = soupObject.find_all("font", attrs={'size': '4'})
        await self.bot.say(text)
    except:
        await self.bot.say("Not found!")

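This is the output:

I have tried using get_text() in several different ways to strip the brackets and HTML tags from this output, but it throws an error every time. How can I do that, or put this data into an array or list and then print only the plain text?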

Comments:

Which versions of Python and Beautiful Soup are you using? I assume it's Python >= 3.5, given the async/await syntax.

Answer 1:

Replace

text = soupObject.find_all("font", attrs={'size': '4'})

with this:

all_font_tags = soupObject.find_all("font", attrs={'size': '4'})
list_of_inner_text = [x.text for x in all_font_tags]
# If you want to print the text as a comma separated string
text = ', '.join(list_of_inner_text)
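
For reference, here is a minimal self-contained sketch of the same idea, run against a made-up HTML snippet instead of the real roster page. The markup, the member names, and the `member_names` helper are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the roster page; the names are invented.
SAMPLE_HTML = """
<html><body>
  <font size="4">Alice</font>
  <font size="4">Bob</font>
  <font size="2">not a member line</font>
</body></html>
"""


def member_names(html):
    """Return the text of every <font size="4"> tag as a list of strings."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("font", attrs={"size": "4"})
    # strip=True trims the whitespace around each tag's text.
    return [tag.get_text(strip=True) for tag in tags]


names = member_names(SAMPLE_HTML)
print(names)             # a plain list of str, no Tag objects
print(", ".join(names))  # a single comma-separated string
```

In the bot, the joined string (not the list) is what you would pass to `self.bot.say(...)`.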

Answer 2:

BeautifulSoup is returning a list of Tag objects; the brackets you see come from printing that list object.
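
A small sketch of why the brackets appear, using an invented two-tag snippet (the markup is an assumption, not the real page): the result of `find_all` behaves like a Python list, so converting it to a string yields the list's representation, tags and all.

```python
from bs4 import BeautifulSoup

# Invented sample markup for illustration.
html = '<font size="4">Alice</font><font size="4">Bob</font>'
soup = BeautifulSoup(html, "html.parser")

tags = soup.find_all("font", attrs={"size": "4"})

# The result is a list subclass holding Tag objects, not strings.
print(isinstance(tags, list))

# str() on the whole result gives the list form: brackets plus raw HTML.
as_string = str(tags)
print(as_string)

# Each element is a Tag; get_text() is called per element, not on the list.
first_text = tags[0].get_text()
print(first_text)
```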

Either return them as a list of strings:

 text = [Member.get_text().encode("utf-8").strip() for Member in soup.find_all("font", attrs={'size': '4'}) if not Member.get_text().encode("utf-8").startswith(b"\xe2")]

or as a single string:

text = b",".join([Member.get_text().encode("utf-8") for Member in soup.find_all("font", attrs={'size': '4'}) if not Member.get_text().encode("utf-8").startswith(b"\xe2")])
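
On Python 3 the `encode` step can be skipped entirely, which keeps the result as ordinary `str` values. The sketch below is one possible variant, not the answerer's code: the sample markup and the `starts_with_e2_byte` helper are hypothetical. It relies on the fact that a UTF-8 encoding begins with byte `0xE2` exactly when the character lies in the range U+2000 to U+2FFF (bullets, dashes, and similar symbols).

```python
from bs4 import BeautifulSoup

# Invented sample markup; the middle entry starts with a bullet (U+2022).
SAMPLE_HTML = (
    '<font size="4">Alice</font>'
    '<font size="4">\u2022 Officers</font>'
    '<font size="4"> Bob </font>'
)


def starts_with_e2_byte(text):
    """True if the first character's UTF-8 encoding starts with byte 0xE2."""
    # text[:1] is "" for empty input, which compares below "\u2000".
    return "\u2000" <= text[:1] <= "\u2fff"


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# List of plain strings, skipping entries that begin with such a symbol.
members = [
    tag.get_text().strip()
    for tag in soup.find_all("font", attrs={"size": "4"})
    if not starts_with_e2_byte(tag.get_text())
]

# Or a single comma-separated string.
joined = ",".join(members)
print(members)
print(joined)
```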
