Spacing in web scraping [duplicate]

Posted 2023-02-23


Originally posted: 2021-03-31 16:03:31

Question:

I have the following HTML, from which I extract the text "Classification of protein families" using BeautifulSoup (BS) in Python.

<h1 class="item-title__primary">

    
        Classification of protein families
</h1>

However, when I export the data to an Excel file, the text comes with a lot of whitespace around it. How can I get rid of it? Thank you.

Web scraping code:

titles.append(soup.find('h1',class_='item-title__primary').text)
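To see where the spaces come from, the sketch below reproduces the question's markup with the standard library's `html.parser`. It is used here only as a stand-in for BeautifulSoup so the example runs without third-party packages; the class name `TitleParser` and the exact whitespace in the sample string are illustrative assumptions. The text node inside the `<h1>` keeps the newlines and indentation from the HTML source, and BeautifulSoup's `.text` likewise returns the text unchanged.

```python
from html.parser import HTMLParser

# Sample markup modeled on the question; the exact whitespace is an assumption.
HTML = '<h1 class="item-title__primary">\n\n    \n        Classification of protein families\n</h1>'


class TitleParser(HTMLParser):
    """Collect the raw text inside <h1 class="item-title__primary">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h1" and ("class", "item-title__primary") in attrs:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._inside = False

    def handle_data(self, data):
        # Keep text exactly as it appears in the source, whitespace included
        if self._inside:
            self.parts.append(data)


parser = TitleParser()
parser.feed(HTML)
raw = "".join(parser.parts)

print(repr(raw))          # newlines and indentation survive
print(repr(raw.strip()))  # 'Classification of protein families'
```

With BeautifulSoup itself, `soup.find('h1', class_='item-title__primary').get_text(strip=True)` strips whitespace from each text fragment before joining; it does not collapse runs of spaces inside a fragment.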


Answer 1:

e = "  word  "
print(e.strip())  # strip() removes leading and trailing whitespace
# word

e = "word  and  word 2"
print(e.replace("  ", " "))  # replace each pair of spaces with one space
# word and word 2
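A quick check of the `replace` approach on input with longer runs shows its limit: a single call replaces non-overlapping pairs of spaces, so a run of four spaces only shrinks to two, and newlines or tabs are left alone. The sample strings below are illustrative and not taken from the scraped page.

```python
# Illustrative inputs; not taken from the scraped page.
four_spaces = "word    and word"
mixed = "word\n\tand word"

once = four_spaces.replace("  ", " ")
print(repr(once))  # 'word  and word' (a double space remains)

# Repeating the replacement until nothing changes does collapse runs of spaces...
collapsed = four_spaces
while "  " in collapsed:
    collapsed = collapsed.replace("  ", " ")
print(repr(collapsed))  # 'word and word'

# ...but it still ignores other whitespace characters such as \n and \t.
print(repr(mixed.replace("  ", " ")))  # 'word\n\tand word'
```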

Comments:

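Neither of the proposed solutions works if the expected result is a string in which every run of adjacent spaces is collapsed into a single space. I would do it with ' '.join(s.split()).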
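The commenter's suggestion works because `str.split()` with no argument splits on any run of whitespace (spaces, tabs, newlines) and discards leading and trailing whitespace, so re-joining the pieces with a single space normalizes the whole string. A minimal sketch follows; the helper names `normalize_ws` and `normalize_ws_re` are hypothetical, and the regex version is shown only for comparison.

```python
import re


def normalize_ws(s: str) -> str:
    """Collapse every whitespace run to one space and trim both ends."""
    return " ".join(s.split())


def normalize_ws_re(s: str) -> str:
    """Regex alternative: replace each whitespace run, then strip the ends."""
    return re.sub(r"\s+", " ", s).strip()


# Text shaped like the scraped <h1> content (assumed sample).
raw = "\n\n    \n        Classification of protein families\n"
print(normalize_ws(raw))                        # Classification of protein families
print(normalize_ws("word    and \t word 2"))    # word and word 2
print(normalize_ws_re(raw) == normalize_ws(raw))  # True
```

Applied to the question's code, this would be `titles.append(' '.join(soup.find('h1', class_='item-title__primary').text.split()))`.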
