Weird text indentation when web scraping with BeautifulSoup4 in Python

Posted 2023-02-15


【Posted】2021-12-31 21:56:29 【Question】:

I am trying to scrape GitHub.

Here is the code:

import requests as req
from bs4 import BeautifulSoup

urls = [
  "https://github.com/moom825/Discord-RAT",
  "https://github.com/freyacodes/Lavalink",
  "https://github.com/KagChi/lavalink-railways",
  "https://github.com/KagChi/lavalink-repl",
  "https://github.com/Devoxin/Lavalink.py",
  "https://github.com/karyeet/heroku-lavalink"]



r = req.get(urls[0])

soup = BeautifulSoup(r.content,"lxml")

title = str(soup.find("p", attrs={"class": "f4 mt-3"}).text)
print(title)

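When I run the program I get no errors, but the printed text is oddly indented: it comes out with extra whitespace around it.

Can anyone help me fix this? I am using Replit.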


【Answer 1】:

GitHub has a really good API, which is worth considering instead of scraping the HTML.
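As a rough sketch of the API route: the GitHub REST API serves repository metadata as JSON at `https://api.github.com/repos/<owner>/<repo>`, and the response includes a `description` field. The helper names below (`repo_api_url`, `fetch_description`) are made up for illustration, and the sketch uses the standard library's `urllib` rather than `requests` so it has no extra dependencies. Unauthenticated requests are rate limited, so treat this as a starting point rather than a finished client.

```python
import json
import urllib.request
from urllib.parse import urlparse


def repo_api_url(repo_url):
    """Turn https://github.com/<owner>/<repo> into the matching REST API URL.

    Hypothetical helper, not part of any library.
    """
    parts = urlparse(repo_url).path.strip("/").split("/")
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"not a repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1]
    return f"https://api.github.com/repos/{owner}/{repo}"


def fetch_description(repo_url, timeout=10):
    """Fetch the repository description through the API (makes a network call)."""
    request = urllib.request.Request(
        repo_api_url(repo_url),
        headers={
            "Accept": "application/vnd.github+json",
            # GitHub expects a User-Agent header; this value is arbitrary.
            "User-Agent": "description-example",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    # "description" can be null for repositories that have none.
    return (data.get("description") or "").strip()
```

Calling `fetch_description(urls[0])` would then return the description as a plain string, with no HTML parsing and no stray whitespace to clean up.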

If you do want to keep scraping, call .strip() after .text; it removes the leading and trailing whitespace.

import requests as req
from bs4 import BeautifulSoup

urls = [
  "https://github.com/moom825/Discord-RAT",
  "https://github.com/freyacodes/Lavalink",
  "https://github.com/KagChi/lavalink-railways",
  "https://github.com/KagChi/lavalink-repl",
  "https://github.com/Devoxin/Lavalink.py",
  "https://github.com/karyeet/heroku-lavalink"]



r = req.get(urls[0])

soup = BeautifulSoup(r.content,"lxml")

title = str(soup.find("p", attrs={"class": "f4 mt-3"}).text.strip())
print(title)
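To see where the odd indentation comes from without hitting the network, here is a small sketch on a hand-written HTML fragment. The fragment and its text are invented for illustration (the real GitHub markup may differ and can change over time), and it uses the built-in `html.parser` so `lxml` is not required. The whitespace is simply the newlines and indentation that sit inside the `<p>` element in the page source.

```python
from bs4 import BeautifulSoup

# Invented fragment that imitates a paragraph whose text is wrapped in
# newlines and indentation in the page source.
html = """
<p class="f4 mt-3">
      An example repository description
    </p>
"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("p", attrs={"class": "f4 mt-3"})

raw = tag.text                        # keeps the surrounding newlines and spaces
clean = tag.text.strip()              # trims leading and trailing whitespace
also_clean = tag.get_text(strip=True) # BeautifulSoup's own way to strip each text piece
collapsed = " ".join(tag.text.split())  # additionally collapses inner runs of whitespace

print(repr(raw))
print(repr(clean))
```

Printing with `repr()` makes the hidden newlines and spaces visible, which is handy when output merely "looks indented" in a console such as Replit's.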

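【Comments】:

OK, I'm dumb. I have been programming in Python for 1.5 years and I didn't know this!! You always learn something; me too. I guess that's how programming works? Wait a minute, I can't accept this answer yet: we have to wait 8 minutes before accepting.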
