Python - How to read HTML line by line [duplicate]

Posted 2023-02-22

【Title】Python - How to read HTML line by line [duplicate] 【Posted】2016-03-21 01:18:40 【Question】:

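I'm trying to write a program that takes an HTML file and prints each line. I'm doing something wrong, because my code prints each character instead. How can I get all of the HTML lines into a list?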

Here is the code so far:

f = open("/home/tony/Downloads/page1/test.html", "r")
htmltext = f.read()
f.close()

for t in htmltext:
    print t + "\n"

【Answer 1】:

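f.read() reads the entire file up to EOF and returns it as a single string. Iterating over a string yields one character at a time, which is why your loop prints each letter. What you want is the f.readlines() method, which returns a list of lines: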

with open("/home/tony/Downloads/page1/test.html", "r") as f:
    for line in f.readlines():
        print(line) # The newline is included in line
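To make the difference concrete, here is a minimal sketch that compares the two calls on a small temporary file. The sample markup and the temporary file are assumptions standing in for test.html, and the code assumes Python 3.

```python
import os
import tempfile

# Hypothetical sample content standing in for test.html.
sample = "<html>\n<body>\n</body>\n</html>\n"

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    with open(path, "r") as f:
        text = f.read()        # one str holding the whole file
    with open(path, "r") as f:
        lines = f.readlines()  # list of str, one per line, newline kept

    chars = [c for c in text]  # iterating a str yields single characters
    print(chars[:3])           # ['<', 'h', 't']
    print(lines[:2])           # ['<html>\n', '<body>\n']
finally:
    os.remove(path)
```

Because each element of lines still ends with its newline, print(line) emits a blank line after every line of the file.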

【Answer 2】:

You can use f.readlines() instead of f.read(). This method returns a list of all the lines in the file.

with open("/home/tony/Downloads/page1/test.html", "r") as f:
    for line in f.readlines():
        print(line)

You can also use list(f).

with open("/home/tony/Downloads/page1/test.html", "r") as f:
    f_lines = list(f)  # iterating a file object yields its lines
for line in f_lines:
    print(line)

Source: https://docs.python.org/3.5/tutorial/inputoutput.html
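Since the question asks for the lines in a list, a small helper may be convenient. The following is only a sketch: read_lines is a hypothetical name, the UTF-8 default is an assumption about the file, and the code targets Python 3. It iterates the file object directly and strips the trailing newline from each line.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Return the lines of a text file as a list, without trailing newlines."""
    with open(path, "r", encoding=encoding) as f:
        # Iterating the file object yields one line at a time.
        return [line.rstrip("\n") for line in f]


# Demonstration on a temporary file (hypothetical content, not test.html).
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("<html>\n<p>hello</p>\n</html>\n")
    demo_path = tmp.name

try:
    html_lines = read_lines(demo_path)
    for html_line in html_lines:
        print(html_line)
finally:
    os.remove(demo_path)
```

Stripping the newline up front avoids the doubled line breaks that print(line) produces when each line still carries its own "\n".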
