如何使用python在字符串html中逐行添加“\ n”?
Posted
技术标签:
【中文标题】如何使用python在字符串html中逐行添加“\\ n”?【英文标题】:How to add "\n" line by line in string html with python?如何使用python在字符串html中逐行添加“\ n”? 【发布时间】:2021-12-26 04:30:38 【问题描述】:我有一行html格式如下:
<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li><li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li><li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li><li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li><li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li></ol>
HTML 代码将具有未指定的属性 class 和 id。关闭 HTML 代码时,我需要逐行添加“\n”。 我在 Python 中的代码是这样的:
TAGS = ['p', 'h1', 'h2', 'h3', 'h4', 'li', 'img','ol']
SINGLE_LINE_TAGS = ['ul', 'ol']
INLINE_TAGS = ['strong', 'i', 'u', 'em']
html = '''<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li><li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li><li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li><li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li><li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li></ol>'''
for tag in TAGS:
html = html.replace('</>'.format(tag), '</>\n'.format(tag))
for tag in SINGLE_LINE_TAGS:
html = html.replace('<>'.format(tag), '<>\n'.format(tag))
html = html.replace('</>'.format(tag), '</>\n'.format(tag))
html = html.replace(' />', ' />\n')
print(html)
但结果是:
<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li>
<li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li>
<li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li>
<li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li>
<li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li>
</ol>
为什么不是这个:
<ol class="X5LH0c">
<li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li>
<li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li>
<li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li>
<li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li>
<li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li>
</ol>
我不使用正则表达式。谁能帮我修复代码?感谢您的支持!
【问题讨论】:
你正在替换<ol>
,但你有<ol class="X5LH0c">
首先,你为什么要这样做? HTML 不应该被漂亮地打印出来,因为它旨在由您的浏览器解析,而不是由您解析。其次,“- ” 与您的任何标签名称都不匹配,因为它有一个属性。您必须使用 something like 正则表达式来解决问题,但即使它们也不是一个完美的工具。考虑学习 BeautifulSoup。
快速修复如下。
TAGS = ['p', 'h1', 'h2', 'h3', 'h4', 'li', 'img','ol']
SINGLE_LINE_TAGS = ['ul', 'ol']
INLINE_TAGS = ['strong', 'i', 'u', 'em']
html = '''<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li><li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li><li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li><li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li><li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li></ol>'''
# Xuống dòng mỗi thẻ
for tag in TAGS:
html = html.replace('<'.format(tag), '\n<'.format(tag))
for tag in SINGLE_LINE_TAGS:
html = html.replace('<>'.format(tag), '\n<>'.format(tag))
html = html.replace('</>'.format(tag), '\n</>'.format(tag))
print(html)
因此,我已将右括号 </>
的搜索替换为左括号 <
的搜索,并在左括号之前切换了换行符的位置(因此 <li class ...
也是替换)。
这在 html 文件的开头产生了一个额外的行,您可以使用 html = html[1:]
删除它。
更优雅的解决方案是使用正则表达式替换,但这完全取决于您想要的确切输出。
【讨论】:
是的!谢谢你的帮助!它和我一起工作,先生!稍后我会尝试使用正则表达式,因为我认为正则表达式很慢,我需要一些简单的方法来解决这个问题。字符串没问题,但是当使用 HTML 标记非常困难时,我会尝试像你分享的那样使用正则表达式。以上是关于如何使用python在字符串html中逐行添加“\ n”?的主要内容,如果未能解决你的问题,请参考以下文章