How to scrape the content between every two sibling <hr> tags?
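【Posted】2019-12-18 17:23:24
【Question】My real situation is hard to describe, so I'll just use a website as an example: https://www.w3schools.com/php/php_intro.asp
The markup below is very long; you can just skim it. When you open the link, you'll see that each content block is framed by a horizontal line above and below it (an <hr> tag), so my goal is to scrape the content of each block between two <hr> tags.
(The real difficulty is that the number of tags varies, and the structure between each pair of <hr> tags is unpredictable.)
How can I do this?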
<div class="w3-col l10 m12" id="main">
<div id="mainLeaderboard" style="overflow:hidden;">
<!-- MainLeaderboard-->
<!--<pre>main_leaderboard, all: [728,90][970,90][320,50][468,60]</pre>-->
<div id="snhb-main_leaderboard-0" data-google-query-id="CJmd77_F_OMCFUSJwgodAWAIsg"><div id="google_ads_iframe_/22152718/sws-hb//w3schools.com//main_leaderboard_0__container__" style="border: 0pt none;"><iframe id="google_ads_iframe_/22152718/sws-hb//w3schools.com//main_leaderboard_0" title="3rd party ad content" name="google_ads_iframe_/22152718/sws-hb//w3schools.com//main_leaderboard_0" scrolling="no" margin margin frameborder="0" srcdoc="" style="border: 0px; vertical-align: bottom;" data-google-container-id="d" data-load-complete="true"></iframe></div></div>
<!-- adspace leaderboard -->
</div>
<h1>Python <span class="color_h1">Tutorial</span></h1>
<div class="w3-clear nextprev">
<a class="w3-left w3-btn" href="/default.asp">❮ Home</a>
<a class="w3-right w3-btn" href="python_intro.asp">Next ❯</a>
</div>
<div class="w3-panel w3-info intro">
<p>Python is a programming language.</p>
<p>Python can be used on a server to create web applications.</p>
<a class="w3-btn w3-margin-bottom" href="python_intro.asp">Start learning Python now »</a>
</div>
<hr>
<h2>Learning by Examples</h2>
<p>Our "Show Python" tool makes it easy to learn Python, it shows both the
code and the result.</p>
<div class="w3-example">
<h3>Example</h3>
<div class="w3-code notranslate pythonHigh"><span class="pythoncolor" style="color:black">
<span class="pythonkeywordcolor" style="color:mediumblue">print</span>(<span class="pythonstringcolor" style="color:brown">"Hello, World!"</span>)<span class="pythonnumbercolor" style="color:red">
</span> </span></div>
<a target="_blank" class="w3-btn w3-margin-bottom" href="showpython.asp?filename=demo_default">Run example »</a>
</div>
<p><b>Click on the "Run example" button to see how it works.</b></p>
<hr>
<h2>Python File Handling</h2>
<p>In our File Handling section you will learn how to open, read, write, and
delete files.</p>
<p><a href="python_file_handling.asp">Python File Handling</a></p>
<hr>
<h2>Python Database Handling</h2>
<p>In our database section you will learn how to access and work with MySQL and MongoDB databases:</p>
<p><a href="python_mysql_getstarted.asp">Python MySQL Tutorial</a></p>
<p><a href="python_mongodb_getstarted.asp">Python MongoDB Tutorial</a></p>
<hr>
<h2>Python Exercises</h2>
<form autocomplete="off" id="w3-exerciseform" action="exercise.asp?filename=exercise_syntax1" method="post" target="_blank">
<h2>Test Yourself With Exercises</h2>
<div class="exercisewindow">
<h2>Exercise:</h2>
<p>Insert the missing part of the code below to output "Hello World".</p>
<div class="exerciseprecontainer">
<pre><input name="ex1" maxlength="5" style="width: 54px;">("Hello World")
</pre>
</div>
<br>
<button type="submit" class="w3-btn w3-margin-bottom">Submit Answer »</button>
<p><a target="_blank" href="exercise.asp?filename=exercise_syntax1">Start the Exercise</a></p>
</div>
</form>
<hr>
<div id="midcontentadcontainer" style="overflow:auto;text-align:center">
<!-- MidContent -->
<!--<pre>mid_content, all: [300,250][336,280][728,90][970,250][970,90][320,50][468,60]</pre>-->
<div id="snhb-mid_content-0" data-google-query-id="CNqS8r_F_OMCFUSJwgodAWAIsg"><div id="google_ads_iframe_/22152718/sws-hb//w3schools.com//mid_content_0__container__" style="border: 0pt none;"><iframe id="google_ads_iframe_/22152718/sws-hb//w3schools.com//mid_content_0" title="3rd party ad content" name="google_ads_iframe_/22152718/sws-hb//w3schools.com//mid_content_0" scrolling="no" margin margin frameborder="0" srcdoc="" style="border: 0px; vertical-align: bottom;" data-google-container-id="f" data-load-complete="true"></iframe></div></div>
</div>
<hr>
<h2>Python Examples</h2>
<p>Learn by examples! This tutorial supplements all explanations with clarifying examples.</p>
<p><a href="python_examples.asp" class="w3-button w3-light-grey">See All Python Examples</a></p>
<hr>
<h2>Python Quiz</h2>
<p>Learn by taking a quiz! This quiz will give you a signal of how much you know, or do not know, about Python.</p>
<p><a href="python_quiz.asp" class="w3-btn w3-blue">Python Quiz</a></p>
<hr>
<h2>Python Reference</h2>
<p>You will also find complete function and method references:</p>
<p><a href="python_reference.asp">Reference Overview</a></p>
<p><a href="python_ref_functions.asp">Built-in Functions</a></p>
<p><a href="python_ref_string.asp">String Methods</a></p>
<p><a href="python_ref_list.asp">List/Array Methods</a></p>
<p><a href="python_ref_dictionary.asp">Dictionary Methods</a></p>
<p><a href="python_ref_tuple.asp">Tuple Methods</a></p>
<p><a href="python_ref_set.asp">Set Methods</a></p>
<p><a href="python_ref_file.asp">File Methods</a></p>
<p><a href="python_ref_keywords.asp">Python Keywords</a></p>
<hr>
<h2>Download Python</h2>
<p>Download Python from the official Python web site:
<a target="_blank" href="https://python.org/">https://python.org</a></p>
<hr>
<h2>Python Exam - Get Your Diploma!</h2>
<div class="w3-row">
<div class="w3-third w3-container w3-padding-24"><a href="/cert/default.asp"><img src="/images/w3certified_logo_250.png" style="max-width:100%;" ></a> </div>
<div class="w3-twothird w3-container"><h2>W3Schools' Online Certification</h2>
<p>The perfect solution for professionals who need to balance work, family, and career building.</p>
<p>More than 25 000 certificates already issued!</p>
</div>
</div>
<p><a class="w3-btn" href="/cert/default.asp">Get Your Certificate »</a></p>
<p style="clear:both;">The <a href="/cert/default.asp">HTML Certificate</a> documents your knowledge of HTML.</p>
<p>The <a href="/cert/default.asp">CSS Certificate</a> documents your knowledge of advanced CSS.</p>
<p>The <a href="/cert/default.asp">JavaScript Certificate</a> documents your knowledge of JavaScript and HTML DOM.</p>
<p>The <a href="/cert/default.asp">Python Certificate</a> documents your knowledge of Python.</p>
<p>The <a href="/cert/default.asp">jQuery Certificate</a> documents your knowledge of jQuery.</p>
<p>The <a href="/cert/default.asp">SQL Certificate</a> documents your knowledge of SQL.</p>
<p>The <a href="/cert/default.asp">PHP Certificate</a> documents your knowledge of PHP and MySQL.</p>
<p>The <a href="/cert/default.asp">XML Certificate</a> documents your knowledge of XML, XML DOM and XSLT.</p>
<p>The <a href="/cert/default.asp">Bootstrap Certificate</a> documents your knowledge of the Bootstrap framework.</p>
<div class="w3-clear nextprev">
<a class="w3-left w3-btn" href="/default.asp">❮ Home</a>
<a class="w3-right w3-btn" href="python_intro.asp">Next ❯</a>
</div>
</div>
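A minimal sketch of the splitting described above, using only Python's standard-library `html.parser`. It assumes the page HTML is already in a string and that the separators are `<hr>` tags that are direct children of the `<div id="main">` container, as in the snippet above. The helper names (`split_between_hr`, `block_text`, `_HrSplitter`, `_TextCollector`) and the small `SAMPLE` document are hypothetical. Rather than trying to describe the varying structure of each block, it only records where each sibling `<hr>` starts and ends in the source and slices the raw HTML between consecutive ones, so the number and kind of tags inside a block do not matter.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class _HrSplitter(HTMLParser):
    """Record the source positions of <hr> tags that are direct children
    of the element whose id attribute equals container_id (hypothetical helper)."""

    def __init__(self, source, container_id):
        super().__init__(convert_charrefs=True)
        self.container_id = container_id
        # Index of the first character of each line; getpos() lines are 1-based.
        self._line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.depth = 0               # currently open non-void elements
        self.container_level = None  # depth just before the container opened
        self.container_end = None    # index where the container's end tag starts
        self.hr_spans = []           # (start, end) index pairs of sibling <hr> tags

    def _pos(self):
        # While a tag is being handled, getpos() points at the start of that tag.
        line, col = self.getpos()
        return self._line_starts[line - 1] + col

    def _inside(self):
        return self.container_level is not None and self.container_end is None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # Only an <hr> exactly one level below the container is a separator.
            if tag == "hr" and self._inside() and self.depth == self.container_level + 1:
                start = self._pos()
                self.hr_spans.append((start, start + len(self.get_starttag_text())))
            return
        if self.container_level is None and dict(attrs).get("id") == self.container_id:
            self.container_level = self.depth
        self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth -= 1
        if self._inside() and self.depth == self.container_level:
            self.container_end = self._pos()


def split_between_hr(source, container_id="main", include_tail=False):
    """Return the raw HTML between each pair of consecutive sibling <hr> tags.
    With include_tail=True, also return the part after the last <hr>
    up to the container's closing tag."""
    parser = _HrSplitter(source, container_id)
    parser.feed(source)
    parser.close()
    spans = parser.hr_spans
    blocks = [source[a_end:b_start] for (_, a_end), (b_start, _) in zip(spans, spans[1:])]
    if include_tail and spans and parser.container_end is not None:
        blocks.append(source[spans[-1][1]:parser.container_end])
    return blocks


class _TextCollector(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def block_text(fragment):
    """Text content of an HTML fragment: each text node stripped, joined by spaces."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()
    return " ".join(collector.parts)


SAMPLE = """<div id="main">
<h1>Title</h1>
<hr>
<h2>A</h2><p>one</p>
<hr>
<h2>B</h2><div><hr><p>nested hr is ignored</p></div>
<hr>
<p>tail</p>
</div>
<hr>
<p>outside</p>"""

if __name__ == "__main__":
    for block in split_between_hr(SAMPLE):
        print(repr(block_text(block)))
```

Applied to the page source quoted above, each `<h2>` section between two top-level `<hr>` tags becomes one block. One of those blocks is only the `midcontentadcontainer` ad wrapper, which you would probably filter out, and the final "Python Exam" section has no closing `<hr>`, which is what `include_tail=True` is for. Slicing the raw source instead of rebuilding a tree keeps each block's markup intact, so you can run whatever per-block parsing you need on it afterwards. If BeautifulSoup is available, the same idea could be expressed by taking each `<hr>` from `soup.select("#main > hr")` and walking its `next_siblings` until the next `<hr>`.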
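【Answer 1】I'm not sure I understand the question, but if you only want to change how the content is laid out, you can do it with CSS alone: organize your content into div blocks, give each block the same class, and instead of using an <hr>, give each block a bottom border, like this: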
#main { max-width: 1170px; margin: 0 auto; }
.bg_block { width: 100%; border-bottom: 1px solid #666; padding: 20px; box-sizing: border-box; }
<div id='main'>
<div class='bg_block'>
<div class="w3-clear nextprev">
<a class="w3-left w3-btn" href="/default.asp">❮ Home</a>
<a class="w3-right w3-btn" href="python_intro.asp">Next ❯</a>
</div>
<div class="w3-panel w3-info intro">
<p>Python is a programming language.</p>
<p>Python can be used on a server to create web applications.</p>
<a class="w3-btn w3-margin-bottom" href="python_intro.asp">Start learning Python now »</a>
</div>
</div><!--bg_block-->
</div><!--main-->