BeautifulSoup's children and descendants

Posted onhacker


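When I was reading about web scraping a while back, I got stuck at this point and never really understood the difference between these two.

Rereading it today, and borrowing this fellow's approach of printing out the results, I now have a rough idea of how the two differ.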

http://www.cnblogs.com/chensimin1990/p/6725803.html

 

--------------------------------

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the sample page and parse it with Python's built-in HTML parser
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs_obj = BeautifulSoup(html, 'html.parser')

# Print every node nested anywhere inside the gift table
for child in bs_obj.find("table", {"id": "giftList"}).descendants:
    print(child)

That is the code.

------------------------------------------------

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>
<th>
Item Title
</th>

Item Title

<th>
Description
</th>

Description

<th>
Cost
</th>

Cost

<th>
Image
</th>

Image



<tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>
<td>
Vegetable Basket
</td>

Vegetable Basket

<td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td>

This vegetable basket is the perfect gift for your health conscious (or overweight) friends!

<span class="excitingNote">Now with super-colorful bell peppers!</span>
Now with super-colorful bell peppers!


<td>
$15.00
</td>

$15.00

<td>
<img src="../img/gifts/img1.jpg"/>
</td>


<img src="../img/gifts/img1.jpg"/>




<tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>
<td>
Russian Nesting Dolls
</td>

Russian Nesting Dolls

<td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td>

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
<span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
8 entire dolls per set! Octuple the presents!


<td>
$10,000.52
</td>

$10,000.52

<td>
<img src="../img/gifts/img2.jpg"/>
</td>


<img src="../img/gifts/img2.jpg"/>




<tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr>
<td>
Fish Painting
</td>

Fish Painting

<td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td>

If something seems fishy about this painting, it's because it's a fish!
<span class="excitingNote">Also hand-painted by trained monkeys!</span>
Also hand-painted by trained monkeys!


<td>
$10,005.00
</td>

$10,005.00

<td>
<img src="../img/gifts/img3.jpg"/>
</td>


<img src="../img/gifts/img3.jpg"/>




<tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr>
<td>
Dead Parrot
</td>

Dead Parrot

<td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td>

This is an ex-parrot!
<span class="excitingNote">Or maybe he's only resting?</span>
Or maybe he's only resting?


<td>
$0.50
</td>

$0.50

<td>
<img src="../img/gifts/img4.jpg"/>
</td>


<img src="../img/gifts/img4.jpg"/>




<tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>
<td>
Mystery Box
</td>

Mystery Box

<td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td>

If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining.
<span class="excitingNote">Keep your friends guessing!</span>
Keep your friends guessing!


<td>
$1.50
</td>

$1.50

<td>
<img src="../img/gifts/img6.jpg"/>
</td>


<img src="../img/gifts/img6.jpg"/>

That is the output.

 

Now using the children "function" (not sure why it gets called a function: there are no parentheses, so it is clearly an attribute...)

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the sample page and parse it with Python's built-in HTML parser
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs_obj = BeautifulSoup(html, 'html.parser')

# Print only the nodes that sit directly inside the gift table
for child in bs_obj.find("table", {"id": "giftList"}).children:
    print(child)

The output looks like this:

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>


<tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>


<tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>


<tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr>


<tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr>


<tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>

------------------

Now for the explanation:

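1. children does not walk through every level. It yields only the direct children of the tag, here the <tr> rows plus the whitespace strings between them. The children output above looks as if it goes all the way down only because printing a tag prints everything nested inside it. Walking through every level really is the job of descendants, as I had assumed.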

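2. So what does descendants add?

It yields every node below the tag, at every depth, each exactly once and in document order. Because printing a node also prints whatever is nested inside it, the same text shows up repeatedly: once inside each enclosing tag and once more on its own.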
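To see what each attribute actually yields, without the repeated text that print adds, it can help to list node labels instead of printing whole nodes. The following is a minimal sketch on a hypothetical miniature of the gift table (the HTML string and the `label` helper are assumptions made for illustration, not part of the original page or of BeautifulSoup):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical miniature of the gift table, written without whitespace
# between tags so that no whitespace-only strings show up.
html = (
    '<table id="giftList">'
    '<tr id="gift1"><td>Vegetable Basket</td><td>$15.00</td></tr>'
    '<tr id="gift2"><td>Dead Parrot</td><td>$0.50</td></tr>'
    '</table>'
)
table = BeautifulSoup(html, "html.parser").find("table", {"id": "giftList"})


def label(node):
    """Return the tag name for a Tag, or the quoted text for a string node."""
    return node.name if isinstance(node, Tag) else repr(str(node))


# Direct children only: the two rows.
child_labels = [label(n) for n in table.children]

# Every nested node, depth-first: rows, cells, and the text inside the cells.
descendant_labels = [label(n) for n in table.descendants]

print(child_labels)
print(descendant_labels)
```

On this input, children yields two nodes and descendants yields ten, which suggests the difference lies in how deep the iteration goes rather than in how much text each printed node contains.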

Here is an analogy:

a

-a1

--a11

--a12

--a13

---a131

----a1311

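With children on a, you get only a1, returned as-is with everything nested inside it. With descendants, you get a1, a11, a12, a13, a131 and a1311 in turn, so when printed a1311 shows up as part of a13, again as part of a131, and once more on its own (it was already inside a1's output as well).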
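The analogy can be modelled in plain Python. This is only a sketch of the idea, not how BeautifulSoup is implemented: the `Node` class, its `descendants` generator, and its `render` method are hypothetical stand-ins for a tag, the descendants attribute, and what print shows.

```python
class Node:
    """Hypothetical stand-in for a parsed tag: a name plus a list of child nodes."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def descendants(self):
        """Yield every node below this one, depth-first, each exactly once."""
        for child in self.children:
            yield child
            yield from child.descendants()

    def render(self):
        """Mimic print(tag): a node's text includes its entire subtree."""
        inner = "".join(c.render() for c in self.children)
        return f"<{self.name}>{inner}</{self.name}>"


# The tree from the analogy above.
a = Node("a", [
    Node("a1", [
        Node("a11"),
        Node("a12"),
        Node("a13", [Node("a131", [Node("a1311")])]),
    ]),
])

children_names = [n.name for n in a.children]
descendant_names = [n.name for n in a.descendants()]
print(children_names)    # ['a1']
print(descendant_names)  # ['a1', 'a11', 'a12', 'a13', 'a131', 'a1311']

# a1311 is yielded once, but its rendering appears inside every ancestor's output.
appearances = sum("<a1311>" in n.render() for n in a.descendants())
print(appearances)       # 4: inside a1, a13, a131, and as a1311 itself
```

The node a1311 is visited a single time; the repetition comes entirely from each ancestor's rendering containing it.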

Also,

for child in bs_obj.find("table",{"id":"giftList"}).children and
for child in bs_obj.find("table",{"id":"giftList"}) are equivalent. That makes sense when you think about it, and it matches most people's intuition.
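The equivalence can be checked on a small inline document. This is a minimal sketch; the HTML string is a made-up stand-in for the real page, and the variable names are chosen for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table standing in for the real gift list.
html = '<table id="giftList"><tr><td>A</td></tr><tr><td>B</td></tr></table>'
table = BeautifulSoup(html, "html.parser").find("table", {"id": "giftList"})

# children is read as an attribute (no parentheses) and gives an iterator.
via_children = list(table.children)

# Iterating over the tag itself walks the same direct children.
via_iteration = list(table)

print(via_children == via_iteration)   # same nodes, same order
print(via_children == table.contents)  # contents is the same sequence as a list
```

If an index or a length is needed, contents is usually the more convenient of the three, since it is already a list.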
