BeautifulSoup: children and descendants
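When I was reading about web scraping a while back, I got stuck at this point and never really understood the difference between these two.
Today I went back to it and borrowed the approach from the post below, printing out the results. I now have a rough idea of how the two differ.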
http://www.cnblogs.com/chensimin1990/p/6725803.html
--------------------------------
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs_obj = BeautifulSoup(html, 'html.parser')

# name_list = bs_obj.find_all("span", {"class":"green"})
# for name in name_list:
#     print(name.get_text())

# file = open('test.txt','w')
# content = ''
for child in bs_obj.find("table", {"id": "giftList"}).descendants:
    print(child)

That is the code.
------------------------------------------------
<tr><th> Item Title </th><th> Description </th><th> Cost </th><th> Image </th></tr>
<th> Item Title </th>
Item Title
<th> Description </th>
Description
<th> Cost </th>
Cost
<th> Image </th>
Image
<tr class="gift" id="gift1"><td> Vegetable Basket </td><td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> </td><td> $15.00 </td><td> <img src="../img/gifts/img1.jpg"/> </td></tr>
<td> Vegetable Basket </td>
Vegetable Basket
<td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> </td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
Now with super-colorful bell peppers!
<td> $15.00 </td>
$15.00
<td> <img src="../img/gifts/img1.jpg"/> </td>
<img src="../img/gifts/img1.jpg"/>
<tr class="gift" id="gift2"><td> Russian Nesting Dolls </td><td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> </td><td> $10,000.52 </td><td> <img src="../img/gifts/img2.jpg"/> </td></tr>
<td> Russian Nesting Dolls </td>
Russian Nesting Dolls
<td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> </td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
<span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
8 entire dolls per set! Octuple the presents!
<td> $10,000.52 </td>
$10,000.52
<td> <img src="../img/gifts/img2.jpg"/> </td>
<img src="../img/gifts/img2.jpg"/>
<tr class="gift" id="gift3"><td> Fish Painting </td><td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> </td><td> $10,005.00 </td><td> <img src="../img/gifts/img3.jpg"/> </td></tr>
<td> Fish Painting </td>
Fish Painting
<td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> </td>
If something seems fishy about this painting, it's because it's a fish!
<span class="excitingNote">Also hand-painted by trained monkeys!</span>
Also hand-painted by trained monkeys!
<td> $10,005.00 </td>
$10,005.00
<td> <img src="../img/gifts/img3.jpg"/> </td>
<img src="../img/gifts/img3.jpg"/>
<tr class="gift" id="gift4"><td> Dead Parrot </td><td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> </td><td> $0.50 </td><td> <img src="../img/gifts/img4.jpg"/> </td></tr>
<td> Dead Parrot </td>
Dead Parrot
<td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> </td>
This is an ex-parrot!
<span class="excitingNote">Or maybe he's only resting?</span>
Or maybe he's only resting?
<td> $0.50 </td>
$0.50
<td> <img src="../img/gifts/img4.jpg"/> </td>
<img src="../img/gifts/img4.jpg"/>
<tr class="gift" id="gift5"><td> Mystery Box </td><td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> </td><td> $1.50 </td><td> <img src="../img/gifts/img6.jpg"/> </td></tr>
<td> Mystery Box </td>
Mystery Box
<td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> </td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining.
<span class="excitingNote">Keep your friends guessing!</span>
Keep your friends guessing!
<td> $1.50 </td>
$1.50
<td> <img src="../img/gifts/img6.jpg"/> </td>
<img src="../img/gifts/img6.jpg"/>
That is the output.
Now with children (I don't know why it gets called a function; there are no parentheses, so it is clearly an attribute...)
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs_obj = BeautifulSoup(html, 'html.parser')

# name_list = bs_obj.find_all("span", {"class":"green"})
# for name in name_list:
#     print(name.get_text())

# file = open('test.txt','w')
# content = ''
for child in bs_obj.find("table", {"id": "giftList"}).children:
    print(child)

The result looks like this:
<tr><th> Item Title </th><th> Description </th><th> Cost </th><th> Image </th></tr>
<tr class="gift" id="gift1"><td> Vegetable Basket </td><td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> </td><td> $15.00 </td><td> <img src="../img/gifts/img1.jpg"/> </td></tr>
<tr class="gift" id="gift2"><td> Russian Nesting Dolls </td><td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> </td><td> $10,000.52 </td><td> <img src="../img/gifts/img2.jpg"/> </td></tr>
<tr class="gift" id="gift3"><td> Fish Painting </td><td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> </td><td> $10,005.00 </td><td> <img src="../img/gifts/img3.jpg"/> </td></tr>
<tr class="gift" id="gift4"><td> Dead Parrot </td><td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> </td><td> $0.50 </td><td> <img src="../img/gifts/img4.jpg"/> </td></tr>
<tr class="gift" id="gift5"><td> Mystery Box </td><td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> </td><td> $1.50 </td><td> <img src="../img/gifts/img6.jpg"/> </td></tr>
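The same comparison can be tried offline. This is only a sketch: the markup below is a cut-down, hypothetical stand-in for the giftList table (two rows, two columns) parsed from a string, and the counts assume html.parser, which keeps the newline between rows as a text node.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical, shortened copy of the table; not the real page.
html = """<table id="giftList">
<tr><th>Item Title</th><th>Cost</th></tr>
<tr class="gift" id="gift1"><td>Vegetable Basket</td><td>$15.00</td></tr>
</table>"""
table = BeautifulSoup(html, "html.parser").find("table", {"id": "giftList"})

# .children and .descendants are attributes (no parentheses) that can be iterated.
direct = list(table.children)         # first level only: rows plus newline text nodes
everything = list(table.descendants)  # every nested tag and text node, each once

# Keep only real tags among the direct children, dropping the newline strings.
rows = [node for node in direct if isinstance(node, Tag)]

print(len(direct), len(everything), len(rows))
```

The direct children are far fewer than the descendants, even though printing each row shows all of its cells.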
------------------
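Now for the explanation:
1. children returns only the first level of children. It can look as though it drills all the way down to the leaves, because each child is printed as a complete tag with all of its nested markup, but walking through every level is the job of descendants.
2. So what is descendants for?
It works like this: it visits each child and then, in turn, all of that child's descendants, down to the text nodes, yielding every node exactly once.
To use an analogy:
a
-a1
--a11
--a12
--a13
---a131
----a1311
With children on a, you get back only a1, printed as-is with everything nested inside it. With descendants, you get a1, a11, a12, a13, a131 and a1311, one after another. Because printing a tag prints its whole subtree, a1311 shows up when a1 is printed, again at a13, again at a131, and finally on its own.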
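The analogy can be checked with a small sketch. The markup is hypothetical: nested tags whose id values mirror the names in the tree above, with the tag names (div, p, span) chosen arbitrarily.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup mirroring the a / a1 / a11 ... tree.
html = (
    '<div id="a"><div id="a1">'
    '<p id="a11"></p><p id="a12"></p>'
    '<div id="a13"><div id="a131"><span id="a1311">leaf</span></div></div>'
    '</div></div>'
)
a = BeautifulSoup(html, "html.parser").find(id="a")

# Collect ids of tags only; text nodes such as "leaf" are skipped.
child_ids = [n["id"] for n in a.children if isinstance(n, Tag)]
descendant_ids = [n["id"] for n in a.descendants if isinstance(n, Tag)]

# Count how many descendant tags include a1311 when rendered as a string.
printed_with_leaf = sum(
    'id="a1311"' in str(n) for n in a.descendants if isinstance(n, Tag)
)

print(child_ids)
print(descendant_ids)
print(printed_with_leaf)
```

Each tag is yielded once by descendants; the repetition seen in the printed output comes from rendering every ancestor together with its subtree.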
Also