Reading the BeautifulSoup Official Documentation: Searching the HTML Tree

Posted 内脏坏了

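Besides find() and find_all(), BeautifulSoup provides a family of similar methods. I won't go through them in detail, since their parameters and usage are almost the same. The plural forms return every match (optionally capped by limit), while the singular forms return only the first one. The last four (find_all_next(), find_next(), find_all_previous(), find_previous()) walk the document in the order given by .next_element / .previous_element.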

Signature: find_parents(name, attrs, string, limit, **kwargs)

Signature: find_parent(name, attrs, string, **kwargs)

Signature: find_next_siblings(name, attrs, string, limit, **kwargs)

Signature: find_next_sibling(name, attrs, string, **kwargs)

Signature: find_previous_siblings(name, attrs, string, limit, **kwargs)

Signature: find_previous_sibling(name, attrs, string, **kwargs)

Signature: find_all_next(name, attrs, string, limit, **kwargs)

Signature: find_next(name, attrs, string, **kwargs)

Signature: find_all_previous(name, attrs, string, limit, **kwargs)

Signature: find_previous(name, attrs, string, **kwargs)
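
To make the three directions concrete, here is a minimal sketch. It assumes the "three sisters" sample document from the official documentation and the built-in "html.parser"; the variable names (first_link, parent_p, and so on) are my own and only for illustration.

```python
from bs4 import BeautifulSoup

# Assumption: the "three sisters" sample from the official documentation.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""
soup = BeautifulSoup(html_doc, "html.parser")

first_link = soup.find("a", id="link1")
last_link = soup.find("a", id="link3")

# Upward: the closest enclosing <p>, then the names of all ancestors.
parent_p = first_link.find_parent("p")
ancestor_names = [tag.name for tag in first_link.find_parents()]

# Sideways: siblings share the same parent tag.
next_sibling_ids = [a["id"] for a in first_link.find_next_siblings("a")]
previous_sibling = last_link.find_previous_sibling("a")

# Document order (.next_element / .previous_element): these searches
# can leave the current parent entirely.
next_p = first_link.find_next("p")                  # the "..." paragraph
previous_title = first_link.find_previous("title")  # the <title> in <head>

print(parent_p["class"], next_sibling_ids, previous_sibling["id"])
print(next_p.get_text(), "|", previous_title.string)
```

Note that find_next("p") returns the paragraph after the one containing the link, and find_previous("title") reaches back into <head>, which neither the sibling nor the parent methods could do.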

 

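BeautifulSoup also supports CSS selectors through select(). The syntax is largely the same as ordinary CSS selectors; I'm only a CSS beginner myself, so I won't explain much here: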

from bs4 import BeautifulSoup

# `soup` is assumed to be the "three sisters" sample document from the
# official documentation, already parsed with BeautifulSoup.

soup.select("title")
# [<title>The Dormouse's story</title>]

soup.select("p:nth-of-type(3)")
# [<p class="story">...</p>]

soup.select("body a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("html head title")
# [<title>The Dormouse's story</title>]

soup.select("head > title")
# [<title>The Dormouse's story</title>]

soup.select("p > a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("p > a:nth-of-type(2)")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

soup.select("p > #link1")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select("body > a")
# []

# From the examples above: `>` matches direct children only,
# while a space matches descendants at any depth.

# `~` matches every following sibling; `+` matches only the next one.
soup.select("#link1 ~ .sister")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("#link1 + .sister")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

# Find tags by CSS class:
soup.select(".sister")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

# Find tags by id:
soup.select("#link1")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select("a#link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

# Match any selector in a comma-separated list:
soup.select("#link1,#link2")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

# Match on attribute values (exact, prefix ^=, suffix $=, substring *=):
soup.select('a[href="http://example.com/elsie"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select('a[href^="http://example.com/"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select('a[href$="tillie"]')
# [<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select('a[href*=".com/el"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

# This one puzzled me at first: `|=` matches a value that is exactly "en"
# or that starts with "en-", which is how language codes are written.
multilingual_markup = """
 <p lang="en">Hello</p>
 <p lang="en-us">Howdy, y'all</p>
 <p lang="en-gb">Pip-pip, old fruit</p>
 <p lang="fr">Bonjour mes amis</p>
"""
multilingual_soup = BeautifulSoup(multilingual_markup, "html.parser")
multilingual_soup.select('p[lang|=en]')
# [<p lang="en">Hello</p>,
#  <p lang="en-us">Howdy, y'all</p>,
#  <p lang="en-gb">Pip-pip, old fruit</p>]

# To get only the first match, use select_one():
soup.select_one(".sister")
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
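
The listing relies on a `soup` object built elsewhere, so here is a small self-contained sketch that can be run as-is. It assumes a trimmed-down stand-in for the sample document (not the full official one), the built-in "html.parser", and a hypothetical helper ids() that I introduce only to keep the output short. The lang="english" paragraph is also my own addition, there to show what `|=` does not match.

```python
from bs4 import BeautifulSoup

# Assumption: a trimmed-down stand-in for the "three sisters" document.
html_doc = """
<body>
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
</p>
</body>
"""
soup = BeautifulSoup(html_doc, "html.parser")


def ids(selector):
    """Hypothetical helper: the id of every tag matched by a CSS selector."""
    return [tag["id"] for tag in soup.select(selector)]


descendants = ids("body a")                 # ['link1', 'link2', 'link3']
children = ids("body > a")                  # [] (the links sit inside <p>)
later_siblings = ids("#link1 ~ .sister")    # ['link2', 'link3']
adjacent_sibling = ids("#link1 + .sister")  # ['link2']

# `|=` matches the exact value or the value followed by a hyphen,
# so "english" is not selected even though it starts with "en".
multilingual_soup = BeautifulSoup(
    '<p lang="en">Hello</p><p lang="en-us">Howdy</p>'
    '<p lang="english">not a language code</p><p lang="fr">Bonjour</p>',
    "html.parser",
)
english = [p["lang"] for p in multilingual_soup.select('p[lang|="en"]')]

# select() always returns a list; select_one() returns a tag or None.
missing = soup.select_one("#no-such-id")

print(descendants, children, later_siblings, adjacent_sibling)
print(english, missing)
```

Since select_one() returns None when nothing matches, it is worth checking the result before reading attributes from it.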

 
