pyQuery
Posted by 在路上的少年
pyquery – PyQuery complete API
The selectors largely support jQuery usage.
class pyquery.pyquery.PyQuery(*args, **kwargs)
The main class
class Fn
Hook for defining custom functions (like jQuery.fn):
>>> fn = lambda: this.map(lambda i, el: PyQuery(this).outerHtml())
>>> PyQuery.fn.listOuterHtml = fn
>>> S = PyQuery(
...     '<ol> <li>Coffee</li> <li>Tea</li> <li>Milk</li> </ol>')
>>> S('li').listOuterHtml()
['<li>Coffee</li>', '<li>Tea</li>', '<li>Milk</li>']
PyQuery.addClass(value)
Alias for add_class()
PyQuery.add_class(value)
Add a CSS class to elements:
>>> d = PyQuery('<div></div>')
>>> d.add_class('myclass')
[<div.myclass>]
>>> d.addClass('myclass')
[<div.myclass>]
PyQuery.after(value)
Insert value after each node.
PyQuery.append(value)
Append value to each node.
PyQuery.appendTo(value)
Alias for append_to()
PyQuery.append_to(value)
Append the nodes to value.
PyQuery.base_url
Return the URL of the current HTML document, or None if not available.
PyQuery.before(value)
Insert value before each node.
PyQuery.children(selector=None)
Filter elements that are direct children of self using optional selector:
>>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p></span>')
>>> d
[<span>]
>>> d.children()
[<p.hello>, <p>]
>>> d.children('.hello')
[<p.hello>]
PyQuery.clone()
Return a copy of the nodes.
PyQuery.closest(selector=None)
>>> d = PyQuery(
...     '<div class="hello"><p>This is a '
...     '<strong class="hello">test</strong></p></div>')
>>> d('strong').closest('div')
[<div.hello>]
>>> d('strong').closest('.hello')
[<strong.hello>]
>>> d('strong').closest('form')
[]
PyQuery.contents()
Return contents (with text nodes):
>>> d = PyQuery('hello <b>bold</b>')
>>> d.contents()
['hello ', <Element b at ...>]
PyQuery.each(func)
Apply func to each node.
PyQuery.empty()
Remove the content of the nodes.
PyQuery.encoding
Return the XML encoding of the root element.
PyQuery.end()
Break out of a level of traversal and return to the parent level.
>>> m = '<p><span><em>Whoah!</em></span></p><p><em> there</em></p>'
>>> d = PyQuery(m)
>>> d('p').eq(1).find('em').end().end()
[<p>, <p>]
PyQuery.eq(index)
Return PyQuery of only the element with the provided index:
>>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
>>> d('p').eq(0)
[<p.hello>]
>>> d('p').eq(1)
[<p>]
>>> d('p').eq(2)
[]
PyQuery.extend(other)
Extend with another PyQuery object.
PyQuery.filter(selector)
Filter elements in self using selector (string or function):
>>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p>')
>>> d('p')
[<p.hello>, <p>]
>>> d('p').filter('.hello')
[<p.hello>]
>>> d('p').filter(lambda i: i == 1)
[<p>]
>>> d('p').filter(lambda i: PyQuery(this).text() == 'Hi')
[<p.hello>]
>>> d('p').filter(lambda i, this: PyQuery(this).text() == 'Hi')
[<p.hello>]
PyQuery.find(selector)
Find elements using selector traversing down from self:
>>> m = '<p><span><em>Whoah!</em></span></p><p><em> there</em></p>'
>>> d = PyQuery(m)
>>> d('p').find('em')
[<em>, <em>]
>>> d('p').eq(1).find('em')
[<em>]
PyQuery.hasClass(name)
Alias for has_class()
PyQuery.has_class(name)
Return True if element has class:
>>> d = PyQuery('<div class="myclass"></div>')
>>> d.has_class('myclass')
True
>>> d.hasClass('myclass')
True
PyQuery.height(value=<NoDefault>)
Set or get the height of the element.
PyQuery.hide()
Add display:none to the elements' style:
>>> print(PyQuery('<div style="display:none;"/>').hide())
<div style="display: none"/>
PyQuery.html(value=<NoDefault>, **kwargs)
Get or set the HTML representation of sub nodes.
Get the HTML value:
>>> d = PyQuery('<div><span>toto</span></div>')
>>> print(d.html())
<span>toto</span>
Extra args are passed to lxml.etree.tostring:
>>> d = PyQuery('<div><span></span></div>')
>>> print(d.html())
<span/>
>>> print(d.html(method='html'))
<span></span>
Set the HTML value:
>>> d.html('<span>Youhou !</span>')
[<div>]
>>> print(d)
<div><span>Youhou !</span></div>
PyQuery.insertAfter(value)
Alias for insert_after()
PyQuery.insertBefore(value)
Alias for insert_before()
PyQuery.insert_after(value)
Insert the nodes after value.
PyQuery.insert_before(value)
Insert the nodes before value.
PyQuery.is_(selector)
Returns True if selector matches at least one current element, else False:
>>> d = PyQuery('<p class="hello"><span>Hi</span></p><p>Bye</p>')
>>> d('p').eq(0).is_('.hello')
True
>>> d('p').eq(0).is_('span')
False
>>> d('p').eq(1).is_('.hello')
False
PyQuery.items(selector=None)
Iterate over elements. Returns PyQuery objects:
>>> d = PyQuery('<div><span>foo</span><span>bar</span></div>')
>>> [i.text() for i in d.items('span')]
['foo', 'bar']
>>> [i.text() for i in d('span').items()]
['foo', 'bar']
>>> list(d.items('a')) == list(d('a').items())
True
PyQuery.make_links_absolute(base_url=None)
Make all links absolute.
PyQuery.map(func)
Returns a new PyQuery after transforming current items with func.
func should take two arguments, 'index' and 'element'. Elements can also be referred to as 'this' inside func:
>>> d = PyQuery('<p class="hello">Hi there</p><p>Bye</p><br />')
>>> d('p').map(lambda i, e: PyQuery(e).text())
['Hi there', 'Bye']
>>> d('p').map(lambda i, e: len(PyQuery(this).text()))
[8, 3]
>>> d('p').map(lambda i, e: PyQuery(this).text().split())
['Hi', 'there', 'Bye']
PyQuery.nextAll(selector=None)
Alias for next_all()
PyQuery.next_all(selector=None)
>>> h = '<span><p class="hello">Hi</p><p>Bye</p><img src=""/></span>'
>>> d = PyQuery(h)
>>> d('p:last').next_all()
[<img>]
>>> d('p:last').nextAll()
[<img>]
PyQuery.not_(selector)
Return elements that don't match the given selector:
>>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
>>> d('p').not_('.hello')
[<p>]
PyQuery.outerHtml()
Alias for outer_html()
PyQuery.outer_html()
Get the HTML representation of the first selected element:
>>> d = PyQuery('<div><span class="red">toto</span> rocks</div>')
>>> print(d('span'))
<span class="red">toto</span> rocks
>>> print(d('span').outer_html())
<span class="red">toto</span>
>>> print(d('span').outerHtml())
<span class="red">toto</span>
>>> S = PyQuery('<p>Only <b>me</b> & myself</p>')
>>> print(S('b').outer_html())
<b>me</b>
PyQuery.parents(selector=None)
>>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p></span>')
>>> d('p').parents()
[<span>]
>>> d('.hello').parents('span')
[<span>]
>>> d('.hello').parents('p')
[]
PyQuery.prepend(value)
Prepend value to the nodes.
PyQuery.prependTo(value)
Alias for prepend_to()
PyQuery.prepend_to(value)
Prepend the nodes to value.
PyQuery.prevAll(selector=None)
Alias for prev_all()
PyQuery.prev_all(selector=None)
>>> h = '<span><p class="hello">Hi</p><p>Bye</p><img src=""/></span>'
>>> d = PyQuery(h)
>>> d('p:last').prev_all()
[<p.hello>]
>>> d('p:last').prevAll()
[<p.hello>]
PyQuery.remove(expr=<NoDefault>)
Remove nodes:
>>> h = '<div>Maybe <em>she</em> does <strong>NOT</strong> know</div>'
>>> d = PyQuery(h)
>>> d('strong').remove()
[<strong>]
>>> print(d)
<div>Maybe <em>she</em> does know</div>
PyQuery.removeAttr(name)
Alias for remove_attr()
PyQuery.removeClass(value)
Alias for remove_class()
PyQuery.remove_attr(name)
Remove an attribute:
>>> d = PyQuery('<div id="myid"></div>')
>>> d.remove_attr('id')
[<div>]
>>> d.removeAttr('id')
[<div>]
PyQuery.remove_class(value)
Remove a CSS class from elements:
>>> d = PyQuery('<div class="myclass"></div>')
>>> d.remove_class('myclass')
[<div>]
>>> d.removeClass('myclass')
[<div>]
PyQuery.remove_namespaces()
Remove all namespaces:
>>> doc = PyQuery('<foo xmlns="http://example.com/foo"></foo>')
>>> doc
[<{http://example.com/foo}foo>]
>>> doc.remove_namespaces()
[<foo>]
PyQuery.replaceAll(expr)
Alias for replace_all()
PyQuery.replaceWith(value)
Alias for replace_with()
PyQuery.replace_all(expr)
Replace nodes by expr.
PyQuery.replace_with(value)
Replace nodes by value:
>>> doc = PyQuery("<html><div /></html>")
>>> node = PyQuery("<span />")
>>> child = doc.find('div')
>>> child.replace_with(node)
[<div>]
>>> print(doc)
<html><span/></html>
PyQuery.root
Return the XML root element.
PyQuery.show()
Add display:block to the elements' style:
>>> print(PyQuery('<div />').show())
<div style="display: block"/>
PyQuery.siblings(selector=None)
>>> h = '<span><p class="hello">Hi</p><p>Bye</p><img src=""/></span>'
>>> d = PyQuery(h)
>>> d('.hello').siblings()
[<p>, <img>]
>>> d('.hello').siblings('img')
[<img>]
PyQuery.text(value=<NoDefault>)
Get or set the text representation of sub nodes.
Get the text value:
>>> doc = PyQuery('<div><span>toto</span><span>tata</span></div>')
>>> print(doc.text())
toto tata
Set the text value:
>>> doc.text('Youhou !')
[<div>]
>>> print(doc)
<div>Youhou !</div>
PyQuery.toggleClass(value)
Alias for toggle_class()
PyQuery.toggle_class(value)
Toggle a CSS class on elements:
>>> d = PyQuery('<div></div>')
>>> d.toggle_class('myclass')
[<div.myclass>]
>>> d.toggleClass('myclass')
[<div>]
PyQuery.val(value=<NoDefault>)
Set the attribute value:
>>> d = PyQuery('<input />')
>>> d.val('Youhou')
[<input>]
Get the attribute value:
>>> d.val()
'Youhou'
PyQuery.width(value=<NoDefault>)
Set or get the width of the element.
PyQuery.wrap(value)
A string of HTML that will be created on the fly and wrapped around each target:
>>> d = PyQuery('<span>youhou</span>')
>>> d.wrap('<div></div>')
[<div>]
>>> print(d)
<div><span>youhou</span></div>
PyQuery.wrapAll(value)
Alias for wrap_all()
PyQuery.wrap_all(value)
Wrap all the elements in the matched set into a single wrapper element:
>>> d = PyQuery('<div><span>Hey</span><span>you !</span></div>')
>>> print(d('span').wrap_all('<div id="wrapper"></div>'))
<div id="wrapper"><span>Hey</span><span>you !</span></div>
>>> d = PyQuery('<div><span>Hey</span><span>you !</span></div>')
>>> print(d('span').wrapAll('<div id="wrapper"></div>'))
<div id="wrapper"><span>Hey</span><span>you !</span></div>
PyQuery.xhtml_to_html()
Remove xhtml namespace:
>>> doc = PyQuery(
...     '<html xmlns="http://www.w3.org/1999/xhtml"></html>')
>>> doc
[<{http://www.w3.org/1999/xhtml}html>]
>>> doc.xhtml_to_html()
[<html>]
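In our project, both str(PQ) and PQ.outer_html() turned some tags from <tag></tag> into <tag />. Oddly, browsers fail to parse some tags once they are written in this form.
The '<!DOCTYPE html>' string is also stripped automatically once the markup is converted to a PQ object.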