pyQuery

Posted by 在路上的少年


pyquery – PyQuery complete API

Selectors largely follow jQuery usage.

class pyquery.pyquery.PyQuery(*args, **kwargs)

The main class

class Fn

Hook for defining custom functions (like jQuery.fn):

>>> fn = lambda: this.map(lambda i, el: PyQuery(this).outerHtml())
>>> PyQuery.fn.listOuterHtml = fn
>>> S = PyQuery(
...   '<ol>   <li>Coffee</li>   <li>Tea</li>   <li>Milk</li>   </ol>')
>>> S('li').listOuterHtml()
['<li>Coffee</li>', '<li>Tea</li>', '<li>Milk</li>']

PyQuery.addClass(value)

Alias for add_class()

PyQuery.add_class(value)

Add a CSS class to elements:

>>> d = PyQuery('<div></div>')
>>> d.add_class('myclass')
[<div.myclass>]
>>> d.addClass('myclass')
[<div.myclass>]

PyQuery.after(value)

Add value after each node.

PyQuery.append(value)

Append value to each node.

PyQuery.appendTo(value)

Alias for append_to()

PyQuery.append_to(value)

Append the selected nodes to value.

PyQuery.base_url

Return the url of current html document or None if not available.

PyQuery.before(value)

Insert value before each node.

PyQuery.children(selector=None)

Filter elements that are direct children of self using optional selector:

>>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p></span>')
>>> d
[<span>]
>>> d.children()
[<p.hello>, <p>]
>>> d.children('.hello')
[<p.hello>]

PyQuery.clone()

Return a copy of the nodes.

PyQuery.closest(selector=None)

>>> d = PyQuery(
...  '<div class="hello"><p>This is a '
...  '<strong class="hello">test</strong></p></div>')
>>> d('strong').closest('div')
[<div.hello>]
>>> d('strong').closest('.hello')
[<strong.hello>]
>>> d('strong').closest('form')
[]

PyQuery.contents()

Return contents (with text nodes):

>>> d = PyQuery('hello <b>bold</b>')
>>> d.contents()
['hello ', <Element b at ...>]

PyQuery.each(func)

Apply func to each node.

PyQuery.empty()

Remove the content of the nodes.

PyQuery.encoding

Return the XML encoding of the root element.

PyQuery.end()

Break out of a level of traversal and return to the parent level.

>>> m = '<p><span><em>Whoah!</em></span></p><p><em> there</em></p>'
>>> d = PyQuery(m)
>>> d('p').eq(1).find('em').end().end()
[<p>, <p>]

PyQuery.eq(index)

Return PyQuery of only the element with the provided index:

>>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
>>> d('p').eq(0)
[<p.hello>]
>>> d('p').eq(1)
[<p>]
>>> d('p').eq(2)
[]

PyQuery.extend(other)

Extend with another PyQuery object.

PyQuery.filter(selector)

Filter elements in self using selector (string or function):

>>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p>')
>>> d('p')
[<p.hello>, <p>]
>>> d('p').filter('.hello')
[<p.hello>]
>>> d('p').filter(lambda i: i == 1)
[<p>]
>>> d('p').filter(lambda i: PyQuery(this).text() == 'Hi')
[<p.hello>]
>>> d('p').filter(lambda i, this: PyQuery(this).text() == 'Hi')
[<p.hello>]

PyQuery.find(selector)

Find elements using selector traversing down from self:

>>> m = '<p><span><em>Whoah!</em></span></p><p><em> there</em></p>'
>>> d = PyQuery(m)
>>> d('p').find('em')
[<em>, <em>]
>>> d('p').eq(1).find('em')
[<em>]

PyQuery.hasClass(name)

Alias for has_class()

PyQuery.has_class(name)

Return True if element has class:

>>> d = PyQuery('<div class="myclass"></div>')
>>> d.has_class('myclass')
True
>>> d.hasClass('myclass')
True

PyQuery.height(value=<NoDefault>)

Set or get the height of the element.

PyQuery.hide()

Add display:none to the elements' style:

>>> print(PyQuery('<div style="display:none;"/>').hide())
<div style="display: none"/>

PyQuery.html(value=<NoDefault>, **kwargs)

Get or set the html representation of sub nodes.

Get the html value:

>>> d = PyQuery('<div><span>toto</span></div>')
>>> print(d.html())
<span>toto</span>

Extra args are passed to lxml.etree.tostring:

>>> d = PyQuery('<div><span></span></div>')
>>> print(d.html())
<span/>
>>> print(d.html(method='html'))
<span></span>

Set the html value:

>>> d.html('<span>Youhou !</span>')
[<div>]
>>> print(d)
<div><span>Youhou !</span></div>

PyQuery.insertAfter(value)

Alias for insert_after()

PyQuery.insertBefore(value)

Alias for insert_before()

PyQuery.insert_after(value)

Insert the selected nodes after value.

PyQuery.insert_before(value)

Insert the selected nodes before value.

PyQuery.is_(selector)

Returns True if selector matches at least one current element, else False:

>>> d = PyQuery('<p class="hello"><span>Hi</span></p><p>Bye</p>')
>>> d('p').eq(0).is_('.hello')
True
>>> d('p').eq(0).is_('span')
False
>>> d('p').eq(1).is_('.hello')
False

PyQuery.items(selector=None)

Iterate over elements, yielding PyQuery objects:

>>> d = PyQuery('<div><span>foo</span><span>bar</span></div>')
>>> [i.text() for i in d.items('span')]
['foo', 'bar']
>>> [i.text() for i in d('span').items()]
['foo', 'bar']
>>> list(d.items('a')) == list(d('a').items())
True

PyQuery.make_links_absolute(base_url=None)

Make all links absolute.

PyQuery.map(func)

Returns a new PyQuery after transforming current items with func.

func should take two arguments, 'index' and 'element'. Elements can also be referred to as 'this' inside of func:

>>> d = PyQuery('<p class="hello">Hi there</p><p>Bye</p><br />')
>>> d('p').map(lambda i, e: PyQuery(e).text())
['Hi there', 'Bye']

>>> d('p').map(lambda i, e: len(PyQuery(this).text()))
[8, 3]

>>> d('p').map(lambda i, e: PyQuery(this).text().split())
['Hi', 'there', 'Bye']

PyQuery.nextAll(selector=None)

Alias for next_all()

PyQuery.next_all(selector=None)

>>> h = '<span><p class="hello">Hi</p><p>Bye</p><img scr=""/></span>'
>>> d = PyQuery(h)
>>> d('p:last').next_all()
[<img>]
>>> d('p:last').nextAll()
[<img>]

PyQuery.not_(selector)

Return elements that don't match the given selector:

>>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
>>> d('p').not_('.hello')
[<p>]

PyQuery.outerHtml()

Alias for outer_html()

PyQuery.outer_html()

Get the html representation of the first selected element:

>>> d = PyQuery('<div><span class="red">toto</span> rocks</div>')
>>> print(d('span'))
<span class="red">toto</span> rocks
>>> print(d('span').outer_html())
<span class="red">toto</span>
>>> print(d('span').outerHtml())
<span class="red">toto</span>

>>> S = PyQuery('<p>Only <b>me</b> & myself</p>')
>>> print(S('b').outer_html())
<b>me</b>

PyQuery.parents(selector=None)

>>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p></span>')
>>> d('p').parents()
[<span>]
>>> d('.hello').parents('span')
[<span>]
>>> d('.hello').parents('p')
[]

PyQuery.prepend(value)

Prepend value to the nodes.

PyQuery.prependTo(value)

Alias for prepend_to()

PyQuery.prepend_to(value)

Prepend the selected nodes to value.

PyQuery.prevAll(selector=None)

Alias for prev_all()

PyQuery.prev_all(selector=None)

>>> h = '<span><p class="hello">Hi</p><p>Bye</p><img scr=""/></span>'
>>> d = PyQuery(h)
>>> d('p:last').prev_all()
[<p.hello>]
>>> d('p:last').prevAll()
[<p.hello>]

PyQuery.remove(expr=<NoDefault>)

Remove nodes:

>>> h = '<div>Maybe <em>she</em> does <strong>NOT</strong> know</div>'
>>> d = PyQuery(h)
>>> d('strong').remove()
[<strong>]
>>> print(d)
<div>Maybe <em>she</em> does   know</div>

PyQuery.removeAttr(name)

Alias for remove_attr()

PyQuery.removeClass(value)

Alias for remove_class()

PyQuery.remove_attr(name)

Remove an attribute:

>>> d = PyQuery('<div id="myid"></div>')
>>> d.remove_attr('id')
[<div>]
>>> d.removeAttr('id')
[<div>]

PyQuery.remove_class(value)

Remove a CSS class from elements:

>>> d = PyQuery('<div class="myclass"></div>')
>>> d.remove_class('myclass')
[<div>]
>>> d.removeClass('myclass')
[<div>]

PyQuery.remove_namespaces()

Remove all namespaces:

>>> doc = PyQuery('<foo xmlns="http://example.com/foo"></foo>')
>>> doc
[<{http://example.com/foo}foo>]
>>> doc.remove_namespaces()
[<foo>]

PyQuery.replaceAll(expr)

Alias for replace_all()

PyQuery.replaceWith(value)

Alias for replace_with()

PyQuery.replace_all(expr)

Replace nodes by expr.

PyQuery.replace_with(value)

Replace nodes by value:

>>> doc = PyQuery("<html><div /></html>")
>>> node = PyQuery("<span />")
>>> child = doc.find('div')
>>> child.replace_with(node)
[<div>]
>>> print(doc)
<html><span/></html>

PyQuery.root

Return the XML root element.

PyQuery.show()

Add display:block to the elements' style:

>>> print(PyQuery('<div />').show())
<div style="display: block"/>

PyQuery.siblings(selector=None)

>>> h = '<span><p class="hello">Hi</p><p>Bye</p><img scr=""/></span>'
>>> d = PyQuery(h)
>>> d('.hello').siblings()
[<p>, <img>]
>>> d('.hello').siblings('img')
[<img>]

PyQuery.text(value=<NoDefault>)

Get or set the text representation of sub nodes.

Get the text value:

>>> doc = PyQuery('<div><span>toto</span><span>tata</span></div>')
>>> print(doc.text())
toto tata

Set the text value:

>>> doc.text('Youhou !')
[<div>]
>>> print(doc)
<div>Youhou !</div>

PyQuery.toggleClass(value)

Alias for toggle_class()

PyQuery.toggle_class(value)

Toggle a CSS class on elements:

>>> d = PyQuery('<div></div>')
>>> d.toggle_class('myclass')
[<div.myclass>]
>>> d.toggleClass('myclass')
[<div>]

PyQuery.val(value=<NoDefault>)

Set the attribute value:

>>> d = PyQuery('<input />')
>>> d.val('Youhou')
[<input>]

Get the attribute value:

>>> d.val()
'Youhou'

PyQuery.width(value=<NoDefault>)

Set or get the width of the element.

PyQuery.wrap(value)

A string of HTML that will be created on the fly and wrapped around each target:

>>> d = PyQuery('<span>youhou</span>')
>>> d.wrap('<div></div>')
[<div>]
>>> print(d)
<div><span>youhou</span></div>

PyQuery.wrapAll(value)

Alias for wrap_all()

PyQuery.wrap_all(value)

Wrap all the elements in the matched set into a single wrapper element:

>>> d = PyQuery('<div><span>Hey</span><span>you !</span></div>')
>>> print(d('span').wrap_all('<div id="wrapper"></div>'))
<div id="wrapper"><span>Hey</span><span>you !</span></div>

>>> d = PyQuery('<div><span>Hey</span><span>you !</span></div>')
>>> print(d('span').wrapAll('<div id="wrapper"></div>'))
<div id="wrapper"><span>Hey</span><span>you !</span></div>

PyQuery.xhtml_to_html()

Remove xhtml namespace:

>>> doc = PyQuery(
...         '<html xmlns="http://www.w3.org/1999/xhtml"></html>')
>>> doc
[<{http://www.w3.org/1999/xhtml}html>]
>>> doc.xhtml_to_html()
[<html>]


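Usage in a project

Both str(PQ) and PQ.outer_html() turn some tags from <tag></tag> into <tag />. Oddly, for some tags, browsers fail to parse the resulting string once it takes this form. The '<!DOCTYPE html>' string is also dropped automatically when the markup is turned into a PQ object.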
