Scrapy-Xpath Function
Posted jwr810
Reference: https://doc.scrapy.org/en/latest/topics/selectors.html#topics-selectors
>>> response.xpath("//a/@href").getall()
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
****************************************************************************************
>>> response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
['My image 1',
'My image 2',
'My image 3',
'My image 4',
'My image 5']
****************************************************************************************
>>> response.xpath('//a[contains(@href, "image")]/text()').re_first(r'Name:\s*(.*)')
'My image 1'
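The `.re()` and `.re_first()` calls above apply a regular expression to each extracted string and return what the capture group matched. The following is a rough sketch using only Python's standard `re` module, not Scrapy itself. The sample strings and the helper name `capture_all` are assumptions made for illustration. It shows what the pattern captures and why the backslash in `\s*` matters.

```python
import re

# Hypothetical link texts, shaped like the ones the selector above would extract.
texts = ["Name: My image 1", "Name: My image 2", "Name: My image 3"]

# \s* skips any whitespace that follows the colon.
pattern = re.compile(r"Name:\s*(.*)")


def capture_all(strings, regex):
    """Return the first capture group of every string that matches (illustrative helper)."""
    results = []
    for s in strings:
        match = regex.search(s)
        if match:
            results.append(match.group(1))
    return results


names = capture_all(texts, pattern)
# Roughly what re_first() gives you: the first result, or None if there is none.
first_name = names[0] if names else None

# Without the backslash, "s*" matches zero letters "s" and the space leaks into the group.
leaky = re.search(r"Name:s*(.*)", texts[0]).group(1)

print(names)       # ['My image 1', 'My image 2', 'My image 3']
print(first_name)  # My image 1
print(repr(leaky)) # ' My image 1'
```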
****************************************************************************************
get() and extract_first() (extract_first() is the older alias of get())
>>> response.css('a::attr(href)').get()
'image1.html'
>>> response.css('a::attr(href)').extract_first()
'image1.html'
****************************************************************************************
>>> response.css('a::attr(href)').getall()
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
>>> response.css('a::attr(href)').extract()
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
****************************************************************************************
>>> response.css('a::attr(href)')[0].get()
'image1.html'
>>> response.css('a::attr(href)')[0].extract()
'image1.html'
****************************************************************************************
CSS partial (fuzzy) matching on class
>>> from scrapy import Selector
>>> sel = Selector(text='<div class="hero shout"><time datetime="2014-07-23 19:00">Special date</time></div>')
>>> sel.css('.shout').xpath('./time/@datetime').getall()
['2014-07-23 19:00']
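The CSS selector `.shout` matches because it tests for one class token among several, whereas comparing the whole `class` attribute to `"shout"` would fail on `"hero shout"`. As a hedged illustration of that difference, the sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's selectors. The `<body>` wrapper and the variable names are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Same markup as above, wrapped in a root element and parsed with the stdlib.
root = ET.fromstring(
    '<body><div class="hero shout">'
    '<time datetime="2014-07-23 19:00">Special date</time>'
    "</div></body>"
)

# Exact comparison of the whole attribute value: "hero shout" != "shout".
exact = root.findall(".//div[@class='shout']")

# Token-based match, which is how a CSS class selector behaves.
by_token = [d for d in root.iter("div") if "shout" in d.get("class", "").split()]

# Equivalent of the relative ./time/@datetime step.
datetimes = [t.get("datetime") for d in by_token for t in d.findall("./time")]

print(len(exact))  # 0
print(datetimes)   # ['2014-07-23 19:00']
```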
****************************************************************************************
>>> from scrapy import Selector
>>> sel = Selector(text="""
....:     <ul class="list">
....:         <li>1</li>
....:         <li>2</li>
....:         <li>3</li>
....:     </ul>
....:     <ul class="list">
....:         <li>4</li>
....:         <li>5</li>
....:         <li>6</li>
....:     </ul>""")
>>> xp = lambda x: sel.xpath(x).getall()
>>> xp("//li[1]")
['<li>1</li>', '<li>4</li>']
>>> xp("(//li)[1]")
['<li>1</li>']
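The two results differ because `//li[1]` applies the position test relative to each parent, while `(//li)[1]` first collects every `<li>` in the document and then takes the first one. A minimal sketch of the same distinction with the standard library's `xml.etree.ElementTree` follows. ElementTree's XPath subset has no parenthesised form, so plain list indexing stands in for `(//li)[1]`. The `<body>` wrapper and variable names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<body>"
    '<ul class="list"><li>1</li><li>2</li><li>3</li></ul>'
    '<ul class="list"><li>4</li><li>5</li><li>6</li></ul>'
    "</body>"
)

# Position is evaluated per parent: the first <li> inside each <ul>.
first_in_each_parent = [li.text for li in root.findall(".//li[1]")]

# Collect all <li> elements in document order, then take the first one overall.
all_items = root.findall(".//li")
first_overall = [li.text for li in all_items[:1]]

print(first_in_each_parent)  # ['1', '4']
print(first_overall)         # ['1']
```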
****************************************************************************************
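>>> from scrapy import Selector
>>> sel = Selector(text='<a href="#">Click here to go to the <strong>Next Page</strong></a>')
>>> sel.xpath("//a[1]").getall() # select the first node
['<a href="#">Click here to go to the <strong>Next Page</strong></a>']
>>> sel.xpath("string(//a[1])").getall() # convert it to string
['Click here to go to the Next Page']
>>> sel.xpath("//a[contains(.//text(), 'Next Page')]").getall()
[]
>>> sel.xpath("//a[contains(., 'Next Page')]").getall()
['<a href="#">Click here to go to the <strong>Next Page</strong></a>']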
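The last two queries above behave differently because of how XPath 1.0 converts a node-set to a string. When `.//text()` is passed to `contains()`, only the first text node is used, and that node does not contain "Next Page". With `.`, the element's full string value is used, which is all descendant text concatenated. The sketch below reproduces the two strings with the standard library's `xml.etree.ElementTree`. It illustrates the conversion rule and is not Scrapy's implementation.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<a href="#">Click here to go to the <strong>Next Page</strong></a>')

# What contains(.//text(), ...) effectively sees: only the first text node.
first_text_node = a.text

# What contains(., ...) sees: all descendant text joined together.
string_value = "".join(a.itertext())

print(repr(first_text_node))            # 'Click here to go to the '
print(repr(string_value))               # 'Click here to go to the Next Page'
print("Next Page" in first_text_node)   # False
print("Next Page" in string_value)      # True
```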
****************************************************************************************