Scrapy Basics

Posted 2020-06-09

fetch: downloads the given URL with the Scrapy downloader and writes the retrieved content to standard output.

scrapy fetch --nolog http://www.23andme.com
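To make the "download, then write to standard output" behaviour concrete, here is a rough stand-in written with the standard library. It uses urllib rather than Scrapy's downloader, so Scrapy settings, middlewares and the user agent do not apply; the function name fetch_to_stream is hypothetical.

```python
import io
import urllib.request


def fetch_to_stream(url, out):
    """Download `url` and write the raw response body to the binary stream `out`.

    Stand-in for the idea behind `scrapy fetch`; it does NOT use Scrapy's downloader.
    """
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = resp.read()
    out.write(body)
    return len(body)


# A data: URL keeps the demo offline; any http(s) URL can be substituted.
buf = io.BytesIO()
fetch_to_stream("data:text/plain,hello", buf)
print(buf.getvalue())
```

Passing sys.stdout.buffer as the stream instead of a BytesIO would print the body to the terminal, which is closer to what the command itself does.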

view: downloads the page to a local file and opens it in the browser. When I tried this, the Taobao and JD pages did not render.

scrapy view http://www.taobao.com
scrapy view http://www.23mofang.com
scrapy view http://www.jd.com
scrapy view http://www.amazon.cn/
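A likely reason some pages come up blank (an assumption, not something verified in this post) is that they build their content with JavaScript after loading, while the saved copy contains only the initial HTML. The sketch below approximates the "save locally, then open" step with the standard library. It is loosely modeled on my understanding of Scrapy's open_in_browser helper; the function name save_for_viewing and the example.com URL are hypothetical.

```python
import pathlib
import tempfile


def save_for_viewing(body: bytes, url: str) -> pathlib.Path:
    """Write `body` to a temporary .html file.

    A <base> tag is added so that relative links resolve against `url`
    when the local copy is opened in a browser.
    """
    if b"<base" not in body:
        base_tag = f'<head><base href="{url}">'.encode("utf-8")
        body = body.replace(b"<head>", base_tag, 1)
    fd, name = tempfile.mkstemp(suffix=".html")
    with open(fd, "wb") as fh:
        fh.write(body)
    return pathlib.Path(name)


# A page whose visible content would be filled in by a script at runtime.
page = (
    b"<html><head><title>t</title></head>"
    b"<body><div id='app'></div><script src='/app.js'></script></body></html>"
)
path = save_for_viewing(page, "http://www.example.com/")
print(path.read_text(encoding="utf-8"))
# webbrowser.open(path.as_uri()) would then display the saved copy.
```

The saved file holds only an empty div and a script reference, so whatever the script would have rendered is not part of what was downloaded.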

list: lists the spiders available in the project.

SimilarFacedeMacBook-Pro:spiders similarface$ scrapy list
amazonbook
stackoverflow
taobao
Similar

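edit: opens the given spider for editing, using the editor named by the EDITOR environment variable or Scrapy's EDITOR setting (vim here).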

SimilarFacedeMacBook-Pro:spiders similarface$ scrapy edit stackoverflow

shell: Scrapy's interactive console

# Open Qiushibaike and poke around
SimilarFacedeMacBook-Pro:spiders similarface$ scrapy shell http://www.qiushibaike.com/
>>> response
<200 http://www.qiushibaike.com/>
>>> response.url
'http://www.qiushibaike.com/'
>>> response.encoding
'utf-8'
>>> response.headers
{'Set-Cookie': ['_qqq_uuid_="2|1:0|10:1453947674|10:_qqq_uuid_|56:MDlhM2ZlODM2N2UxZGE0YmYyNjU4MmExM2Q0OTE3MzU4NTliNzIyMg==|505b66b8fc9bc1936ce339417c5c6be46d0cfc570baa61ce378c033c18af4358"; Domain=.qiushibaike.com; expires=Sat, 27 Feb 2016 02:21:14 GMT; Path=/'], 'Vary': ['User-Agent'], 'Server': ['nginx'], 'Date': ['Thu, 28 Jan 2016 02:21:14 GMT'], 'Content-Type': ['text/html; charset=UTF-8']}
>>> response.meta
{'download_timeout': 180.0, 'handle_httpstatus_all': True, 'download_latency': 0.13596606254577637, 'depth': 0, 'download_slot': 'www.qiushibaike.com'}
>>> response.status
200
>>> dir(response)
['_DEFAULT_ENCODING', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_auto_detect_fun', '_body', '_body_declared_encoding', '_body_inferred_encoding', '_cached_benc', '_cached_selector', '_cached_ubody', '_declared_encoding', '_encoding', '_get_body', '_get_url', '_headers_encoding', '_set_body', '_set_url', '_url', 'body', 'body_as_unicode', 'copy', 'css', 'encoding', 'flags', 'headers', 'meta', 'replace', 'request', 'selector', 'status', 'url', 'urljoin', 'xpath']
>>> print(response.body.decode('utf-8'))
...
<div class="content">

我是一个观众，我有话要说，从一个观众的角度，我们喜欢六小龄童老师的孙悟空，陪我们长大。今年是猴年，多希望春晚的舞台上可以有孙悟空。但是，你们选出来的节目，是老百姓喜欢的吗？tfboys 韩国明星，那些来参加合适吗？春晚是全国人的春晚，不是你们自己的春晚！希望做成百姓的春晚，谢谢！
<!--1453944031-->
</div>
...
