Scrapy: An Introduction to Web Crawlers

Posted 2020-12-02 benchdog


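Basic crawler operations

　　1. Applications

　　　　- Public-opinion monitoring systems: track trending terms and top news stories on major portal sites, then analyze, process, and display the results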

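　　2. Crawlers

　　- Targeted (crawl specific, known sites)

　　- Non-targeted (crawl broadly by following links)

　　- Download the page:

　　　　　　http://www.autohome.com.cn/news/

　　- Filter the content:

　　　　　　Regular expressions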
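The two steps above (download a page, then filter it with a regular expression) can be sketched with the standard library alone. This is a minimal illustration, not the author's code: the helper names `download` and `extract_links`, the `SAMPLE_HTML` snippet, and the link pattern are all assumptions made for the example. The pattern is run against the inline sample, so nothing is fetched from the network when the block is executed.

```python
import re
import urllib.request

# Hypothetical pattern: capture the href value of every <a ...> tag.
LINK_PATTERN = re.compile(r'<a\s[^>]*?href="([^"]+)"', re.IGNORECASE)


def download(url: str, timeout: float = 10.0) -> str:
    """Step 1: download a page and decode it to text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_links(html: str) -> list:
    """Step 2: filter the page text with a regular expression."""
    return LINK_PATTERN.findall(html)


# Made-up HTML standing in for a downloaded page.
SAMPLE_HTML = """
<div id="auto-channel-lazyload-article">
  <a href="/news/1.html">First headline</a>
  <a class="more" href="/news/2.html">Second headline</a>
</div>
"""

links = extract_links(SAMPLE_HTML)
print(links)
```

Against a live site the call would look like `extract_links(download('http://www.autohome.com.cn/news/'))`. Regular expressions are brittle on real-world HTML, which is one reason a parser is usually preferred for anything beyond simple patterns.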

　　======= Open-source modules =======

　　1. requests

　　　　pip3 install requests

　　　　import requests

　　　　response = requests.get('http://www.autohome.com.cn/news/')

　　　　response.text  # the response body decoded as text

　　2. beautifulsoup

　　　　pip3 install BeautifulSoup4

　　　　from bs4 import BeautifulSoup

　　　　soup = BeautifulSoup(response.text, features='html.parser')  # parse the HTML into an object tree (objects nested inside objects)

　　　　target = soup.find(id='auto-channel-lazyload-article')  # look up the element with this id

　　　　print(target)

Concurrency options for crawlers

　　　　- Asynchronous I/O: gevent / Twisted / asyncio / aiohttp

　　　　- I/O multiplexing: select
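To make the asynchronous I/O option concrete, here is a minimal sketch using `asyncio` from the standard library. It is an illustration under stated assumptions: `fetch` and `crawl` are hypothetical names, the `example.com` URLs are placeholders, and `asyncio.sleep` stands in for the network wait so the block runs offline. In a real crawler the body of `fetch` would issue an HTTP request through an async client such as aiohttp.

```python
import asyncio


async def fetch(url: str, delay: float = 0.1) -> str:
    """Stand-in for a real HTTP request (assumption: no network is used)."""
    # While this coroutine waits, the event loop is free to run the others.
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def crawl(urls):
    """Start all fetches concurrently and wait for every one to finish."""
    # gather returns the results in the same order as the input URLs.
    return await asyncio.gather(*(fetch(u) for u in urls))


# Placeholder URLs for illustration only.
urls = [f"http://example.com/page/{i}" for i in range(5)]
pages = asyncio.run(crawl(urls))
print(len(pages))
```

Because the waits overlap on one thread, the total time is close to the slowest single request instead of the sum of all of them, which is the main appeal of this approach for I/O-bound crawling.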

The Scrapy framework

　　　　- Asynchronous I/O: built on Twisted
