Installing a pyspider environment on CentOS 7
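PySpider is, in my opinion, a very convenient and powerful crawler framework. It supports multithreaded crawling and JavaScript rendering of dynamic pages, and provides a web UI, retry on failure, scheduled crawls, and more, which makes it pleasant to use.
Reference articles: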
http://www.jianshu.com/p/8eb248697475
http://cuiqingcai.com/2652.html
https://yq.aliyun.com/articles/75518
1. Setting up the environment:
Python version: 3.6.3
Operating system: CentOS 7.3
1.1. Setting up Python 3:
# Install build dependencies
yum install -y ncurses-devel openssl openssl-devel zlib-devel gcc make glibc-devel libffi-devel glibc-static glibc-utils sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libcurl-devel
# Download Python
wget https://www.python.org/ftp/python/3.6.3/Python-3.6.3.tgz
# Extract
tar -xf Python-3.6.3.tgz
# Compile and install (run from inside the extracted source directory)
cd Python-3.6.3
./configure --prefix=/usr/local/python3.6 --enable-shared
make && make install
# Create a symlink and register the shared library path
ln -s /usr/local/python3.6/bin/python3 /usr/bin/python3
echo "/usr/local/python3.6/lib" > /etc/ld.so.conf.d/python3.6.conf
ldconfig
# Verify python3
python3
Python 3.6.3 (default, Oct 9 2017, 04:01:24)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
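The interactive check above can also be done without opening the interpreter. The sketch below is only an illustration: the helper name `is_expected_python` is hypothetical, and it assumes `python3 --version` prints a line of the form `Python 3.6.3`, as this build does.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original article): succeeds when a
# "Python X.Y.Z" version line belongs to the expected major.minor series.
is_expected_python() {
    version_line=$1   # e.g. the output of: python3 --version
    expected=$2       # e.g. 3.6
    case "$version_line" in
        "Python $expected."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on the machine built above (assumes python3 is on PATH):
#   is_expected_python "$(python3 --version 2>&1)" 3.6 && echo "python3 OK"
#   ldconfig -p | grep libpython3.6   # confirms the shared library is registered
```

If the `ldconfig -p` line prints nothing, the path written to /etc/ld.so.conf.d/ was probably not picked up and `ldconfig` should be run again.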
# Upgrade pip and create a symlink
/usr/local/python3.6/bin/pip3 install --upgrade pip
ln -s /usr/local/python3.6/bin/pip /usr/bin/pip
1.2. Installing pyspider
pip install pyspider
Importing the pycurl module in Python then fails with the following error:
ImportError: pycurl: libcurl link-time ssl backend (nss) is different from compile-time ssl backend (none/other)
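Fix: rebuild pycurl against the SSL backend that libcurl actually uses (nss here):
pip uninstall pycurl
export PYCURL_SSL_LIBRARY=nss
# --no-cache-dir forces a fresh build instead of reusing a previously built wheel
pip install --no-cache-dir pycurl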
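The backend name to export is the one in parentheses in the error message. As a rough sketch, it can also be derived from the first line of `curl --version`. The helper name `ssl_backend_from_curl_version` is hypothetical, and the mapping assumes the system curl and libcurl report the same backend and that only the `nss`, `openssl`, and `gnutls` values of `PYCURL_SSL_LIBRARY` are of interest.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): map the first line of
# `curl --version` to a value suitable for PYCURL_SSL_LIBRARY.
ssl_backend_from_curl_version() {
    case "$1" in
        *NSS/*)     echo nss ;;
        *OpenSSL/*) echo openssl ;;
        *GnuTLS/*)  echo gnutls ;;
        *)          return 1 ;;   # unknown backend: let the caller decide
    esac
}

# Usage (assumes the curl command-line tool is installed):
#   backend=$(ssl_backend_from_curl_version "$(curl --version | head -n 1)") &&
#       export PYCURL_SSL_LIBRARY="$backend"
#   python3 -c 'import pycurl; print(pycurl.version)'   # should import cleanly after the rebuild
```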
1.3. Installing PhantomJS
Official download page: http://phantomjs.org/download.html
wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
Extract (the package that provides the bzip2 tool is named bzip2):
yum -y install bzip2
bzip2 -d phantomjs-2.1.1-linux-x86_64.tar.bz2
tar -xf phantomjs-2.1.1-linux-x86_64.tar
mv phantomjs-2.1.1-linux-x86_64 /usr/local/phantomjs
ln -sv /usr/local/phantomjs/bin/phantomjs /usr/bin/phantomjs
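The point of the symlink is to make `phantomjs` resolvable on PATH. A quick way to confirm that is sketched below. The helper name `on_path` is hypothetical and relies only on the POSIX `command -v` builtin.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): succeeds when the
# given command name can be resolved on PATH.
on_path() {
    command -v "$1" >/dev/null 2>&1
}

# Usage after creating the symlink:
#   on_path phantomjs && phantomjs --version
#   on_path pyspider || echo "pyspider is not on PATH"
```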
1.4. Starting pyspider
Because the server is exposed to the public internet, I created a configuration file, config.json, to enable login authentication:
vim config.json
{
  "webui": {
    "port": "5000",
    "username": "abc",
    "password": "123456",
    "need-auth": true
  }
}
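Since config.json holds the password in plain text, it may be worth creating the file so that only its owner can read it. The sketch below is my own suggestion rather than a step from the original setup: the helper name `write_webui_config` is hypothetical, the keys and port mirror the file above, and usernames or passwords containing `"` or `\` are not escaped.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): write a config file
# with the same keys as above, readable and writable only by its owner.
write_webui_config() {
    path=$1
    username=$2
    password=$3
    (
        umask 077   # a newly created file gets mode 600
        cat > "$path" <<EOF
{
  "webui": {
    "port": "5000",
    "username": "$username",
    "password": "$password",
    "need-auth": true
  }
}
EOF
    ) && chmod 600 "$path"   # also tightens a file that already existed
}

# Usage:
#   write_webui_config config.json abc 123456
#   python3 -m json.tool config.json   # fails with an error if the JSON is malformed
```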
Start the process in the background:
nohup pyspider --config config.json &
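Before switching to the browser, it can help to confirm from the shell that the web UI answers. The sketch below assumes the web UI uses HTTP Basic authentication when `need-auth` is enabled and that it listens on port 5000 as configured above. The helper name `describe_webui_status` and its verdict strings are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): turn an HTTP status
# code returned by the web UI into a short verdict.
describe_webui_status() {
    case "$1" in
        200) echo "ok" ;;
        401) echo "auth-required" ;;   # credentials missing or rejected
        000) echo "unreachable" ;;     # curl prints 000 when no response was received
        *)   echo "unexpected:$1" ;;
    esac
}

# Usage on the server (credentials as set in config.json):
#   code=$(curl -s -o /dev/null -w '%{http_code}' -u abc:123456 http://127.0.0.1:5000/)
#   describe_webui_status "$code"
#   tail nohup.out   # nohup appends the process output to this file by default
```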
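Then open the web UI in a browser on port 5000 of the server and log in with the username and password set in config.json.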
This article comes from the "LinuxNew" blog. Please keep this attribution when reposting: http://jimchen.blog.51cto.com/10026955/1970969