Current Docker container configuration:
- CentOS 6.8
- Python 2.6.6
Target Docker container configuration:
- CentOS 6.8
- Python 2.7
- selenium 3.141.0
- geckodriver 0.15
- firefox 52.8.0
- Pillow 6.1.0
- pytesseract 0.2.7
Install dependencies
yum install -y zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel libffi-devel gcc gcc-c++ make wget git unzip libjpeg-devel libpng-devel giflib-devel
Create a directory for downloaded packages
mkdir -p /usr/local/download
cd /usr/local/download
Install Python 2.7
# Build and install Python 2.7 from source
wget https://www.python.org/ftp/python/2.7.15/Python-2.7.15.tgz
tar -zxvf Python-2.7.15.tgz
cd Python-2.7.15
./configure
make && make install
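# Back up the system Python 2.6 and point /usr/bin/python at 2.7
mv /usr/bin/python /usr/bin/python_bak
ln -s /usr/local/bin/python2.7 /usr/bin/python
# yum on CentOS 6 is written for Python 2.6; if yum stops working after this change,
# edit the first line of /usr/bin/yum to read #!/usr/bin/python2.6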
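After relinking /usr/bin/python it can be worth confirming which interpreter the python command now starts. The snippet below is a minimal sketch of such a check. The helper name is_target_python is made up for illustration and is not part of any tool used above.

```python
from __future__ import print_function
import sys

TARGET = (2, 7)  # major.minor version this setup aims for


def is_target_python(version_info, target=TARGET):
    """Return True when the major.minor part of version_info equals target."""
    return tuple(version_info[:2]) == tuple(target)


# Report which interpreter is running this script and whether it matches.
print("running %d.%d.%d from %s" % (tuple(sys.version_info[:3]) + (sys.executable,)))
print("matches target:", is_target_python(sys.version_info))
```

Saving this as a file and running it with python should report a 2.7.x interpreter under /usr/local if the symlink took effect.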
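# Install pip
# The generic get-pip.py no longer supports Python 2.7, so fetch the 2.7-specific script
wget --no-check-certificate https://bootstrap.pypa.io/pip/2.7/get-pip.py
python get-pip.py
ln -s /usr/local/bin/pip /usr/bin/pip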
# Configure the pip index (Douban mirror)
cd
mkdir .pip
cd .pip
vi pip.conf
# Add the following content:
[global]
index-url=http://pypi.douban.com/simple
trusted-host = pypi.douban.com
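Editing pip.conf in vi is awkward inside a Docker build, where nothing is interactive. As a sketch of one alternative, the same file could be generated from Python. The function name write_pip_conf is hypothetical, and the values in the commented call are the ones from the pip.conf shown above.

```python
from __future__ import print_function
import os

try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import ConfigParser  # Python 2


def write_pip_conf(path, index_url, trusted_host):
    """Write a pip.conf with a [global] section that points pip at a mirror."""
    parser = ConfigParser()
    parser.add_section("global")
    parser.set("global", "index-url", index_url)
    parser.set("global", "trusted-host", trusted_host)
    directory = os.path.dirname(path)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)  # creates ~/.pip when it does not exist yet
    with open(path, "w") as handle:
        parser.write(handle)
    return path


# Example call (left commented out so that running this file changes nothing):
# write_pip_conf(os.path.expanduser("~/.pip/pip.conf"),
#                "http://pypi.douban.com/simple", "pypi.douban.com")
```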
Install tesseract
# Install leptonica first
cd /usr/local/download
wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
tar xvzf leptonica-1.72.tar.gz
cd leptonica-1.72/
./configure
make && make install
# Install tesseract
cd /usr/local/download
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
unzip 3.04.zip
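cd tesseract-3.04/
# If the archive contains no configure script, generate it first with ./autogen.sh (needs autoconf, automake and libtool)
./configure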
make && make install
# Refresh the dynamic linker cache manually
ldconfig
# Install pytesseract with pip
pip install pytesseract
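# Install language data
Download the model file for each language you need from https://github.com/tesseract-ocr/tessdata
(for tesseract 3.04 the files under that repository's 3.04.00 tag are the matching ones; newer files may not load).
Only phone numbers and English text need to be recognized for now, so eng.traineddata alone is enough.
Move the model file to /usr/local/share/tessdata
Recognition should then work.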
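A missing model file is an easy mistake to make at this step, so a quick existence check may save some debugging. This is only a sketch: has_language_data is a made-up helper, and the default directory is simply the path used above.

```python
from __future__ import print_function
import os

TESSDATA_DIR = "/usr/local/share/tessdata"  # location used in the steps above


def has_language_data(lang, tessdata_dir=TESSDATA_DIR):
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))


print("eng model present:", has_language_data("eng"))
```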
# Example
import pytesseract
from PIL import Image
image = Image.open('bb.png')
code = pytesseract.image_to_string(image)
print(code)
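Since the goal here is to read phone numbers, the raw OCR string usually needs a little cleanup before use. The sketch below assumes 11-digit mainland China mobile numbers starting with 1. That format, the pattern, and the name extract_phone_numbers are all assumptions for illustration and should be adjusted to the real data.

```python
import re

# Assumption: 11-digit mobile numbers starting with 1, not embedded in a longer digit run.
PHONE_PATTERN = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def extract_phone_numbers(ocr_text):
    """Drop spaces, tabs and dashes that OCR tends to insert, then return all matches."""
    compact = re.sub(r"[ \t\-]", "", ocr_text)
    return PHONE_PATTERN.findall(compact)
```

With the example above, extract_phone_numbers(code) would return a list of candidate numbers found in the image text, or an empty list if none match.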
Install selenium + firefox + geckodriver
Install selenium
pip install selenium
# Check the installed version
pip show selenium
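Running pip show package by package gets tedious, so the installed versions can also be compared against the target list in one go. The following is a rough sketch: the TARGETS dict repeats the versions from the top of this post, and both function names are invented here.

```python
from __future__ import print_function

# Target versions listed at the top of this post.
TARGETS = {"selenium": "3.141.0", "Pillow": "6.1.0", "pytesseract": "0.2.7"}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # ships with setuptools, usable on Python 2.7
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_mismatches(targets, lookup=installed_version):
    """Map each package whose installed version differs from its target to (installed, target)."""
    result = {}
    for name, wanted in targets.items():
        found = lookup(name)
        if found != wanted:
            result[name] = (found, wanted)
    return result


for name, (found, wanted) in sorted(version_mismatches(TARGETS).items()):
    print("%s: installed %s, target %s" % (name, found, wanted))
```

A package that is not installed at all shows up with None as its installed version.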
Install geckodriver
cd /usr/local/download
wget https://github.com/mozilla/geckodriver/releases/download/v0.15.0/geckodriver-v0.15.0-linux64.tar.gz
tar xvzf geckodriver-*.tar.gz
rm -f /usr/bin/geckodriver
# The symlink target must be an absolute path
ln -s /usr/local/download/geckodriver /usr/bin/geckodriver
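The reason for the absolute path is that a relative symlink target is resolved relative to the directory that holds the link, not the directory where ln was run. The sketch below reproduces this in a throwaway temp directory. The layout and file names are placeholders, and no real geckodriver binary is involved.

```python
from __future__ import print_function
import os
import shutil
import tempfile


def link_resolves(link_path):
    """Return True if link_path is a symlink whose target exists."""
    return os.path.islink(link_path) and os.path.exists(link_path)


# Build a throwaway layout that mimics separate download/ and bin/ directories.
root = tempfile.mkdtemp()
download_dir = os.path.join(root, "download")
bin_dir = os.path.join(root, "bin")
os.mkdir(download_dir)
os.mkdir(bin_dir)
target = os.path.join(download_dir, "geckodriver")
open(target, "w").close()  # empty placeholder standing in for the real binary

absolute_link = os.path.join(bin_dir, "abs_link")
relative_link = os.path.join(bin_dir, "rel_link")
os.symlink(target, absolute_link)         # absolute target: valid from any directory
os.symlink("geckodriver", relative_link)  # relative target: looked up inside bin/, so it dangles

absolute_ok = link_resolves(absolute_link)
relative_ok = link_resolves(relative_link)
print("absolute link resolves:", absolute_ok)
print("relative link resolves:", relative_ok)
shutil.rmtree(root)  # clean up the temporary layout
```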
Install firefox
cd /usr/local/download
wget http://www.rpmfind.net/linux/centos/6.10/os/x86_64/Packages/firefox-52.8.0-1.el6.centos.x86_64.rpm
yum install -y firefox-52.8.0-1.el6.centos.x86_64.rpm
Install Chinese fonts
# Create a font directory named chinese:
mkdir /usr/share/fonts/chinese
# Upload the fonts from C:\Windows\Fonts\ on the Windows system drive into /usr/share/fonts/chinese on CentOS
chmod -R 755 /usr/share/fonts/chinese
yum -y install ttmkfdir
ttmkfdir -e /usr/share/X11/fonts/encodings/encodings.dir
# Add the new Chinese font directory to the font directory list in fonts.conf:
vi /etc/fonts/fonts.conf
<dir>/usr/share/fonts/chinese</dir>
# Refresh the font cache so that no reboot is needed:
fc-cache
# Finally, list the fonts again with fc-list to confirm:
fc-list
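Scanning the full fc-list output by eye is error-prone, so the check can be narrowed to fonts that declare Chinese coverage with the fontconfig pattern :lang=zh. The helper below is a sketch: the name chinese_fonts and the injectable run parameter are inventions for illustration, and it only works on a machine where fc-list is installed.

```python
from __future__ import print_function
import subprocess


def chinese_fonts(run=subprocess.check_output):
    """Return the non-empty lines printed by `fc-list :lang=zh`."""
    output = run(["fc-list", ":lang=zh"])
    text = output.decode("utf-8", "replace")
    return [line for line in text.splitlines() if line.strip()]
```

Calling chinese_fonts() after fc-cache should return a non-empty list if the uploaded fonts were picked up. An empty list suggests the directory entry in fonts.conf or the cache refresh needs another look.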
Install xvfb
Linux has a handy tool called xvfb: an X server that can run on machines with no display hardware and no physical input devices.
a. Install the required packages
yum install -y xdg-utils xorg-x11-server-Xvfb xorg-x11-xkb-utils
b. Install the Python bindings for xvfb
pip install xvfbwrapper pyvirtualdisplay
Test case:
#!/usr/bin/python
# -*- coding:utf-8 -*-
from selenium import webdriver
from pyvirtualdisplay import Display
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
# Start a virtual display so Firefox can run without a physical screen
display = Display(visible=0, size=(800, 600))
display.start()
binary = FirefoxBinary('/usr/bin/firefox')
driver = webdriver.Firefox(firefox_binary=binary)
driver.get('https://www.baidu.com')
print(driver.title.encode('utf8'))
driver.quit()
display.stop()
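One weakness of the test case is that an exception in driver.get() leaves Firefox and Xvfb running, and such leftovers pile up in a long-lived container. A possible variant wraps both resources in try/finally. This is a sketch rather than the only way to do it: fetch_title and main are hypothetical names, and the display and driver are passed in as factories so the cleanup logic can be exercised without a browser.

```python
from __future__ import print_function


def fetch_title(url, make_display, make_driver):
    """Open url inside a virtual display and return the page title.

    make_display and make_driver are zero-argument callables that build the
    display and the browser driver.
    """
    display = make_display()
    display.start()
    try:
        driver = make_driver()
        try:
            driver.get(url)
            return driver.title
        finally:
            driver.quit()   # always close the browser, even if get() raised
    finally:
        display.stop()      # always tear down the virtual display


def main():
    # Imported here so that fetch_title stays importable without selenium.
    from selenium import webdriver
    from pyvirtualdisplay import Display
    from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

    title = fetch_title(
        "https://www.baidu.com",
        lambda: Display(visible=0, size=(800, 600)),
        lambda: webdriver.Firefox(firefox_binary=FirefoxBinary("/usr/bin/firefox")),
    )
    print(title.encode("utf8"))
```

Calling main() on the configured container runs the same check as the test case, with the cleanup guaranteed.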
Install the remaining packages with pip
# Install packages
pip install requests
pip install Pillow
pip install httplib2
pip install excel
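Once everything is installed, a short import check can confirm that the packages are visible to the new interpreter. The sketch below uses import names, which sometimes differ from pip names (Pillow is imported as PIL). The MODULES list and the function name missing_modules are illustrative choices.

```python
from __future__ import print_function
import importlib

# Import names, not pip names: Pillow is imported as PIL.
MODULES = ["requests", "PIL", "httplib2", "selenium", "pytesseract"]


def missing_modules(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


print("missing:", missing_modules(MODULES))
```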
Reference:
CentOS6.8: installing Python 2.7, pip, and yum