Setting up Docker CentOS 7.4 + Python 3.7 + selenium + Firefox + tesseract

Posted by 西加加先生


Current Docker container configuration:

  • CentOS 7.4
  • Python 2.7.5

Target Docker container configuration:

  • CentOS 7.4
  • Python 3.7.4
  • selenium 3.141.0
  • geckodriver 0.15.0
  • Firefox 56.0.2
  • Pillow 6.1.0
  • pytesseract 0.2.7

Install the build dependencies

[root@bf8feb8d5089 /]# yum install -y zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel libffi-devel gcc gcc-c++ make wget git unzip libjpeg-devel libpng-devel giflib-devel

Create a directory for the downloaded packages

[root@bf8feb8d5089 /]# mkdir /usr/local/download 
[root@bf8feb8d5089 /]# cd /usr/local/download

Install Python 3.7.4

[root@bf8feb8d5089 /]# cd /usr/local/download
[root@bf8feb8d5089 download]# wget https://www.python.org/ftp/python/3.7.4/Python-3.7.4.tgz
[root@bf8feb8d5089 download]# tar -xvf Python-3.7.4.tgz

# Configure the build
[root@bf8feb8d5089 download]# cd Python-3.7.4
[root@bf8feb8d5089 Python-3.7.4]# ./configure

# Compile and install
[root@bf8feb8d5089 Python-3.7.4]# make && make install

# Back up the original python executable
[root@bf8feb8d5089 Python-3.7.4]# mv /usr/bin/python /usr/bin/python.bak

# Create symlinks
[root@bf8feb8d5089 Python-3.7.4]# find / -name python3
/usr/local/bin/python3
[root@bf8feb8d5089 Python-3.7.4]# ln -s /usr/local/bin/python3 /usr/bin/python
[root@bf8feb8d5089 Python-3.7.4]# ln -s /usr/local/bin/pip3 /usr/bin/pip
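
Once the symlinks exist, it can be worth confirming that python now resolves to the new interpreter and that the optional modules backed by the -devel packages installed earlier were actually compiled in. The following is only a minimal sketch: missing_modules is a hypothetical helper name, and the module list is an assumption about what this setup needs.

```python
import importlib
import sys


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Should report 3.7.4 if /usr/bin/python points at the new build.
    print(sys.version)
    # Assumed list: modules whose C extensions depend on the -devel packages.
    wanted = ["ssl", "zlib", "bz2", "sqlite3", "readline", "ctypes", "curses"]
    print("missing:", missing_modules(wanted))
```

If any module shows up as missing, installing the matching -devel package and repeating ./configure and make && make install is the usual remedy.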


# Patch yum (yum is written in Python 2)
[root@bf8feb8d5089 Python-3.7.4]# vi /usr/bin/yum
Change python on the first line to python2.7
If /usr/libexec/urlgrabber-ext-down exists, change python to python2.7 there as well
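
The same shebang change can be scripted instead of made by hand in vi. This is a rough sketch rather than a required step: pin_python2_shebang is a hypothetical helper, and the default interpreter path /usr/bin/python2.7 is an assumption about where the stock CentOS interpreter lives.

```python
def pin_python2_shebang(script_text, interpreter="/usr/bin/python2.7"):
    """Point an unversioned python shebang at Python 2.7; leave the rest untouched."""
    first, sep, rest = script_text.partition("\n")
    if first.startswith("#!"):
        # Split off any interpreter flags so they are preserved.
        parts = first[2:].split()
        if parts and parts[0].endswith("/python"):
            parts[0] = interpreter
            first = "#!" + " ".join(parts)
    return first + sep + rest


if __name__ == "__main__":
    print(pin_python2_shebang("#!/usr/bin/python\nimport yum\n"))
```

To apply it, read /usr/bin/yum (and /usr/libexec/urlgrabber-ext-down, if present) as text, pass the content through the function, and write the result back. A script whose shebang already names a versioned interpreter is returned unchanged.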


# Configure the pip index
[root@bf8feb8d5089 Python-3.7.4]# cd
[root@bf8feb8d5089 ~]# mkdir .pip
[root@bf8feb8d5089 ~]# vi .pip/pip.conf
# Write the following content
[global]
index-url = http://pypi.douban.com/simple
trusted-host = pypi.douban.com
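
For a scripted image build, the same file can be generated rather than typed into vi. A minimal sketch using the standard configparser module; render_pip_conf is a hypothetical helper name, and the values are the ones shown above.

```python
import configparser
import io


def render_pip_conf(index_url, trusted_host):
    """Build the text of a pip.conf containing a [global] section."""
    parser = configparser.ConfigParser()
    parser["global"] = {"index-url": index_url, "trusted-host": trusted_host}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the file content; redirect it to ~/.pip/pip.conf to install it.
    print(render_pip_conf("http://pypi.douban.com/simple", "pypi.douban.com"))
```

The trusted-host entry matters here because the index URL uses plain HTTP, which pip otherwise refuses to treat as a secure source.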

Install the Python packages you need

[root@bf8feb8d5089 ~]# pip install requests
[root@bf8feb8d5089 ~]# pip install Pillow
[root@bf8feb8d5089 ~]# pip install httplib2
[root@bf8feb8d5089 ~]# pip install excel

Install tesseract

# Install leptonica
[root@bf8feb8d5089 ~]# cd /usr/local/download/
[root@bf8feb8d5089 download]# wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
[root@bf8feb8d5089 download]# tar xvzf leptonica-1.72.tar.gz
[root@bf8feb8d5089 download]# cd leptonica-1.72/
[root@bf8feb8d5089 leptonica-1.72]# ./configure
[root@bf8feb8d5089 leptonica-1.72]# make && make install


# Install tesseract-3.04
[root@bf8feb8d5089 leptonica-1.72]# cd ..
[root@bf8feb8d5089 download]# wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
[root@bf8feb8d5089 download]# unzip 3.04.zip && cd tesseract-3.04/
# If the unpacked source contains no configure script, generate one with ./autogen.sh first (this needs autoconf, automake and libtool)
[root@bf8feb8d5089 tesseract-3.04]# ./configure
[root@bf8feb8d5089 tesseract-3.04]# make && make install
# Refresh the dynamic linker cache manually
[root@bf8feb8d5089 tesseract-3.04]# ldconfig
[root@bf8feb8d5089 tesseract-3.04]# pip install pytesseract

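# Install the language data
Download the model files for the languages you need from https://github.com/tesseract-ocr/tessdata
(for tesseract 3.04, take the files from the 3.04.00 tag; the newer files target tesseract 4).
Only phone numbers and English text need to be recognized here, so the single file eng.traineddata is enough.
Move the model file to /usr/local/share/tessdata
Recognition then works.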
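
A quick way to confirm that the model file landed in the right place before running OCR is sketched below. has_language_data is a hypothetical helper; the default directory is the /usr/local/share/tessdata path used above.

```python
import os


def has_language_data(lang, tessdata_dir="/usr/local/share/tessdata"):
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))


if __name__ == "__main__":
    print("eng installed:", has_language_data("eng"))
```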

# Example
import pytesseract
from PIL import Image

image = Image.open('bb.png')
code = pytesseract.image_to_string(image)
print(code)
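
Raw OCR output for a phone number often contains stray spaces or letters that look like digits, so a small clean-up step can help. The sketch below is an illustration, not part of the original setup: extract_phone_number is a hypothetical helper, the look-alike table is an assumption, and the pattern assumes an 11-digit mobile number starting with 1.

```python
import re

# Assumed letter-to-digit confusions in OCR output.
_DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


def extract_phone_number(ocr_text):
    """Return the first 11-digit run starting with 1 in ocr_text, or None."""
    cleaned = ocr_text.translate(_DIGIT_LOOKALIKES)
    # Drop whitespace and dashes that may separate digit groups.
    cleaned = re.sub(r"[\s\-]", "", cleaned)
    match = re.search(r"(?<!\d)1\d{10}(?!\d)", cleaned)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(extract_phone_number("138 0013 8OOO\n"))
```

Because the look-alike substitution is applied to the whole string, this only suits images that contain just the number; for mixed English text the substitution would corrupt ordinary words.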

Install selenium + Firefox + Xvfb

[root@bf8feb8d5089 tesseract-3.04]# yum install -y Xvfb gtk3 gtk3-devel libXfont xorg-x11-fonts* libgtk-3.so.0 bzip2 
[root@bf8feb8d5089 tesseract-3.04]# pip install xvfbwrapper selenium pyvirtualdisplay

# Install the browser
[root@bf8feb8d5089 tesseract-3.04]# cd /usr/local/download/
[root@bf8feb8d5089 download]# wget https://ftp.mozilla.org/pub/firefox/releases/56.0.2/linux-x86_64/en-US/firefox-56.0.2.tar.bz2
[root@bf8feb8d5089 download]# tar xjvf firefox-56.0.2.tar.bz2
[root@bf8feb8d5089 download]# rm -f /usr/bin/firefox
[root@bf8feb8d5089 download]# ln -s /usr/local/download/firefox/firefox /usr/bin/firefox

# Install geckodriver
[root@bf8feb8d5089 download]# wget https://github.com/mozilla/geckodriver/releases/download/v0.15.0/geckodriver-v0.15.0-linux64.tar.gz
[root@bf8feb8d5089 download]# tar xvzf geckodriver-*.tar.gz
[root@bf8feb8d5089 download]# rm -f /usr/bin/geckodriver
[root@bf8feb8d5089 download]# ln -s /usr/local/download/geckodriver /usr/bin/geckodriver    # the symlink target must be an absolute path
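
Before running a selenium script, it may help to check that the binaries are reachable on PATH and that the symlinks point where expected. A minimal sketch using only the standard library; resolve_tool is a hypothetical helper name.

```python
import os
import shutil


def resolve_tool(name):
    """Return the real path behind `name` on PATH, or None if it is not found."""
    found = shutil.which(name)
    if found is None:
        return None
    # Follow symlinks such as /usr/bin/firefox -> /usr/local/download/firefox/firefox.
    return os.path.realpath(found)


if __name__ == "__main__":
    for tool in ("firefox", "geckodriver", "Xvfb"):
        print(tool, "->", resolve_tool(tool))
```

A result of None for any of the three names means the corresponding install or symlink step above did not take effect.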

Test case:

#!/usr/bin/python
# -*- coding:utf-8 -*-
from selenium import webdriver
from pyvirtualdisplay import Display
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

# Start a virtual X display (Xvfb) so Firefox can run without a screen.
display = Display(visible=0, size=(800, 600))
display.start()
try:
    binary = FirefoxBinary('/usr/bin/firefox')
    driver = webdriver.Firefox(firefox_binary=binary)
    try:
        driver.get('https://www.baidu.com')
        print(driver.title)
    finally:
        # Close the browser even if the page load fails.
        driver.quit()
finally:
    # Always shut down the virtual display.
    display.stop()
