Setting up Pillow, pytesseract, and Selenium on Python 3.5

Posted 2022-12-06 古月今犹在

pip install pillow
pip install pytesseract

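Install tesseract-ocr-setup-3.05.00dev.exe and add its install directory to the PATH environment variable. The setup works once typing tesseract in cmd runs the program.
Open the file pytesseract.py, find the following code, and change the value of tesseract_cmd to the full path of the executable: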
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
# tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
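
Editing a file inside the installed package is undone whenever pytesseract is reinstalled or upgraded. A possible alternative, sketched here, is to work out the path in your own script and assign it to pytesseract.pytesseract.tesseract_cmd at run time. The names resolve_tesseract_cmd and DEFAULT_TESSERACT are made up for this illustration, and the fallback path is simply the one used above.

```python
import os
import shutil

# Install location taken from the text above; adjust it if Tesseract lives elsewhere.
DEFAULT_TESSERACT = "C:/Program Files (x86)/Tesseract-OCR/tesseract.exe"


def resolve_tesseract_cmd(fallback=DEFAULT_TESSERACT):
    """Return the tesseract executable found on PATH, else `fallback`
    if that file exists, else None."""
    found = shutil.which("tesseract")
    if found:
        return found
    if os.path.isfile(fallback):
        return fallback
    return None


# Usage sketch (requires pytesseract to be installed):
#   import pytesseract
#   cmd = resolve_tesseract_cmd()
#   if cmd:
#       pytesseract.pytesseract.tesseract_cmd = cmd
```

If the helper returns None, neither PATH nor the fallback location contains the executable, so the install step is worth rechecking.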

pip install selenium
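Selenium needs a browser driver, and the driver's directory has to be added to the PATH environment variable. I use Chrome; Firefox works too.
Put the driver in Chrome's installation directory. The driver version must match the browser version (enter chrome://version in Chrome's address bar to see the browser's version information).
http://chromedriver.storage.googleapis.com/index.html (ChromeDriver download page)
http://blog.csdn.net/huilan_same/article/details/51896672 (table of matching driver and browser versions)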
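
If you would rather not change the system-wide environment variables, the driver directory can be added to PATH for the current Python process only. This is a minimal sketch: prepend_to_path and DRIVER_DIR are illustrative names, and the directory is the Chrome install directory used in the sample code, which may differ on your machine.

```python
import os

# Assumed driver directory (Chrome's install directory); adjust to your machine.
DRIVER_DIR = r"C:\Program Files (x86)\Google\Chrome\Application"


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH in `env` (default: os.environ)
    unless it is already listed. Entries are compared as exact strings."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


# Usage sketch: call this before creating the driver.
#   prepend_to_path(DRIVER_DIR)
```

The change lasts only for the running process, so nothing is left behind in the system settings.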

Sample code:

from PIL import Image,ImageEnhance
import pytesseract
from selenium import webdriver
import os

# Location of the ChromeDriver executable
chromedriver = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver

driver = webdriver.Chrome(chromedriver)

url = "****"  # address of the login page
path1 = r"C:\Users\黄\Desktop\test\vc1.png"  # full-page screenshot
path2 = r"C:\Users\黄\Desktop\test\vc2.png"  # cropped captcha region

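driver.maximize_window()  # maximize the browser window
driver.get(url)
driver.save_screenshot(path1)  # capture the current page, which contains the captcha we need

ck = driver.find_element_by_name('ck')  # element holding the page's identifying value
print(ck.get_attribute('value'))

imgelement = driver.find_element_by_id('verificationCodeImg')
location = imgelement.location  # x, y coordinates of the captcha image
size = imgelement.size  # width and height of the captcha image
print(location)
print(size)
# If the coordinates are off, open the screenshot in an online image editor and read the captcha's x, y position there
rangle = (int(location['x']), int(location['y']),
          int(location['x'] + size['width']), int(location['y'] + size['height']))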

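i = Image.open(path1)  # open the full-page screenshot
frame4 = i.crop(rangle)  # crop the captcha region out of the screenshot with Image.crop
frame4.save(path2)
im = Image.open(path2)
imgry = im.convert('L')  # convert to grayscale
imgry.show()  # display the grayscale image
text = pytesseract.image_to_string(imgry).strip()  # first pass on the plain grayscale image

sharpness = ImageEnhance.Contrast(imgry)  # contrast enhancer
sharp_img = sharpness.enhance(2.0)
text = pytesseract.image_to_string(sharp_img).strip()  # recognize the captcha with image_to_string; replaces the first result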
vc = ""
for i in text:
    if i.isdigit():  # keep digits
        vc += i
    elif i.isalpha():  # keep letters
        vc += i
    else:
        pass
print(text)
print(vc)
driver.quit()
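
The crop rectangle and the character filter from the sample can also be written as two small functions, which makes them easy to check without a browser. This is only a sketch: crop_box and keep_alnum are illustrative names, and the scale parameter is an assumption for cases where screenshot pixels do not line up 1:1 with the page coordinates Selenium reports (one possible reason for coordinates being off).

```python
def crop_box(location, size, scale=1.0):
    """Return a (left, upper, right, lower) tuple suitable for PIL's Image.crop.

    `location` and `size` are dicts shaped like Selenium's element.location
    and element.size. `scale` is a hypothetical correction factor; leave it
    at 1.0 when the screenshot matches the page coordinates.
    """
    left = int(location["x"] * scale)
    upper = int(location["y"] * scale)
    right = int((location["x"] + size["width"]) * scale)
    lower = int((location["y"] + size["height"]) * scale)
    return (left, upper, right, lower)


def keep_alnum(text):
    """Drop every character that is neither a digit nor a letter,
    mirroring the filtering loop in the sample code."""
    return "".join(ch for ch in text if ch.isdigit() or ch.isalpha())


# Usage sketch with the names from the sample code:
#   frame4 = i.crop(crop_box(imgelement.location, imgelement.size))
#   vc = keep_alnum(text)
```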
