CAPTCHA Recognition with Template Matching


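Anyone who writes web crawlers eventually runs into CAPTCHAs. A common recognition pipeline looks like this:

- Convert the image to grayscale

- Denoise the image (for example, by binarizing it)

- Segment the image into characters

- Extract features

- Train a model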

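This pipeline, however, depends on segmenting the image, and whether the characters can be segmented successfully is exactly the crux and the hardest part of breaking a CAPTCHA.

The algorithm introduced here requires neither segmenting the CAPTCHA image nor training a model. It is template matching: prepare one template for each character that may appear, then search the image to be recognized for each template.

This article has two parts:

Part one introduces the basic idea of template matching and one algorithm that implements it, fast normalized cross-correlation;

Part two walks through a concrete example.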

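Template matching is one way to locate a target in an image: the goal is to find the region of an image that most resembles a template image.

Roughly, the template is slid across the input image, and at each position it is compared with the image patch underneath it.

Suppose the input image is 100x100 and the template is 10x10. The search proceeds as follows:

a. Starting at the top-left corner (0,0) of the input image, cut out a temporary patch spanning (0,0) to (10,10);

b. Compute the similarity c between the patch and the template by some measure, and store it in a similarity matrix (of size 91x91);

c. Cut out the next patch, from (0,1) to (10,11), compare it, and record the result in the similarity matrix;

d. Repeat until the bottom-right corner of the input image is reached.

The result is a similarity matrix. Its maximum (for similarity measures such as correlation) or its minimum (for difference measures such as SAD or SSD) marks the patch that best matches the template.

Step b can use any of several measures, such as the mean absolute difference (MAD), the sum of absolute differences (SAD), the sum of squared differences (SSD), or normalized cross-correlation (NCC). This article uses normalized cross-correlation.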
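The sliding procedure can be sketched in a few lines of NumPy. This is only an illustration of the brute-force search, not the article's code: the names `sad` and `slide_match` and the random test image are assumptions made for the example. SAD is used as the measure, so the best match is the minimum of the matrix.

```python
import numpy as np


def sad(patch, template):
    """Sum of absolute differences: 0 means identical, larger means less alike."""
    return np.abs(patch.astype(float) - template.astype(float)).sum()


def slide_match(image, template, score_fn=sad):
    """Score every placement of `template` inside `image`."""
    ih, iw = image.shape
    th, tw = template.shape
    scores = np.empty((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = image[y:y + th, x:x + tw]
            scores[y, x] = score_fn(patch, template)
    return scores


# Hypothetical data: a random 100x100 image and a 10x10 crop taken from it.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 100))
template = image[30:40, 55:65].copy()

scores = slide_match(image, template)
# SAD is a difference measure, so the best placement is the minimum.
best_y, best_x = np.unravel_index(np.argmin(scores), scores.shape)
print(scores.shape, (best_y, best_x))
```

A 100x100 image and a 10x10 template give 100 - 10 + 1 = 91 placements along each axis, which is where the 91x91 matrix size comes from.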

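What is normalized cross-correlation?

Geometrically, for two vectors in space, the normalized cross-correlation coefficient is 1 when they are parallel and point the same way (most similar), -1 when they are parallel and point in opposite directions, and 0 when they are perpendicular (unrelated). This is the same reason three mutually perpendicular vectors can represent all of space: perpendicular vectors carry no information about each other, so their correlation is 0. At any other angle the coefficient lies strictly between -1 and 1. This mirrors the cosine function, where cos(0)=1, cos(pi/2)=0, and cos(pi)=-1, and indeed the correlation coefficient can be viewed as the cosine of the angle between the two vectors.

Mathematically, the cosine is computed as follows. Given two n-dimensional vectors X and Y with coordinates (x1, x2, …, xn) and (y1, y2, …, yn):

$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

(For more detail, see reference [2].)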
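As a quick check of the three cases mentioned above, the cosine can be computed directly from its definition. The function name `cosine_similarity` and the sample vectors are illustrative choices, not part of the original article.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


same = cosine_similarity([1, 2, 3], [2, 4, 6])          # same direction
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])   # opposite direction
orthogonal = cosine_similarity([1, 0], [0, 1])          # perpendicular
print(same, opposite, orthogonal)
```

Up to floating-point rounding, the three values come out at 1, -1 and 0 respectively.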

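That formula treats one-dimensional vectors; template matching works on two-dimensional images (see reference [3] for the full algorithm). In brief, computing the two-dimensional similarity directly is very expensive, so reference [3] uses the fast Fourier transform together with integral images (running-sum tables) to reduce the cost.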
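To make the cost concrete, here is a sketch of the direct (slow) two-dimensional computation that reference [3] sets out to speed up. It is the plain definition of zero-mean normalized cross-correlation, not the fast algorithm itself; the name `ncc_map` and the test data are assumptions for illustration.

```python
import numpy as np


def ncc_map(image, template):
    """Zero-mean normalized cross-correlation at every template placement."""
    image = image.astype(float)
    t = template.astype(float)
    t = t - t.mean()
    t_norm = np.sqrt((t * t).sum())
    th, tw = t.shape
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            w = image[y:y + th, x:x + tw]
            w = w - w.mean()
            denom = np.sqrt((w * w).sum()) * t_norm
            # A flat window or template has zero variance; leave its score at 0.
            if denom > 0:
                out[y, x] = (w * t).sum() / denom
    return out


# Hypothetical data: the template is a brighter, higher-contrast copy of a crop.
rng = np.random.default_rng(1)
image = rng.random((30, 40))
template = image[5:13, 20:28] * 2.0 + 0.5

scores = ncc_map(image, template)
peak = np.unravel_index(np.argmax(scores), scores.shape)
print(scores.shape, peak)
```

Every placement costs one pass over the whole template, which is the expense the fast method avoids. Because each window and the template are mean-centered and normalized, a linear change in brightness or contrast does not move the peak.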

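Now let us look at a concrete application.

Recognizing a CAPTCHA with template matching takes the following steps:

1. Collect every character that can appear in the images and build a template set

2. Convert the image to grayscale

3. Denoise the image (binarization)

4. Match the templates

5. Refine the matching results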
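Steps 2 and 3 might look like the following minimal sketch. The luma weights are the common ITU-R BT.601 values; the threshold of 128, the assumption that ink is darker than the background, and the helper names `to_gray` and `binarize` are all illustrative and would need tuning for a real CAPTCHA.

```python
import numpy as np


def to_gray(rgb):
    """Weighted sum of R, G, B (ITU-R BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(float) @ weights


def binarize(gray, threshold=128):
    """True where the pixel is darker than `threshold` (assumed to be ink)."""
    return gray < threshold


# Hypothetical 4x6 image: white background, a dark stroke, one faint noise pixel.
rgb = np.full((4, 6, 3), 255, dtype=np.uint8)
rgb[1:3, 2:4] = (20, 20, 20)
rgb[0, 0] = (200, 200, 200)

mask = binarize(to_gray(rgb))
print(mask.astype(int))
```

The faint pixel falls on the background side of the threshold, so only the dark stroke survives in the binary mask.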

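Take recognizing the character 加 in the CAPTCHA image as an example.

The goal is to find the part of image that best matches the template; the template image is a region cropped from image beforehand. The matching uses the match_template function from the Python package skimage, which implements the fast normalized cross-correlation algorithm [3].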

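Iterate over the template set and match each template against the image. If dist is greater than the threshold h, the template is considered present in the image; otherwise it is considered absent. Continue with the next template until all templates have been tried.

Take the template 加 as an example. The image is 40x260 and the template is 27x27, so result is a matrix of shape (14, 234), which is the similarity matrix described earlier, with values in [-1, 1]. The position of the maximum of result is where the image best matches the template: x=66, y=11, exactly where the template sits in image. (See reference [4] for more.)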
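The shapes and the peak location quoted above can be reproduced with a synthetic stand-in for the CAPTCHA. This is a sketch, not the article's original program: the random image replaces the real picture, and the crop coordinates are chosen to mirror the x=66, y=11 position from the text.

```python
import numpy as np
from skimage.feature import match_template

# Stand-in for the 40x260 CAPTCHA (the real image is not available here).
rng = np.random.default_rng(42)
image = rng.random((40, 260))

# A 27x27 template cropped from the image, top-left corner at x=66, y=11.
template = image[11:38, 66:93].copy()

# One normalized cross-correlation score per placement of the template.
result = match_template(image, template)

# argmax gives a flat index; unravel_index converts it to (row, column).
y, x = np.unravel_index(np.argmax(result), result.shape)
print(result.shape, x, y, result.max())
```

Rows correspond to y and columns to x, so the result has 40 - 27 + 1 = 14 rows and 260 - 27 + 1 = 234 columns.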

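That is the favorable case, though. Matching runs over every template, while any one image contains only a few of them; the numeral 四, for instance, does not appear in this image. Templates that are absent from the image must therefore be discarded by some rule. The program filters the matches with the dist variable: a template whose dist does not exceed the threshold is treated as absent from the image.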
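One way to express this filtering is sketched below. It assumes that the filtering value is the peak similarity score of each template (higher means a better match) and that the score maps have already been computed; the function name `detect`, the toy score maps, and the threshold of 0.8 are hypothetical.

```python
import numpy as np


def detect(score_maps, threshold):
    """Keep templates whose best score exceeds `threshold`.

    `score_maps` maps a template label to its similarity matrix.
    Returns a list of (label, x, y, score) tuples.
    """
    found = []
    for label, scores in score_maps.items():
        y, x = np.unravel_index(np.argmax(scores), scores.shape)
        best = float(scores[y, x])
        if best > threshold:
            found.append((label, int(x), int(y), best))
    return found


# Toy score maps: 加 matches strongly at x=66, y=11; 四 never matches well.
jia = np.full((14, 234), 0.1)
jia[11, 66] = 0.97
si = np.full((14, 234), 0.2)
si[3, 120] = 0.45

result_list = detect({"加": jia, "四": si}, threshold=0.8)
print(result_list)
```

Only the template whose peak clears the threshold is kept, so 四 is dropped even though its score map still has a maximum somewhere.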

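The final result_list may still contain templates that are not in the image, or imprecise matches. For example, the numeral 一 does not appear in the image yet can still be matched, because 一 can be found inside 二. Further refinement is needed, and there are many ways to do it; for instance, when two matched templates lie too close together, keep the larger one. Other methods are left for the reader to explore.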
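The "keep the larger template" rule could be sketched as follows. The match records, the `min_gap` parameter, and the function name `suppress_close_matches` are assumptions for illustration; distance is measured along x only, which suits characters laid out on one line.

```python
def suppress_close_matches(matches, min_gap):
    """Drop matches that sit within `min_gap` pixels of a larger template.

    Each match is a dict with keys: label, x, width, height.
    """
    # Visit larger templates first so they win any conflict.
    by_area = sorted(matches, key=lambda m: m["width"] * m["height"], reverse=True)
    kept = []
    for m in by_area:
        if all(abs(m["x"] - k["x"]) >= min_gap for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m["x"])   # left-to-right reading order


# Hypothetical matches: 一 is a spurious hit inside the strokes of 二.
matches = [
    {"label": "二", "x": 20, "width": 27, "height": 27},
    {"label": "一", "x": 22, "width": 27, "height": 9},
    {"label": "加", "x": 66, "width": 27, "height": 27},
]
text = "".join(m["label"] for m in suppress_close_matches(matches, min_gap=10))
print(text)
```

Sorting by area before filtering means a small template can never displace a larger one that overlaps it.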

A follow-up article will cover recognizing CAPTCHAs with deep learning. Stay tuned.


References:

[1] http://www.cnblogs.com/beer/p/5672678.html

[2] http://www.ruanyifeng.com/blog/2013/03/cosine_similarity.html

[3] J. P. Lewis, "Fast Normalized Cross-Correlation", Industrial Light and Magic.

[4] http://scikit-image.org/docs/dev/auto_examples/plot_template.html


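About the author: Li Hui (李晖, 点融黑帮) graduated from the University of Electronic Science and Technology of China and works in the Data department at Dianrong (点融) in Chengdu. She is curious about everything new and cannot resist dancing.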

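Image Processing: CAPTCHA Recognition Based on Template Matching

Approach: this CAPTCHA is fairly regular. The digits always appear in fixed regions and do not touch one another. The steps are:

  1. Denoise the image

  2. Segment the image so that each piece shows one digit

  3. Binarize each piece, that is, apply a threshold to turn it into a black-and-white image

  4. Build a library of standard digit images

  5. Compare each segmented piece with the standard library; the digit whose pixels overlap the most is the answer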
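Step 5 can be illustrated with a small Python sketch of pixel-overlap scoring. The tiny 5x3 glyphs, the two-entry library, and the names `overlap` and `classify` are made up for the example; a real library would hold one or more normalized images per digit.

```python
import numpy as np


def overlap(a, b):
    """Number of positions where both binary images have an ink pixel."""
    return int(np.logical_and(a, b).sum())


def classify(glyph, library):
    """Return the library label whose template overlaps `glyph` the most."""
    return max(library, key=lambda label: overlap(glyph, library[label]))


# Tiny 5x3 stand-ins for a digit library; real templates would be larger.
library = {
    "1": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=bool),
    "7": np.array([[1, 1, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1]], dtype=bool),
}
glyph = library["7"].copy()
glyph[4, 2] = False          # one missing pixel, as after imperfect denoising

digit = classify(glyph, library)
print(digit)
```

Raw overlap counts favor templates with more ink, so in practice the count is often normalized, for example by the number of ink pixels in the template.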



function varargout = MainForm(varargin)
% MAINFORM MATLAB code for MainForm.fig
%      MAINFORM, by itself, creates a new MAINFORM or raises the existing
%      singleton*.
%
%      H = MAINFORM returns the handle to a new MAINFORM or the handle to
%      the existing singleton*.
%
%      MAINFORM('CALLBACK',hObject,eventData,handles,...) calls the local
%      function named CALLBACK in MAINFORM.M with the given input arguments.
%
%      MAINFORM('Property','Value',...) creates a new MAINFORM or raises the
%      existing singleton*.  Starting from the left, property value pairs are
%      applied to the GUI before MainForm_OpeningFcn gets called.  An
%      unrecognized property name or invalid value makes property application
%      stop.  All inputs are passed to MainForm_OpeningFcn via varargin.
%
%      *See GUI Options on GUIDE's Tools menu.  Choose "GUI allows only one
%      instance to run (singleton)".
%
% See also: GUIDE, GUIDATA, GUIHANDLES

% Edit the above text to modify the response to help MainForm

% Last Modified by GUIDE v2.5 28-Jun-2013 10:42:40

% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @MainForm_OpeningFcn, ...
                   'gui_OutputFcn',  @MainForm_OutputFcn, ...
                   'gui_LayoutFcn',  [] , ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT

% --- Executes just before MainForm is made visible.
function MainForm_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to MainForm (see VARARGIN)

% Choose default command line output for MainForm
handles.output = hObject;
clc;
set(handles.axes1, 'Box', 'on', 'Color', 'c', 'XTickLabel', '', 'YTickLabel', '');
set(handles.axes2, 'Box', 'on', 'Color', 'c', 'XTickLabel', '', 'YTickLabel', '');
set(handles.text4, 'String', '');
handles.fileurl = 0;
handles.Img = 0;
handles.Imgbw = 0;
handles.Ti = 0;
% Update handles structure
guidata(hObject, handles);

% UIWAIT makes MainForm wait for user response (see UIRESUME)
% uiwait(handles.figure1);

% --- Outputs from this function are returned to the command line.
function varargout = MainForm_OutputFcn(hObject, eventdata, handles)
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Get default command line output from handles structure
varargout{1} = handles.output;

function varargout = pushbutton1_CreateFcn(hObject, eventdata, handles)

% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Load the image
file = fullfile(pwd, 'test/下载.jpg');
[Filename, Pathname] = uigetfile({'*.jpg;*.tif;*.png;*.gif','All Image Files';...
    '*.*','All Files' }, '载入验证码图像',...
    file);
if isequal(Filename, 0) || isequal(Pathname, 0)
    return;
end
% Display the image
axes(handles.axes1); cla reset;
axes(handles.axes2); cla reset;
set(handles.axes1, 'Box', 'on', 'Color', 'c', 'XTickLabel', '', 'YTickLabel', '');
set(handles.axes2, 'Box', 'on', 'Color', 'c', 'XTickLabel', '', 'YTickLabel', '');
set(handles.text4, 'String', '');
% Store the image and its path
fileurl = fullfile(Pathname,Filename);
Img = imread(fileurl);
imshow(Img, [], 'Parent', handles.axes1);
set(handles.text2, 'String', '验证码图像');
handles.fileurl = fileurl;
handles.Img = Img;
guidata(hObject, handles);

% --- Executes on button press in pushbutton2.
function pushbutton2_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton2 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if isequal(handles.Img, 0)
    return;
end
Img = handles.Img;
% Convert the color space
hsv = rgb2hsv(Img);
h = hsv(:, :, 1);
s = hsv(:, :, 2);
v = hsv(:, :, 3);
% Locate noise pixels by hue and saturation
bw1 = h > 0.16 & h < 0.30;
bw2 = s > 0.65 & s < 0.80;
bw = bw1 & bw2;
% Filter out the noise pixels by painting them white
Imgr = Img(:, :, 1);
Imgg = Img(:, :, 2);
Imgb = Img(:, :, 3);
Imgr(bw) = 255;
Imgg(bw) = 255;
Imgb(bw) = 255;
% Denoised result
Imgbw = cat(3, Imgr, Imgg, Imgb);
imshow(Imgbw, [], 'Parent', handles.axes2);
set(handles.text3, 'String', '验证码图像去噪');
handles.Imgbw = Imgbw;
guidata(hObject, handles);

% --- Executes on button press in pushbutton3.
function pushbutton3_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton3 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if isequal(handles.Imgbw, 0)
    return;
end
Imgbw = handles.Imgbw;
% Convert to grayscale
Ig = rgb2gray(Imgbw);
% Binarize
Ibw = im2bw(Ig, 0.8);

% Constants: cs is the column-wise sum of white pixels
sz = size(Ibw);
cs = sum(Ibw, 1);
mincs = min(cs);
maxcs = max(cs);
masksize = 16;

% Initialization
S1 = []; E1 = [];
% flag = 1 means looking for a start column, 2 means looking for an end column
flag = 1;
s1 = 1;
tol = maxcs;

while s1 < sz(2)
    for i = s1 : sz(2)
        % Move the cursor
        s2 = i;
        if cs(s2) < tol && flag == 1
            % Reached the start of a character
            flag = 2;
            S1 = [S1 s2-1];
            break;
        elseif cs(s2) >= tol && flag == 2
            % Reached the end of a character
            flag = 1;
            E1 = [E1 s2];
            break;
        end
    end
    s1 = s2 + 1;
end
% Invert the image
Ibw = ~Ibw;
% Thin the strokes
Ibw = bwmorph(Ibw, 'thin', inf);
for i = 1 : length(S1)
    % Crop one character column range
    Ibwi = Ibw(:, S1(i):E1(i));
    % Area filter: keep the largest connected region
    [L, num] = bwlabel(Ibwi);
    stats = regionprops(L);
    Ar = cat(1, stats.Area);
    [maxAr, ind_maxAr] = max(Ar);
    recti = stats(ind_maxAr).BoundingBox;
    recti(1) = recti(1) + S1(i) - 1;
    recti(2) = recti(2);
    recti(3) = recti(3);
    recti(4) = recti(4);
    Rect{i} = recti;
    % Crop to the bounding box and scale to fit the mask
    Ibwi = imcrop(Ibw, recti);
    rate = masksize/max(size(Ibwi));
    Ibwi = imresize(Ibwi, rate, 'bilinear');
    ti = zeros(masksize, masksize);
    rsti = round((size(ti, 1)-size(Ibwi, 1))/2);
    csti = round((size(ti, 2)-size(Ibwi, 2))/2);
    ti(rsti+1:rsti+size(Ibwi,1), csti+1:csti+size(Ibwi,2))=Ibwi;
    % Store
    Ti{i} = ti;
end
imshow(Ibw, [], 'Parent', handles.axes2); hold on;
for i = 1 : length(Rect)
    rectangle('Position', Rect{i}, 'EdgeColor', 'r', 'LineWidth', 2);
end
hold off;
set(handles.text3, 'String', '验证码数字定位');
handles.Ti = Ti;
guidata(hObject, handles);

% --- Executes on button press in pushbutton4.
function pushbutton4_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton4 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if isequal(handles.Ti, 0)
    return;
end
% Add red separator bars
Ti = handles.Ti;
It = [];
spcr = ones(size(Ti{1}, 1), 3)*255;
spcg = ones(size(Ti{1}, 1), 3)*0;
spcb = ones(size(Ti{1}, 1), 3)*0;
spc = cat(3, spcr, spcg, spcb);
% Join everything side by side
It = [It spc];
for i = 1 : length(Ti)
    ti = Ti{i};
    ti = cat(3, ti, ti, ti);
    ti = im2uint8(mat2gray(ti));
    It = [It ti spc];
end
imshow(It, [], 'Parent', handles.axes2); hold on;
set(handles.text3, 'String', '验证码归一化');

% --- Executes on button press in pushbutton5.
function pushbutton5_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton5 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if isequal(handles.Ti, 0)
    return;
end
Ti = handles.Ti;
% Compare against the library and recognize
fileList = GetAllFiles(fullfile(pwd, 'Databse'));
Tj = [];
for i = 1 : length(fileList)
    filenamei = fileList{i};
    [pathstr, name, ext] = fileparts(filenamei);
    if isequal(ext, '.jpg')
        ti = imread(filenamei);
        ti = im2bw(ti, 0.5);
        ti = double(ti);
        % Extract invariant-moment features
        phii = invmoments(ti);
        % Compare with every segmented character
        OTj = [];
        for j = 1 : length(Ti)
            tij = double(Ti{j});
            phij = invmoments(tij);
            ad = norm(phii-phij);
            otij.filename = filenamei;
            otij.ad = ad;
            OTj = [OTj otij];
        end
        Tj = [Tj; OTj];
    end
end
% Build the result: for each character pick the library image with the smallest distance
r = [];
for i = 1 : size(Tj, 2)
    ti = Tj(:, i);
    adi = cat(1, ti.ad);
    [minadi, ind] = min(adi);
    filenamei = ti(ind).filename;
    [pathstr, name, ext] = fileparts(filenamei);
    name = name(1);
    r = [r name];
end
set(handles.text4, 'String', r);

% --- Executes on button press in pushbutton6.
function pushbutton6_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton6 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Pop up a dialog asking for the correct value
choice = questdlg('确定要更新验证码样本库?(请在识别有错误的情况进行更新!)', ...
    '退出', ...
    '确定','取消','取消');
switch choice
    case '确定'
        prompt={'请输入正确的验证码:'};
        name='手动入库';
        numlines=1;
        defaultanswer={''};
        answer=inputdlg(prompt,name,numlines,defaultanswer);
        if isempty(answer)
            return;
        end
        if isequal(handles.fileurl, 0)
            return;
        end
        % Add the image to the sample library
        fileurl = handles.fileurl;
        answer = answer{1};
        fileout = fullfile(pwd, sprintf('images/%s.jpg', answer));
        flag = 1;
        while 1
            if exist(fileout, 'file')
                fileout = fullfile(pwd, sprintf('images/%s_%d.jpg', answer, flag));
                flag = flag + 1;
            else
                copyfile(fileurl,fileout);
                msgbox(sprintf('已入库,请重新生成数据库!路径为%s', fileout), '提示信息', 'modal');
                break;
            end
        end
    case '取消'
        return;
end

% --- Executes on button press in pushbutton7.
function pushbutton7_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton7 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Exit menu
choice = questdlg('确定要退出系统?', ...
    '退出', ...
    '确定','取消','取消');
switch choice
    case '确定'
        close;
    case '取消'
        return;
end

% --- Executes on button press in pushbutton8.
function pushbutton8_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton8 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
GetDatabase();


function edit1_Callback(hObject, eventdata, handles)
% hObject    handle to edit1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Hints: get(hObject,'String') returns contents of edit1 as text
%        str2double(get(hObject,'String')) returns contents of edit1 as a double

% --- Executes during object creation, after setting all properties.
function edit1_CreateFcn(hObject, eventdata, handles)
% hObject    handle to edit1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

% --- Executes on button press in pushbutton9.
function pushbutton9_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton9 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
[FileName,PathName] = uiputfile({'*.jpg;*.tif;*.png;*.gif','All Image Files';...
    '*.*','All Files' },'Save Image',...
    fullfile(pwd, 'screen.jpg'));
if isequal(FileName, 0) || isequal(PathName, 0)
    return;
end
f = getframe(gcf);
f = frame2im(f);
imwrite(f, fullfile(PathName, FileName));
msgbox('保存成功!', '提示信息', 'modal');
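The segmentation idea used in pushbutton3_Callback, scanning the column sums for runs of columns that contain ink, can be sketched compactly in Python. This is an illustration of the technique rather than a port of the MATLAB code: it works on a mask where True means ink (the inverse of the white-background image the MATLAB code sums), and the name `split_columns` and the toy mask are assumptions.

```python
import numpy as np


def split_columns(ink):
    """Return (start, end) column ranges, end exclusive, of runs containing ink."""
    has_ink = ink.any(axis=0)
    spans, start = [], None
    for col, filled in enumerate(has_ink):
        if filled and start is None:
            start = col                      # a character begins
        elif not filled and start is not None:
            spans.append((start, col))       # the character ended at col - 1
            start = None
    if start is not None:                    # character touches the right edge
        spans.append((start, len(has_ink)))
    return spans


# Hypothetical 5x12 mask holding three separated "characters".
ink = np.zeros((5, 12), dtype=bool)
ink[1:4, 1:3] = True     # first character: columns 1-2
ink[0:5, 5:8] = True     # second character: columns 5-7
ink[2, 10:12] = True     # third character: columns 10-11, touching the edge

spans = split_columns(ink)
print(spans)
```

This only works when the characters do not touch, which is the situation described for this CAPTCHA; touching characters would merge into a single span.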
