Selenium in Practice: Cracking a Slider CAPTCHA with a Java Crawler
Posted by 洛阳泰山
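Introduction
This article shows how to put the Java Selenium techniques covered earlier into practice: driving the mouse in the browser to simulate a person dragging a slider CAPTCHA. It also relies on JavaCV, which is used to compute how far the slider has to move within the image.
Hands-on
Target site: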
https://dun.163.com/trial/jigsaw
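Workflow:
1. Find the target
We use the Selenium library to drive the browser. In a Java project it is added as a build dependency (for example the selenium-java artifact in Maven); pip install selenium applies to Python only. You also need to download the WebDriver that matches your browser version so that Selenium can control the browser. Setup is not covered in detail here, so let's go straight to the workflow: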
import cn.hutool.http.HttpUtil;
import lombok.extern.slf4j.Slf4j;
import org.bytedeco.javacpp.DoublePointer;
import org.bytedeco.opencv.global.opencv_core;
import org.bytedeco.opencv.global.opencv_imgcodecs;
import org.bytedeco.opencv.global.opencv_imgproc;
import org.bytedeco.opencv.opencv_core.Mat;
import org.bytedeco.opencv.opencv_core.Point;
import org.bytedeco.opencv.opencv_core.Scalar;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;
import java.io.*;
import java.util.concurrent.TimeUnit;
@Slf4j
public class SwipeCaptcha2 {
    private final static String webDriver = "webdriver.chrome.driver";
    private final static String webDriverPath = "E:\\chromedriver\\chromedriver.exe";
    public static void main(String[] args) throws IOException, InterruptedException {
        System.setProperty(webDriver, webDriverPath);
        WebDriver driver = new ChromeDriver();
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
        driver.get("https://dun.163.com/trial/jigsaw");
        // Maximize the browser window
        driver.manage().window().maximize();
        // Slider button
        WebElement sliderBtn = driver.findElement(By.className("yidun_slider"));
        // Click to bring up the jigsaw puzzle
        sliderBtn.click();
        Thread.sleep(500);
        // The snippets in the following steps continue here, inside main
    }
}
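With the steps above the CAPTCHA is on screen, so we can start fetching its images.
2. Download the CAPTCHA images
Inspecting the page shows that the CAPTCHA is made of two images: the background (class yidun_bg-img) and the jigsaw piece (class yidun_jigsaw).
Both images now need to be downloaded to local files.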
// Get the download URLs of the two images
WebElement bgImg = driver.findElement(By.className("yidun_bg-img"));
WebElement sliderImg = driver.findElement(By.className("yidun_jigsaw"));
String bgImgSrc = bgImg.getAttribute("src");
String sliderImgSrc = sliderImg.getAttribute("src");
Thread.sleep(500);
// Download both images to local files
String bgImage = "E:\\screenshot\\bj.jpg";
String sliderImage = "E:\\screenshot\\slider.png";
HttpUtil.downloadFile(bgImgSrc, bgImage);
HttpUtil.downloadFile(sliderImgSrc, sliderImage);
3. CAPTCHA image recognition
3.1 Convert the images to black and white
// 1. Read both images from disk as grayscale
Mat sliderMat = opencv_imgcodecs.imread(sliderImage, opencv_imgcodecs.IMREAD_GRAYSCALE);
Mat bgMat = opencv_imgcodecs.imread(bgImage, opencv_imgcodecs.IMREAD_GRAYSCALE);
// 2. Binarize them into black-and-white images
opencv_imgproc.threshold(sliderMat, sliderMat, 127, 255, opencv_imgproc.THRESH_BINARY);
opencv_imgproc.threshold(bgMat, bgMat, 127, 255, opencv_imgproc.THRESH_BINARY);
// Save the black-and-white images
opencv_imgcodecs.imwrite("E:\\screenshot\\slider_black.png", sliderMat);
opencv_imgcodecs.imwrite("E:\\screenshot\\bg_black.jpg", bgMat);
Reducing both images to pure black and white pixels makes it easier for JavaCV to match the position.
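To make the binarization step concrete, here is a minimal pure-Java sketch of the rule that THRESH_BINARY applies with a threshold of 127 and a maximum value of 255: every pixel strictly above the threshold becomes 255, everything else becomes 0. The class and method names (ThresholdSketch, binarize) are made up for illustration, and a flat int array stands in for a Mat.

```java
import java.util.Arrays;

class ThresholdSketch {

    /** Maps each gray value to maxValue if it is strictly greater than thresh, otherwise to 0. */
    static int[] binarize(int[] gray, int thresh, int maxValue) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = gray[i] > thresh ? maxValue : 0;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] row = {12, 127, 128, 200, 255};
        // 127 is not strictly greater than the threshold, so it maps to 0
        System.out.println(Arrays.toString(binarize(row, 127, 255)));
        // prints [0, 0, 255, 255, 255]
    }
}
```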
3.2 Locate the piece in the image
// Match the small image against the large one with normalized cross-correlation; the scores go into result
Mat result = new Mat();
// OpenCV's argument order is (image, template, result, method): search bgMat for sliderMat
opencv_imgproc.matchTemplate(bgMat, sliderMat, result, opencv_imgproc.TM_CCORR_NORMED);
opencv_core.normalize(result, result, 0, 1, opencv_core.NORM_MINMAX, -1, new Mat());
DoublePointer pointer = new DoublePointer(new double[2]);
Point maxLoc = new Point();
// Get the coordinates of the best match
opencv_core.minMaxLoc(result, null, pointer, null, maxLoc, null);
// Mark the matched region on the background image
opencv_imgproc.rectangle(bgMat, maxLoc,
        new Point(maxLoc.x() + sliderMat.cols(), maxLoc.y() + sliderMat.rows()),
        new Scalar(0, 255, 0, 1));
System.out.println(maxLoc.x() + "," + maxLoc.y() + " x-y=" + (maxLoc.x() - maxLoc.y()));
// Horizontal distance the slider has to travel
int moveDist = maxLoc.x();
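For readers who want to see what the matching step computes, the following is a minimal pure-Java sketch of a normalized cross-correlation search, the score that TM_CCORR_NORMED is based on: for each window, the sum of template × window products divided by the square root of the product of their energies. It is an illustration on small int arrays, not the OpenCV implementation, and the names (TemplateMatchSketch, bestMatch) are hypothetical.

```java
class TemplateMatchSketch {

    /** Returns {x, y} of the top-left corner of the window with the highest normalized score. */
    static int[] bestMatch(int[][] image, int[][] templ) {
        int ih = image.length, iw = image[0].length;
        int th = templ.length, tw = templ[0].length;

        // Sum of squared template values, shared by every window
        double templEnergy = 0;
        for (int[] row : templ) {
            for (int v : row) {
                templEnergy += (double) v * v;
            }
        }

        double bestScore = -1;
        int[] best = {0, 0};
        for (int y = 0; y + th <= ih; y++) {
            for (int x = 0; x + tw <= iw; x++) {
                double cross = 0, windowEnergy = 0;
                for (int j = 0; j < th; j++) {
                    for (int i = 0; i < tw; i++) {
                        int p = image[y + j][x + i];
                        cross += (double) p * templ[j][i];
                        windowEnergy += (double) p * p;
                    }
                }
                // An all-black window has zero energy; score it 0 instead of dividing by zero
                double denom = Math.sqrt(templEnergy * windowEnergy);
                double score = denom == 0 ? 0 : cross / denom;
                if (score > bestScore) {
                    bestScore = score;
                    best = new int[]{x, y};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] image = {
                {0, 0, 0, 0, 0, 0},
                {0, 0, 0, 255, 0, 0},
                {0, 0, 0, 255, 255, 0},
                {0, 0, 0, 0, 0, 0}
        };
        int[][] templ = {
                {255, 0},
                {255, 255}
        };
        int[] best = bestMatch(image, templ);
        System.out.println(best[0] + "," + best[1]); // prints 3,1
    }
}
```

The x component of the best match plays the role of maxLoc.x() in the snippet above: it is the horizontal offset of the gap from the left edge of the background.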
4. Drag the slider
4.1 Control the slider movement
// moveDist is the horizontal offset found in step 3.2 (maxLoc.x())
Actions actions = new Actions(driver);
actions.clickAndHold(sliderBtn).perform();
actions.moveByOffset(moveDist, 0).perform();
Thread.sleep(200);
actions.release().perform();
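4.2 Check whether the slide succeeded
Once verification passes, a check mark appears at the bottom of the slider.
You can test for this check mark to decide whether verification succeeded. If the first attempt fails, simply try again until it succeeds.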
boolean success = isExistElement(driver, By.xpath("//div[contains(@class,'yidun--success')]"));
/**
 * Checks whether an element exists
 */
public static boolean isExistElement(WebDriver webDriver, By by) {
    try {
        webDriver.findElement(by);
        return true;
    } catch (Exception e) {
        return false;
    }
}
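The "try again until it succeeds" idea is safer with an upper bound on the number of attempts. Below is a minimal sketch of such a bounded retry loop in plain Java; RetrySketch and retryUntilSuccess are hypothetical names, and the BooleanSupplier stands in for one full attempt (download, match, drag, then the success check).

```java
import java.util.function.BooleanSupplier;

class RetrySketch {

    /** Runs attempt up to maxAttempts times; returns the 1-based attempt that succeeded, or -1. */
    static int retryUntilSuccess(BooleanSupplier attempt, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for a real attempt: fails twice, then succeeds on the third call
        int succeededOn = retryUntilSuccess(() -> ++calls[0] == 3, 10);
        System.out.println(succeededOn); // prints 3
    }
}
```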
Complete code:
import cn.hutool.core.util.RandomUtil;
import cn.hutool.http.HttpUtil;
import lombok.extern.slf4j.Slf4j;
import org.bytedeco.javacpp.DoublePointer;
import org.bytedeco.opencv.global.opencv_core;
import org.bytedeco.opencv.global.opencv_imgcodecs;
import org.bytedeco.opencv.global.opencv_imgproc;
import org.bytedeco.opencv.opencv_core.Mat;
import org.bytedeco.opencv.opencv_core.Point;
import org.bytedeco.opencv.opencv_core.Scalar;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;
import java.io.*;
import java.util.concurrent.TimeUnit;
@Slf4j
public class SwipeCaptcha2 {
    private final static String webDriver = "webdriver.chrome.driver";
    private final static String webDriverPath = "E:\\chromedriver\\chromedriver.exe";
    public static void main(String[] args) throws IOException, InterruptedException {
        System.setProperty(webDriver, webDriverPath);
        WebDriver driver = new ChromeDriver();
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
        driver.get("https://dun.163.com/trial/jigsaw");
        // Maximize the browser window
        driver.manage().window().maximize();
        // Slider button
        WebElement sliderBtn = driver.findElement(By.className("yidun_slider"));
        // Click to bring up the jigsaw puzzle
        sliderBtn.click();
        Thread.sleep(500);
        Actions actions = new Actions(driver);
        String bgImage = "E:\\screenshot\\bj.jpg";
        String sliderImage = "E:\\screenshot\\slider.png";
        // Try at most 10 times
        for (int i = 0; i < 10; i++) {
            WebElement bgImg = driver.findElement(By.className("yidun_bg-img"));
            WebElement sliderImg = driver.findElement(By.className("yidun_jigsaw"));
            String bgImgSrc = bgImg.getAttribute("src");
            String sliderImgSrc = sliderImg.getAttribute("src");
            Thread.sleep(500);
            // Download both images to local files
            HttpUtil.downloadFile(bgImgSrc, bgImage);
            HttpUtil.downloadFile(sliderImgSrc, sliderImage);
            double slideDistance = getMoveDist(bgImage, sliderImage);
            // Correct for the measurement error
            slideDistance = slideDistance + 10;
            // Whole-pixel distance to drag
            int moveDist = (int) slideDistance;
            actions.clickAndHold(sliderBtn).perform();
            // Move a little at a time
            for (int j = moveDist; j >= 1; ) {
                int move = 1;
                if (j > 1) {
                    // Random step in [1, j); hutool's upper bound is exclusive
                    move = RandomUtil.randomInt(1, j);
                }
                actions.moveByOffset(move, 0).perform();
                j = j - move;
            }
            Thread.sleep(200);
            actions.release().perform();
            Thread.sleep(1000);
            if (isExistElement(driver, By.xpath("//div[contains(@class,'yidun--success')]"))) {
                break;
            }
        }
    }
    /**
     * Computes how far the slider must move (repeated testing suggests the result
     * is only accurate when the image colors are fairly dark)
     */
    private static double getMoveDist(String bgImage, String sliderImage) {
        // 1. Read both images from disk as grayscale
        Mat sliderMat = opencv_imgcodecs.imread(sliderImage, opencv_imgcodecs.IMREAD_GRAYSCALE);
        Mat bgMat = opencv_imgcodecs.imread(bgImage, opencv_imgcodecs.IMREAD_GRAYSCALE);
        // 2. Binarize them into black-and-white images
        opencv_imgproc.threshold(sliderMat, sliderMat, 127, 255, opencv_imgproc.THRESH_BINARY);
        opencv_imgproc.threshold(bgMat, bgMat, 127, 255, opencv_imgproc.THRESH_BINARY);
        // Save the black-and-white images (optional, for debugging)
        //opencv_imgcodecs.imwrite("E:\\screenshot\\slider_black.png",sliderMat);
        //opencv_imgcodecs.imwrite("E:\\screenshot\\bg_black.jpg",bgMat);
        Mat result = new Mat();
        // 3. Match the small image against the large one with normalized cross-correlation;
        //    OpenCV's argument order is (image, template, result, method)
        opencv_imgproc.matchTemplate(bgMat, sliderMat, result, opencv_imgproc.TM_CCORR_NORMED);
        opencv_core.normalize(result, result, 0, 1, opencv_core.NORM_MINMAX, -1, new Mat());
        DoublePointer pointer = new DoublePointer(new double[2]);
        Point maxLoc = new Point();
        // 4. Get the coordinates of the best match
        opencv_core.minMaxLoc(result, null, pointer, null, maxLoc, null);
        // 5. Mark the matched region on the background image
        opencv_imgproc.rectangle(bgMat, maxLoc,
                new Point(maxLoc.x() + sliderMat.cols(), maxLoc.y() + sliderMat.rows()),
                new Scalar(0, 255, 0, 1));
        System.out.println("Match position (x,y): " + maxLoc.x() + "," + maxLoc.y());
        return maxLoc.x();
    }
    /**
     * Checks whether an element exists
     */
    public static boolean isExistElement(WebDriver webDriver, By by) {
        try {
            webDriver.findElement(by);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
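The "move a little at a time" loop in the complete code boils down to splitting one distance into positive increments that add up to the total. Here is a small standalone sketch of that arithmetic using java.util.Random instead of hutool's RandomUtil. The names StepSplitSketch and splitIntoSteps are hypothetical, and unlike the loop above, a step here may cover the entire remaining distance at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class StepSplitSketch {

    /** Splits total into random positive steps whose sum is exactly total. */
    static List<Integer> splitIntoSteps(int total, Random random) {
        List<Integer> steps = new ArrayList<>();
        int remaining = total;
        while (remaining > 0) {
            // nextInt(remaining) is in [0, remaining), so step is in [1, remaining]
            int step = 1 + random.nextInt(remaining);
            steps.add(step);
            remaining -= step;
        }
        return steps;
    }

    public static void main(String[] args) {
        List<Integer> steps = splitIntoSteps(137, new Random());
        int sum = 0;
        for (int s : steps) {
            sum += s;
        }
        // The individual steps vary from run to run, but the sum is always 137
        System.out.println(steps + " sum=" + sum);
    }
}
```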
This article may not be reproduced without the author's permission.