Selenium in Practice: Solving a Slider CAPTCHA with a Java Crawler

Posted by 洛阳泰山


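Introduction

This article shows how to put the Java Selenium techniques covered earlier into practice: the browser's mouse is controlled programmatically to drag a slider CAPTCHA the way a person would. It also uses JavaCV to work out, from the CAPTCHA images, how far the slider has to move.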

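Walkthrough

Target site:

https://dun.163.com/trial/jigsaw

Procedure:

1. Locate the target

  • We use the Selenium library to drive the browser. In a Java project it is added as a build dependency (for example the org.seleniumhq.selenium:selenium-java artifact in Maven or Gradle) rather than installed with pip. You also need the WebDriver binary that matches your browser version (chromedriver for Chrome) so that Selenium can control the browser. Setup is not covered in detail here; the code goes straight to the steps: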


import cn.hutool.http.HttpUtil;
import lombok.extern.slf4j.Slf4j;
import org.bytedeco.javacpp.DoublePointer;
import org.bytedeco.opencv.global.opencv_core;
import org.bytedeco.opencv.global.opencv_imgcodecs;
import org.bytedeco.opencv.global.opencv_imgproc;
import org.bytedeco.opencv.opencv_core.Mat;
import org.bytedeco.opencv.opencv_core.Point;
import org.bytedeco.opencv.opencv_core.Scalar;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;

import java.io.*;
import java.util.concurrent.TimeUnit;

@Slf4j
public class SwipeCaptcha2 {

    private final static String webDriver = "webdriver.chrome.driver";
    private final static String webDriverPath = "E:\\chromedriver\\chromedriver.exe";

    public static void main(String[] args) throws IOException, InterruptedException {
        System.setProperty(webDriver, webDriverPath);
        WebDriver driver = new ChromeDriver();
        // Wait up to 5 seconds for an element to appear before findElement fails
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
        driver.get("https://dun.163.com/trial/jigsaw");
        // Maximize the browser window
        driver.manage().window().maximize();
        // Slider button
        WebElement sliderBtn = driver.findElement(By.className("yidun_slider"));
        // Click it to bring up the jigsaw puzzle
        sliderBtn.click();
        Thread.sleep(500);
        // The snippets in the following steps continue from here, inside main()
    }
}
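  • With the CAPTCHA located by the steps above, the next step is to fetch its images.

2. Download the CAPTCHA images

Inspecting the page shows that the CAPTCHA is made of two images, the background and the puzzle piece:

  • Both images now need to be downloaded to local files.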

           
            // Read the download URLs of the two images
            WebElement bgImg = driver.findElement(By.className("yidun_bg-img"));
            WebElement sliderImg = driver.findElement(By.className("yidun_jigsaw"));
            String bgImgSrc = bgImg.getAttribute("src");
            String sliderImgSrc = sliderImg.getAttribute("src");
            Thread.sleep(500);

            // Download both images to local files
            String bgImage = "E:\\screenshot\\bj.jpg";
            String sliderImage = "E:\\screenshot\\slider.png";
            HttpUtil.downloadFile(bgImgSrc, bgImage);
            HttpUtil.downloadFile(sliderImgSrc, sliderImage);
3. Recognize the CAPTCHA images

3.1 Convert the images to black and white

        // 1. Read both downloaded images from disk as grayscale
        Mat sliderMat = opencv_imgcodecs.imread(sliderImage, opencv_imgcodecs.IMREAD_GRAYSCALE);
        Mat bgMat = opencv_imgcodecs.imread(bgImage, opencv_imgcodecs.IMREAD_GRAYSCALE);
        // 2. Binarize: pixels above 127 become 255 (white), the rest 0 (black)
        opencv_imgproc.threshold(sliderMat, sliderMat, 127, 255, opencv_imgproc.THRESH_BINARY);
        opencv_imgproc.threshold(bgMat, bgMat, 127, 255, opencv_imgproc.THRESH_BINARY);
        // Save the black-and-white images for inspection
        opencv_imgcodecs.imwrite("E:\\screenshot\\slider_black.png", sliderMat);
        opencv_imgcodecs.imwrite("E:\\screenshot\\bg_black.jpg", bgMat);
  • Reducing both images to pure black and white pixels makes it easier for JavaCV to match the position of the puzzle piece.
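To make the binarization step concrete, here is a minimal plain-Java sketch of what a THRESH_BINARY threshold does to grayscale values. It is an illustration only, not the JavaCV implementation: the class name BinarizeSketch and the helper binarize are hypothetical, and the image is assumed to be a flat array of 0-255 intensities.

```java
import java.util.Arrays;

public class BinarizeSketch {

    // Mirrors the THRESH_BINARY rule: values strictly above thresh
    // become maxVal, everything else becomes 0.
    static int[] binarize(int[] gray, int thresh, int maxVal) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = gray[i] > thresh ? maxVal : 0;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] gray = {0, 100, 127, 128, 200, 255};
        // prints [0, 0, 0, 255, 255, 255]
        System.out.println(Arrays.toString(binarize(gray, 127, 255)));
    }
}
```

Note that a pixel exactly at the threshold (127 here) ends up black, because the comparison is strict.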

3.2 Locate the puzzle piece in the background

        Mat result = new Mat();
        // Find where the small image (template) sits inside the large image.
        // matchTemplate expects the larger image first and the template second.
        opencv_imgproc.matchTemplate(bgMat, sliderMat, result, opencv_imgproc.TM_CCORR_NORMED);
        opencv_core.normalize(result, result, 0, 1, opencv_core.NORM_MINMAX, -1, new Mat());
        DoublePointer pointer = new DoublePointer(new double[2]);
        Point maxLoc = new Point();
        // Get the coordinates of the best match (top-left corner)
        opencv_core.minMaxLoc(result, null, pointer, null, maxLoc, null);
        // Mark the matched region on the background image
        opencv_imgproc.rectangle(bgMat, maxLoc,
                new Point(maxLoc.x() + sliderMat.cols(), maxLoc.y() + sliderMat.rows()),
                new Scalar(0, 255, 0, 1));
        System.out.println("Match position: " + maxLoc.x() + "," + maxLoc.y());
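The idea behind this matching step can be sketched in plain Java. The example below slides a small template over a larger image and scores each position with normalized cross-correlation, which is the formula TM_CCORR_NORMED is documented to use. It is a simplified illustration under stated assumptions, not OpenCV's implementation: TemplateMatchSketch and bestMatch are hypothetical names, and images are plain 2D arrays of intensities.

```java
public class TemplateMatchSketch {

    // Returns {x, y} of the top-left corner where the template scores highest.
    // Score = sum(I*T) / sqrt(sum(I^2) * sum(T^2)) over the overlapping window.
    static int[] bestMatch(double[][] image, double[][] templ) {
        int ih = image.length, iw = image[0].length;
        int th = templ.length, tw = templ[0].length;
        double best = -1;
        int bestX = 0, bestY = 0;
        for (int y = 0; y + th <= ih; y++) {
            for (int x = 0; x + tw <= iw; x++) {
                double cross = 0, sumI = 0, sumT = 0;
                for (int r = 0; r < th; r++) {
                    for (int c = 0; c < tw; c++) {
                        double i = image[y + r][x + c];
                        double t = templ[r][c];
                        cross += i * t;
                        sumI += i * i;
                        sumT += t * t;
                    }
                }
                double denom = Math.sqrt(sumI * sumT);
                // An all-black window has no energy; treat its score as 0
                double score = denom == 0 ? 0 : cross / denom;
                if (score > best) {
                    best = score;
                    bestX = x;
                    bestY = y;
                }
            }
        }
        return new int[]{bestX, bestY};
    }

    public static void main(String[] args) {
        double[][] image = {
                {0, 0, 0, 0, 0, 0},
                {0, 0, 0, 255, 0, 0},
                {0, 0, 0, 255, 255, 0},
                {0, 0, 0, 0, 0, 0}
        };
        double[][] templ = {
                {255, 0},
                {255, 255}
        };
        int[] loc = bestMatch(image, templ);
        // prints 3,1
        System.out.println(loc[0] + "," + loc[1]);
    }
}
```

Only the x coordinate of the best match matters for a horizontal slider, which is why the article's code returns maxLoc.x().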

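4. Drag the slider

4.1 Move the slider

// Horizontal distance in pixels: the x coordinate found in step 3.2
int moveDist = maxLoc.x();
Actions actions = new Actions(driver);
// Press and hold the slider button, drag it horizontally, then release
actions.clickAndHold(sliderBtn).perform();
actions.moveByOffset(moveDist, 0).perform();
Thread.sleep(200);
actions.release().perform();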
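The complete code at the end of the article does not drag the slider in one jump; it moves it in small random increments. A minimal plain-Java sketch of that splitting logic follows. SlideStepsSketch and splitIntoSteps are hypothetical names, and java.util.Random stands in for the Hutool RandomUtil helper used in the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SlideStepsSketch {

    // Splits a total distance into positive integer steps that sum to total.
    static List<Integer> splitIntoSteps(int total, Random rnd) {
        List<Integer> steps = new ArrayList<>();
        int remaining = total;
        while (remaining >= 1) {
            // nextInt(n) returns a value in [0, n), so the step lies in
            // [1, remaining - 1]; the last remaining pixel is moved on its own
            int step = remaining > 1 ? 1 + rnd.nextInt(remaining - 1) : 1;
            steps.add(step);
            remaining -= step;
        }
        return steps;
    }

    public static void main(String[] args) {
        List<Integer> steps = splitIntoSteps(137, new Random());
        int sum = 0;
        for (int s : steps) {
            sum += s;
        }
        // The individual steps vary from run to run; the sum is always 137
        System.out.println(steps + " sum=" + sum);
    }
}
```

Each returned step would then be passed to actions.moveByOffset(step, 0) in turn.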

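4.2 Check whether the slide succeeded

  • Once verification passes, a check mark appears at the bottom of the slider.

  • Checking for this check mark tells you whether verification succeeded. If the first attempt fails, keep retrying until it succeeds.

boolean success = isExistElement(driver, By.xpath("//div[contains(@class,'yidun--success')]"));

    /**
     * Checks whether an element exists on the page.
     */
    public static boolean isExistElement(WebDriver webDriver, By by) {
        try {
            webDriver.findElement(by);
            return true;
        } catch (Exception e) {
            // findElement throws when nothing matches the locator
            return false;
        }
    }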
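The "retry until it succeeds" idea can be separated from the browser code. The sketch below is one possible shape for such a loop, with a cap on the number of attempts; RetrySketch and retryUntil are hypothetical names, and the BooleanSupplier stands in for "slide once, then check for the success element".

```java
import java.util.function.BooleanSupplier;

public class RetrySketch {

    // Runs the attempt up to maxAttempts times.
    // Returns the 1-based number of the attempt that succeeded, or -1 if none did.
    static int retryUntil(int maxAttempts, BooleanSupplier attempt) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated attempt that only succeeds on its third call
        int succeededOn = retryUntil(10, () -> ++calls[0] == 3);
        // prints 3
        System.out.println(succeededOn);
    }
}
```

Capping the attempts, as the complete code does with its loop of 10 iterations, keeps the program from running forever when the match keeps failing.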

Complete code:


import cn.hutool.core.util.RandomUtil;
import cn.hutool.http.HttpUtil;
import lombok.extern.slf4j.Slf4j;
import org.bytedeco.javacpp.DoublePointer;
import org.bytedeco.opencv.global.opencv_core;
import org.bytedeco.opencv.global.opencv_imgcodecs;
import org.bytedeco.opencv.global.opencv_imgproc;
import org.bytedeco.opencv.opencv_core.Mat;
import org.bytedeco.opencv.opencv_core.Point;
import org.bytedeco.opencv.opencv_core.Scalar;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;

import java.io.*;
import java.util.concurrent.TimeUnit;

@Slf4j
public class SwipeCaptcha2 {

    private final static String webDriver = "webdriver.chrome.driver";
    private final static String webDriverPath = "E:\\chromedriver\\chromedriver.exe";

    public static void main(String[] args) throws IOException, InterruptedException {
        System.setProperty(webDriver, webDriverPath);
        WebDriver driver = new ChromeDriver();
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
        driver.get("https://dun.163.com/trial/jigsaw");
        // Maximize the browser window
        driver.manage().window().maximize();
        // Slider button
        WebElement sliderBtn = driver.findElement(By.className("yidun_slider"));
        // Click it to bring up the jigsaw puzzle
        sliderBtn.click();
        Thread.sleep(500);
        Actions actions = new Actions(driver);
        String bgImage = "E:\\screenshot\\bj.jpg";
        String sliderImage = "E:\\screenshot\\slider.png";
        // Try at most 10 times
        for (int i = 0; i < 10; i++) {
            WebElement bgImg = driver.findElement(By.className("yidun_bg-img"));
            WebElement sliderImg = driver.findElement(By.className("yidun_jigsaw"));
            String bgImgSrc = bgImg.getAttribute("src");
            String sliderImgSrc = sliderImg.getAttribute("src");
            Thread.sleep(500);
            // Download both images
            HttpUtil.downloadFile(bgImgSrc, bgImage);
            HttpUtil.downloadFile(sliderImgSrc, sliderImage);
            double slideDistance = getMoveDist(bgImage, sliderImage);
            // Correct for the offset error
            slideDistance = slideDistance + 10;
            // moveByOffset takes whole pixels
            int moveDist = (int) slideDistance;
            actions.clickAndHold(sliderBtn).perform();
            // Move a little at a time, in random increments
            for (int j = moveDist; j >= 1; ) {
                int move = 1;
                if (j > 1) {
                    move = RandomUtil.randomInt(1, j);
                }
                actions.moveByOffset(move, 0).perform();
                j = j - move;
            }
            Thread.sleep(200);
            actions.release().perform();
            Thread.sleep(1000);
            // Stop as soon as the success marker is present
            if (isExistElement(driver, By.xpath("//div[contains(@class,'yidun--success')]"))) {
                break;
            }
        }
    }

    /**
     * Computes how far the slider must move (after repeated testing, the
     * result is only accurate when the colors are fairly dark).
     */
    private static double getMoveDist(String bgImage, String sliderImage) {
        // 1. Read both downloaded images from disk as grayscale
        Mat sliderMat = opencv_imgcodecs.imread(sliderImage, opencv_imgcodecs.IMREAD_GRAYSCALE);
        Mat bgMat = opencv_imgcodecs.imread(bgImage, opencv_imgcodecs.IMREAD_GRAYSCALE);
        // 2. Binarize into black-and-white images
        opencv_imgproc.threshold(sliderMat, sliderMat, 127, 255, opencv_imgproc.THRESH_BINARY);
        opencv_imgproc.threshold(bgMat, bgMat, 127, 255, opencv_imgproc.THRESH_BINARY);
        // Save the black-and-white images (uncomment to inspect them)
        //opencv_imgcodecs.imwrite("E:\\screenshot\\slider_black.png", sliderMat);
        //opencv_imgcodecs.imwrite("E:\\screenshot\\bg_black.jpg", bgMat);
        Mat result = new Mat();
        // 3. Find the small image inside the large one.
        //    matchTemplate expects the larger image first and the template second.
        opencv_imgproc.matchTemplate(bgMat, sliderMat, result, opencv_imgproc.TM_CCORR_NORMED);
        opencv_core.normalize(result, result, 0, 1, opencv_core.NORM_MINMAX, -1, new Mat());
        DoublePointer pointer = new DoublePointer(new double[2]);
        Point maxLoc = new Point();
        // 4. Get the coordinates of the best match
        opencv_core.minMaxLoc(result, null, pointer, null, maxLoc, null);
        // 5. Mark the matched region on the background image
        opencv_imgproc.rectangle(bgMat, maxLoc,
                new Point(maxLoc.x() + sliderMat.cols(), maxLoc.y() + sliderMat.rows()),
                new Scalar(0, 255, 0, 1));
        System.out.println("Match position (x,y): " + maxLoc.x() + "," + maxLoc.y());
        return maxLoc.x();
    }

    /**
     * Checks whether an element exists on the page.
     */
    public static boolean isExistElement(WebDriver webDriver, By by) {
        try {
            webDriver.findElement(by);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}

  • This article may not be reposted without the author's permission.
