Connection Pools in Web Crawlers

Posted 2020-12-10 juddy

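As the earlier examples made clear, whether we send a POST request or a GET request, a new HttpClient is created every time, so clients are constantly being created and destroyed.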

A connection pool solves this problem.

Here is the code:

package cn.itcast.crawler.test;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

public class HttpClientPoolTest {
    public static void main(String[] args) {
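        // Create the pooling connection manager
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        // Maximum number of connections in the pool overall
        cm.setMaxTotal(100);
        // Maximum number of connections per route (per target host)
        cm.setDefaultMaxPerRoute(10);
        // Send requests through the pooled connection manager
        doGet(cm);
        doGet(cm);
        // Shut down the pool and close its connections once we are done
        cm.shutdown();
    }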

    private static void doGet(PoolingHttpClientConnectionManager cm) {
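        // A new HttpClient object is still built on each call, but it no longer opens
        // its own connections: it borrows them from the shared pool in cm
        CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(cm).build();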
        HttpGet httpGet=new HttpGet("http://www.itcast.cn");
        CloseableHttpResponse response=null;
        try {
            response=httpClient.execute(httpGet);
            if(response.getStatusLine().getStatusCode()==200){
                String content=EntityUtils.toString(response.getEntity(),"utf8");
                System.out.println(content.length());
            }

        } catch (IOException e) {
            e.printStackTrace();
        }finally {
            if(response!=null){
                try {
                    response.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
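                // Do not close httpClient: closing it would also shut down the shared
                // connection manager cm, and the next doGet(cm) call would fail
                //httpClient.close();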
            }

        }
    }
}
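To confirm that the pool is actually doing its job, set a breakpoint (for example, inside doGet) and inspect cm in the debugger.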
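To make the two limits configured in main (setMaxTotal and setDefaultMaxPerRoute) more concrete, here is a deliberately simplified, hypothetical model of how a pool could enforce them. It is not HttpClient's real implementation: the class ToyConnectionPool and its lease/release methods are invented for illustration, a "connection" is just a string, and where a real pool would block, time out, or close idle connections on other routes to make room, this sketch simply returns null.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model (NOT HttpClient's real implementation) of a pool that
// enforces a global limit (like setMaxTotal) and a per-route limit (like setDefaultMaxPerRoute).
public class ToyConnectionPool {
    private final int maxTotal;
    private final int maxPerRoute;
    private final Map<String, Deque<String>> idle = new HashMap<>();  // route -> idle connections
    private final Map<String, Integer> leased = new HashMap<>();      // route -> leased count
    private int totalOpen = 0;  // connections currently open (idle + leased)
    private int created = 0;    // connections ever opened

    public ToyConnectionPool(int maxTotal, int maxPerRoute) {
        this.maxTotal = maxTotal;
        this.maxPerRoute = maxPerRoute;
    }

    // Returns a connection for the route, or null when a limit is hit
    // (a real pool would wait or time out instead).
    public String lease(String route) {
        Deque<String> free = idle.computeIfAbsent(route, r -> new ArrayDeque<>());
        int leasedForRoute = leased.getOrDefault(route, 0);
        if (leasedForRoute >= maxPerRoute) {
            return null;                        // per-route limit reached
        }
        String conn = free.pollFirst();         // prefer reusing an idle connection
        if (conn == null) {
            if (totalOpen >= maxTotal) {
                return null;                    // global limit reached
            }
            created++;
            totalOpen++;
            conn = route + "#" + created;       // "open" a new connection
        }
        leased.put(route, leasedForRoute + 1);
        return conn;
    }

    // Hands a connection back so later requests to the same route can reuse it.
    public void release(String route, String conn) {
        leased.merge(route, -1, Integer::sum);
        idle.get(route).addFirst(conn);
    }

    public int created() {
        return created;
    }

    public static void main(String[] args) {
        ToyConnectionPool pool = new ToyConnectionPool(3, 2);
        String a1 = pool.lease("www.itcast.cn");
        String a2 = pool.lease("www.itcast.cn");
        String a3 = pool.lease("www.itcast.cn");  // null: per-route limit of 2
        pool.release("www.itcast.cn", a1);
        String a4 = pool.lease("www.itcast.cn");  // reuses a1, no new connection
        String b1 = pool.lease("example.com");    // third connection overall
        String b2 = pool.lease("example.com");    // null: maxTotal of 3
        System.out.println(a3 + " " + a4.equals(a1) + " " + b2 + " created=" + pool.created());
    }
}
```

Running main prints `null true null created=3`: the third lease for the same host is refused by the per-route limit, the released connection is handed out again instead of a new one being opened, and the last lease is refused because three connections already exist in total.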
