Java Web Crawlers: Filling In the Knowledge Gaps

Posted by 大忽悠爱忽悠


HttpClient Redirect Handling

Reference: [HttpClient 4.5 Chinese Tutorial] Part 8: Aborting Requests and Redirect Handling

First, how HttpClient differs from a browser

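When a request is sent from a browser, the browser takes care of redirects, caching, and similar concerns for you. That is why, after a form is submitted via POST in a browser, the data returned by the server arrives correctly no matter how the server redirects.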

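With HttpClient, however, the same request comes back as a 302 response, because by default HttpClient does not follow redirects for requests submitted via POST. So what can you do about it?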

Method 1: handle the redirect manually

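import org.apache.http.Header;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

CloseableHttpClient httpClient = HttpClients.createDefault();

// Placeholder URL: substitute the real host, port, and path
HttpPost httpPost = new HttpPost("http://ip:port/xxx");

CloseableHttpResponse response = httpClient.execute(httpPost);

int statusCode = response.getStatusLine().getStatusCode();
System.out.println("statusCode==" + statusCode); // status code, e.g. 302

// The redirect target is carried in the Location header
Header header = response.getFirstHeader("Location");
String location = header.getValue();
System.out.println(location);
response.close(); // release the connection before sending the next request

// Then send a new request to the redirect target
HttpGet httpGet = new HttpGet(location);
CloseableHttpResponse response2 = httpClient.execute(httpGet);
System.out.println("Response body: " + EntityUtils.toString(response2.getEntity(), "UTF-8"));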
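One detail the manual approach has to watch for: the Location header may hold a relative path rather than a full URL, in which case it cannot be passed to HttpGet as-is. The following is a minimal sketch, using only the JDK, of how the redirect target could be resolved against the original request URL. The class name, method names, and example.com URLs are made up for illustration and are not part of HttpClient.

```java
import java.net.URI;
import java.util.Optional;

/** Sketch: decide whether a response is a redirect and resolve its Location header. */
class RedirectLocationSketch {

    /** Status codes that carry a redirect target in the Location header. */
    static boolean isRedirect(int statusCode) {
        return statusCode == 301 || statusCode == 302 || statusCode == 303
                || statusCode == 307 || statusCode == 308;
    }

    /**
     * Returns the absolute redirect target, or empty if the response is not a
     * redirect or has no Location header. A relative Location is resolved
     * against the URL of the original request.
     */
    static Optional<URI> redirectTarget(String requestUrl, int statusCode, String location) {
        if (!isRedirect(statusCode) || location == null || location.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(URI.create(requestUrl).resolve(location));
    }

    public static void main(String[] args) {
        // Hypothetical URLs, for illustration only.
        System.out.println(redirectTarget("http://example.com/app/login", 302, "/app/home"));
        System.out.println(redirectTarget("http://example.com/app/login", 302, "welcome"));
        System.out.println(redirectTarget("http://example.com/app/login", 200, null));
    }
}
```

The resulting URI can then be turned into a string and used for the follow-up GET request. The empty result also covers the case where the server did not redirect at all.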

Method 2: use the built-in redirect strategy class

// LaxRedirectStrategy (package org.apache.http.impl.client) also follows redirects for POST requests
CloseableHttpClient client = HttpClients.custom()
        .disableAutomaticRetries() // disables automatic request retries; it does not affect redirect handling
        .setRedirectStrategy(new LaxRedirectStrategy())
        .build();

// Placeholder URL: substitute the real host, port, and path
HttpPost httpPost = new HttpPost("http://ip:port/xxx");

CloseableHttpResponse response = client.execute(httpPost);

int statusCode = response.getStatusLine().getStatusCode();
System.out.println("statusCode==" + statusCode); // status code of the final response, after redirects

System.out.println("Response body: " + EntityUtils.toString(response.getEntity(), "UTF-8"));
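When a client follows a redirect automatically, the follow-up request is not necessarily another POST. The sketch below models the convention browsers use for choosing the method of the follow-up request, based on general HTTP semantics. It is an assumption for illustration, not a copy of LaxRedirectStrategy, whose behavior may differ in details, and the class and method names are made up.

```java
import java.util.Locale;

/** Sketch of which HTTP method a client typically uses when following a redirect. */
class RedirectMethodSketch {

    static String methodAfterRedirect(String method, int statusCode) {
        String m = method.toUpperCase(Locale.ROOT);
        switch (statusCode) {
            case 301:
            case 302:
                // Browsers historically rewrite POST to GET and drop the body.
                return m.equals("POST") ? "GET" : m;
            case 303:
                // "See Other": fetch the target with GET (HEAD stays HEAD).
                return (m.equals("GET") || m.equals("HEAD")) ? m : "GET";
            case 307:
            case 308:
                // Method and body are preserved.
                return m;
            default:
                throw new IllegalArgumentException("Not a redirect status: " + statusCode);
        }
    }

    public static void main(String[] args) {
        System.out.println(methodAfterRedirect("POST", 302));
        System.out.println(methodAfterRedirect("POST", 307));
    }
}
```

Under this convention, a POST answered with 302 ends in a GET to the redirect target, which is why the body printed at the end is usually the page the server redirected to rather than the direct result of the POST.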

Two Ways to Get Cookies with HttpClient

1. Getting cookies with older versions of HttpClient

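P.S. This approach is no longer officially recommended (DefaultHttpClient has been deprecated since HttpClient 4.3).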

Instantiate the httpClient object with the DefaultHttpClient class:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.CookieStore;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;

public static String doPost_deprecated(String url, Map<String, String> map, String charset) {
    String result = null;
    try {
        DefaultHttpClient httpClient = new DefaultHttpClient();
        HttpPost httpPost = new HttpPost(url);
        // Build the form parameters
        List<NameValuePair> list = new ArrayList<NameValuePair>();
        for (Map.Entry<String, String> elem : map.entrySet()) {
            list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
        }
        if (!list.isEmpty()) {
            httpPost.setEntity(new UrlEncodedFormEntity(list, charset));
        }
        HttpResponse response = httpClient.execute(httpPost);
        System.out.println(response.getStatusLine().getStatusCode());
        String JSESSIONID = null;
        String cookie_user = null;
        // Read the cookies the client stored from the response
        CookieStore cookieStore = httpClient.getCookieStore();
        for (Cookie cookie : cookieStore.getCookies()) {
            // Print every cookie and its attributes
            System.out.println(cookie);
            System.out.println("cookiename==" + cookie.getName());
            System.out.println("cookieValue==" + cookie.getValue());
            System.out.println("Domain==" + cookie.getDomain());
            System.out.println("Path==" + cookie.getPath());
            System.out.println("Version==" + cookie.getVersion());

            if ("JSESSIONID".equals(cookie.getName())) {
                JSESSIONID = cookie.getValue();
            }
            if ("cookie_user".equals(cookie.getName())) {
                cookie_user = cookie.getValue();
            }
        }
        // Return the session ID only if the cookie_user cookie was also set
        if (cookie_user != null) {
            result = JSESSIONID;
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return result;
}

2. Getting cookies with newer versions of HttpClient

Instantiate the httpClient object with the CloseableHttpClient class:

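import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.http.NameValuePair;
import org.apache.http.client.CookieStore;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public static String doPost(Map<String, String> map, String charset) {
    String result = null;
    // The client records every cookie it receives in this store
    CookieStore cookieStore = new BasicCookieStore();
    try (CloseableHttpClient httpClient =
                 HttpClients.custom().setDefaultCookieStore(cookieStore).build()) {
        HttpPost httpPost = new HttpPost("http://localhost:8080/testtoolmanagement/LoginServlet");
        // Build the form parameters
        List<NameValuePair> list = new ArrayList<NameValuePair>();
        for (Map.Entry<String, String> elem : map.entrySet()) {
            list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
        }
        if (!list.isEmpty()) {
            httpPost.setEntity(new UrlEncodedFormEntity(list, charset));
        }
        // Only the cookies are needed, so discard the body and release the connection
        try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
            EntityUtils.consume(response.getEntity());
        }
        String JSESSIONID = null;
        String cookie_user = null;
        for (Cookie cookie : cookieStore.getCookies()) {
            if ("JSESSIONID".equals(cookie.getName())) {
                JSESSIONID = cookie.getValue();
            }
            if ("cookie_user".equals(cookie.getName())) {
                cookie_user = cookie.getValue();
            }
        }
        // Return the session ID only if the cookie_user cookie was also set
        if (cookie_user != null) {
            result = JSESSIONID;
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return result;
}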
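To make it clearer what a cookie store holds, here is a minimal sketch that uses only the JDK's java.net.HttpCookie to parse raw Set-Cookie header values and look a cookie up by name. It illustrates the idea and does not show how HttpClient implements its store. The class name, method name, and header values are made up for the example.

```java
import java.net.HttpCookie;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Sketch: find a cookie by name among raw Set-Cookie header values. */
class SetCookieSketch {

    /** Returns the value of the named cookie, or empty if no header sets it. */
    static Optional<String> cookieValue(List<String> setCookieHeaders, String name) {
        for (String header : setCookieHeaders) {
            // HttpCookie.parse turns one header value into cookie objects
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                if (name.equals(cookie.getName())) {
                    return Optional.of(cookie.getValue());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical header values, as a login servlet might send them
        List<String> headers = Arrays.asList(
                "JSESSIONID=ABC123; Path=/testtoolmanagement; HttpOnly",
                "cookie_user=alice; Path=/");
        System.out.println(cookieValue(headers, "JSESSIONID"));
        System.out.println(cookieValue(headers, "cookie_user"));
        System.out.println(cookieValue(headers, "missing"));
    }
}
```

Returning an Optional avoids the null checks that the loops above rely on when a cookie is absent.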
