Java Crawler Knowledge Gaps: Notes
Posted by 大忽悠爱忽悠
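HttpClient Redirect Handling
Reference: [HttpClient 4.5 Chinese Tutorial] 8. Aborting Requests and Redirect Handling
First, the difference between HttpClient and a browser
When you send a request from a browser, the browser takes care of redirects, caching and similar concerns for you. That is why, after you submit a form via POST in a browser, you still receive the server's response normally no matter how the server redirects.
With HttpClient it is different: after a POST request you may find that all you get back is a 302. By default HttpClient (through its DefaultRedirectStrategy) follows redirects automatically only for GET and HEAD requests, so the 302 returned for a POST is handed back to your code. What can you do about it?
Method 1: handle the redirect manually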
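To make the rule above concrete, here is a simplified model of the decision, written from our reading of DefaultRedirectStrategy and LaxRedirectStrategy in HttpClient 4.5. It is not HttpClient code: the class RedirectDecision and its shouldFollow method are hypothetical, the model assumes the response carries a Location header, and it ignores details such as 308 handling that vary between releases.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of HttpClient 4.5's redirect decision (not HttpClient code).
// Assumes the response carries a Location header; 308 and other edge cases are ignored.
final class RedirectDecision {

    // DefaultRedirectStrategy treats only GET and HEAD as redirectable
    private static final List<String> DEFAULT_METHODS = Arrays.asList("GET", "HEAD");
    // LaxRedirectStrategy additionally accepts POST
    private static final List<String> LAX_METHODS = Arrays.asList("GET", "HEAD", "POST");

    static boolean shouldFollow(int status, String method, boolean lax) {
        List<String> allowed = lax ? LAX_METHODS : DEFAULT_METHODS;
        switch (status) {
            case 301: // Moved Permanently
            case 302: // Found (the code a login POST typically gets)
            case 307: // Temporary Redirect
                return allowed.contains(method.toUpperCase());
            case 303: // See Other: followed for any method, as a GET
                return true;
            default:  // not a redirect status in this model
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldFollow(302, "POST", false)); // false: the 302 is returned to your code
        System.out.println(shouldFollow(302, "POST", true));  // true: LaxRedirectStrategy follows it
        System.out.println(shouldFollow(302, "GET", false));  // true
    }
}
```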
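import java.net.URI;
import org.apache.http.Header;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

CloseableHttpClient httpClient = HttpClients.createDefault();
HttpPost httpPost = new HttpPost("http://ip:port/xxx");
CloseableHttpResponse response = httpClient.execute(httpPost);
int statusCode = response.getStatusLine().getStatusCode();
System.out.println("statusCode==" + statusCode); // status code, e.g. 302
// The redirect target is sent in the Location header
Header header = response.getFirstHeader("Location");
response.close();
if (header != null) {
    // Location may be relative, so resolve it against the original request URI
    URI location = httpPost.getURI().resolve(header.getValue());
    System.out.println(location);
    // Then simply issue a new request to the new location
    CloseableHttpResponse response2 = httpClient.execute(new HttpGet(location));
    System.out.println("Response body: " + EntityUtils.toString(response2.getEntity(), "UTF-8"));
    response2.close();
}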
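Method 1 follows only one hop, but the new location can itself redirect. Below is a minimal sketch of a manual redirect loop, assuming a hypothetical Fetcher interface that would wrap httpClient.execute(new HttpGet(uri)) and report the status code and Location header. ManualRedirects, Resp, Fetcher and maxHops are illustrative names, not HttpClient API; capping the number of hops guards against redirect loops.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Sketch of following a redirect chain by hand. Fetcher and Resp are hypothetical
// stand-ins: in practice fetch(uri) would run httpClient.execute(new HttpGet(uri)),
// read the status code and the Location header, and close the response.
final class ManualRedirects {

    static final class Resp {
        final int status;
        final String location; // value of the Location header, or null
        Resp(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    interface Fetcher {
        Resp fetch(URI uri);
    }

    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303 || status == 307 || status == 308;
    }

    // Follows at most maxHops redirects and returns the URI of the first non-redirect response.
    static URI follow(URI start, Fetcher fetcher, int maxHops) {
        URI current = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            Resp resp = fetcher.fetch(current);
            if (!isRedirect(resp.status) || resp.location == null) {
                return current;
            }
            // The Location header may be relative, so resolve it against the current URI
            current = current.resolve(resp.location);
        }
        throw new IllegalStateException("More than " + maxHops + " redirects starting at " + start);
    }

    public static void main(String[] args) {
        // Fake "server": a login page that redirects to /home with a relative Location
        Map<String, Resp> fake = new HashMap<>();
        fake.put("http://example.com/login", new Resp(302, "/home"));
        fake.put("http://example.com/home", new Resp(200, null));
        URI end = follow(URI.create("http://example.com/login"),
                uri -> fake.getOrDefault(uri.toString(), new Resp(404, null)), 5);
        System.out.println(end); // http://example.com/home
    }
}
```

The map only stands in for the server; a real fetcher built on HttpClient would also have to close every intermediate response.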
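Method 2: use a ready-made redirect strategy (LaxRedirectStrategy)
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.client.LaxRedirectStrategy;
import org.apache.http.util.EntityUtils;

HttpClientBuilder builder = HttpClients.custom()
        .disableAutomaticRetries() // disables automatic retries of failed requests (unrelated to redirects)
        .setRedirectStrategy(new LaxRedirectStrategy()); // LaxRedirectStrategy also follows redirects for POST
CloseableHttpClient client = builder.build();
HttpPost httpPost = new HttpPost("http://ip:port/xxx");
CloseableHttpResponse response = client.execute(httpPost);
int statusCode = response.getStatusLine().getStatusCode();
System.out.println("statusCode==" + statusCode); // status of the final response, after the redirect was followed
System.out.println("Response body: " + EntityUtils.toString(response.getEntity(), "UTF-8"));
response.close();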
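One consequence of letting LaxRedirectStrategy follow a POST redirect is that the follow-up request is usually no longer a POST. The sketch below models that rule as we read DefaultRedirectStrategy.getRedirect in HttpClient 4.5 (LaxRedirectStrategy inherits it). RedirectMethod is a hypothetical class, not HttpClient API, and 308 is left out because its handling differs across releases.

```java
// Hypothetical, simplified model of which method HttpClient 4.5 uses for the request
// that follows a redirect (our reading of DefaultRedirectStrategy.getRedirect; 308 ignored).
final class RedirectMethod {

    static String followUpMethod(String originalMethod, int status) {
        String method = originalMethod.toUpperCase();
        if (method.equals("GET") || method.equals("HEAD")) {
            return method; // GET and HEAD are simply repeated at the new URI
        }
        // 307 copies the original request (method and body) to the new URI;
        // 301, 302 and 303 are followed with a plain GET, so the POST body is not re-sent
        return status == 307 ? method : "GET";
    }

    public static void main(String[] args) {
        System.out.println(followUpMethod("POST", 302)); // GET
        System.out.println(followUpMethod("POST", 307)); // POST
        System.out.println(followUpMethod("HEAD", 301)); // HEAD
    }
}
```

So if the redirect target expects the form data again, a 302 followed as a GET will not carry it; in this model only a 307 re-sends the original method and body.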
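Two ways to get cookies with HttpClient
1. Getting cookies with older versions of HttpClient
P.S. This approach is no longer officially recommended: DefaultHttpClient has been deprecated since HttpClient 4.3.
Create the httpClient object with the DefaultHttpClient class; its getCookieStore() method returns the cookies received so far: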
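import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.CookieStore;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public static String doPost_deprecated(String url, Map<String, String> map, String charset) {
    DefaultHttpClient httpClient = null;
    HttpPost httpPost = null;
    String result = null;
    try {
        httpClient = new DefaultHttpClient();
        httpPost = new HttpPost(url);
        // Set the form parameters
        List<NameValuePair> list = new ArrayList<NameValuePair>();
        Iterator<Entry<String, String>> iterator = map.entrySet().iterator();
        while (iterator.hasNext()) {
            Entry<String, String> elem = iterator.next();
            list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
        }
        if (list.size() > 0) {
            UrlEncodedFormEntity entity = new UrlEncodedFormEntity(list, charset);
            httpPost.setEntity(entity);
        }
        HttpResponse response = httpClient.execute(httpPost);
        System.out.println(response.getStatusLine().getStatusCode());
        // Consume the body so the connection can be released
        EntityUtils.consume(response.getEntity());
        String JSESSIONID = null;
        String cookie_user = null;
        // Get the cookies collected by the client
        CookieStore cookieStore = httpClient.getCookieStore();
        List<Cookie> cookies = cookieStore.getCookies();
        for (int i = 0; i < cookies.size(); i++) {
            // Print each cookie's attributes
            System.out.println(cookies.get(i));
            System.out.println("cookiename==" + cookies.get(i).getName());
            System.out.println("cookieValue==" + cookies.get(i).getValue());
            System.out.println("Domain==" + cookies.get(i).getDomain());
            System.out.println("Path==" + cookies.get(i).getPath());
            System.out.println("Version==" + cookies.get(i).getVersion());
            if (cookies.get(i).getName().equals("JSESSIONID")) {
                JSESSIONID = cookies.get(i).getValue();
            }
            if (cookies.get(i).getName().equals("cookie_user")) {
                cookie_user = cookies.get(i).getValue();
            }
        }
        // Return the session ID only if the login cookie cookie_user was set
        if (cookie_user != null) {
            result = JSESSIONID;
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    } finally {
        if (httpClient != null) {
            // Release the connections held by the client
            httpClient.getConnectionManager().shutdown();
        }
    }
    return result;
}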
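The same cookie logic can be tried without a server. This sketch swaps HttpClient's CookieStore for the JDK's java.net.HttpCookie: it parses raw Set-Cookie header values (the values in main are made-up examples) and applies the rule above, returning JSESSIONID only when cookie_user is present. SessionCookies and its methods are hypothetical names.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// JDK-only sketch (java.net.HttpCookie instead of HttpClient's CookieStore) of the
// cookie logic above. The Set-Cookie values in main are made-up examples.
final class SessionCookies {

    // Parses raw Set-Cookie header values into cookies
    static List<HttpCookie> parseAll(List<String> setCookieHeaders) {
        List<HttpCookie> cookies = new ArrayList<>();
        for (String header : setCookieHeaders) {
            cookies.addAll(HttpCookie.parse(header));
        }
        return cookies;
    }

    // Returns the JSESSIONID value only if a cookie_user cookie is present, else null
    static String sessionIdIfLoggedIn(List<HttpCookie> cookies) {
        String jsessionId = null;
        String cookieUser = null;
        for (HttpCookie cookie : cookies) {
            if (cookie.getName().equals("JSESSIONID")) {
                jsessionId = cookie.getValue();
            }
            if (cookie.getName().equals("cookie_user")) {
                cookieUser = cookie.getValue();
            }
        }
        return cookieUser != null ? jsessionId : null;
    }

    public static void main(String[] args) {
        List<HttpCookie> cookies = parseAll(Arrays.asList("JSESSIONID=ABC123; Path=/", "cookie_user=tom"));
        for (HttpCookie cookie : cookies) {
            System.out.println(cookie.getName() + "=" + cookie.getValue() + ", path=" + cookie.getPath());
        }
        System.out.println(sessionIdIfLoggedIn(cookies)); // ABC123
    }
}
```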
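2. Getting cookies with newer versions of HttpClient
Create a CloseableHttpClient via HttpClients.custom() with a BasicCookieStore attached; the cookies the server sets are collected in that store: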
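import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.http.NameValuePair;
import org.apache.http.client.CookieStore;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public static String doPost(Map<String, String> map, String charset) {
    String result = null;
    // The cookie store collects every cookie the server sets
    CookieStore cookieStore = new BasicCookieStore();
    try (CloseableHttpClient httpClient = HttpClients.custom().setDefaultCookieStore(cookieStore).build()) {
        HttpPost httpPost = new HttpPost("http://localhost:8080/testtoolmanagement/LoginServlet");
        // Build the form parameters from the map
        List<NameValuePair> list = new ArrayList<NameValuePair>();
        for (Map.Entry<String, String> elem : map.entrySet()) {
            list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
        }
        if (!list.isEmpty()) {
            httpPost.setEntity(new UrlEncodedFormEntity(list, charset));
        }
        try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
            EntityUtils.consume(response.getEntity());
        }
        String JSESSIONID = null;
        String cookie_user = null;
        for (Cookie cookie : cookieStore.getCookies()) {
            if (cookie.getName().equals("JSESSIONID")) {
                JSESSIONID = cookie.getValue();
            }
            if (cookie.getName().equals("cookie_user")) {
                cookie_user = cookie.getValue();
            }
        }
        // Return the session ID only if the login cookie cookie_user was set
        if (cookie_user != null) {
            result = JSESSIONID;
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return result;
}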
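Both versions send the parameters as an application/x-www-form-urlencoded body built by UrlEncodedFormEntity. To see roughly what that body looks like, here is a JDK-only sketch using URLEncoder. FormBody is a hypothetical class, and we assume its output matches UrlEncodedFormEntity for ordinary values (letters, digits, spaces, '&', non-ASCII text). The charset argument decides which bytes non-ASCII values are percent-encoded from, which is why it is passed explicitly.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// JDK-only approximation of an application/x-www-form-urlencoded body such as the one
// UrlEncodedFormEntity builds from the parameter list (an assumption for ordinary values).
final class FormBody {

    static String encode(Map<String, String> params, String charset) throws UnsupportedEncodingException {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            // Form encoding: spaces become '+', other reserved or non-ASCII bytes become %XX
            body.add(URLEncoder.encode(entry.getKey(), charset) + "="
                    + URLEncoder.encode(entry.getValue(), charset));
        }
        return body.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("username", "\u5f20\u4e09"); // the Chinese name "Zhang San"
        params.put("password", "a b&c");
        System.out.println(encode(params, "UTF-8"));
        // username=%E5%BC%A0%E4%B8%89&password=a+b%26c
    }
}
```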