Downloading an HTML page in Java: saving web page content as a local HTML file
Posted by 张小凡vip
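In an earlier post on fetching web page content with HttpClient, we usually took the page source (the content string) and stored it in a database.
But what if, besides getting the page source, we also want to save the page locally as an HTML file?
It is quite simple. First, let's look at the code that requests the page and returns its content: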
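// Imports for this method (Apache HttpClient 4.x):
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.entity.DeflateDecompressingEntity;
import org.apache.http.client.entity.GzipDecompressingEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.CoreConnectionPNames;

private static String getUrlContent(DefaultHttpClient httpPostClient,
        String urlString) throws IOException, ClientProtocolException {
    // Set the timeouts before executing the request so that they apply to it
    httpPostClient.getParams().setParameter(
            CoreConnectionPNames.CONNECTION_TIMEOUT, 10000); // connection timeout: 10 s
    httpPostClient.getParams().setParameter(
            CoreConnectionPNames.SO_TIMEOUT, 8000); // socket read timeout: 8 s
    HttpGet httpGet = new HttpGet(urlString); // HttpGet implements HttpUriRequest
    HttpResponse httpGetResponse = httpPostClient.execute(httpGet);
    if (httpGetResponse.getStatusLine().getStatusCode() == 200) {
        HttpEntity httpEntity = httpGetResponse.getEntity();
        // Unwrap a compressed body according to the Content-Encoding header
        if (httpEntity.getContentEncoding() != null) {
            if ("gzip".equalsIgnoreCase(httpEntity.getContentEncoding()
                    .getValue())) {
                httpEntity = new GzipDecompressingEntity(httpEntity);
            } else if ("deflate".equalsIgnoreCase(httpEntity
                    .getContentEncoding().getValue())) {
                httpEntity = new DeflateDecompressingEntity(httpEntity);
            }
        }
        // enCodetoString and encode (the default charset) come from the original
        // class and are not shown here; see enCodetoStringDo for the decoding logic.
        String result = enCodetoString(httpEntity, encode); // read the response body as a string
        // System.out.println(result);
        return result;
    }
    return "";
}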
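The Content-Encoding check above can also be illustrated without HttpClient. The following is a minimal sketch using only the JDK; the class name ContentDecoding and its helper methods are made up for this illustration, and it assumes that a "deflate" body is zlib-wrapped, which is the common case but not guaranteed for every server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class ContentDecoding {

    /** Wraps the raw body stream according to the Content-Encoding value (may be null). */
    public static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(raw);
        }
        if ("deflate".equalsIgnoreCase(contentEncoding)) {
            return new InflaterInputStream(raw); // assumes zlib-wrapped deflate data
        }
        return raw; // no encoding, or one this sketch does not handle
    }

    /** Reads a stream to the end and returns its bytes. */
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[1024];
        int n;
        while ((n = in.read(tmp)) != -1) {
            out.write(tmp, 0, n);
        }
        return out.toByteArray();
    }

    /** Helper used only to produce sample gzip data for the demo. */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = gzip("<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        InputStream decoded = decode(new ByteArrayInputStream(body), "gzip");
        System.out.println(new String(readFully(decoded), StandardCharsets.UTF_8));
    }
}
```

The header value is compared case-insensitively, as in the method above, because servers do not always send it in lower case.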
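// Additional imports for this method:
// java.io.InputStream, java.io.InputStreamReader, java.io.Reader,
// java.io.UnsupportedEncodingException, java.nio.charset.Charset,
// java.nio.charset.UnsupportedCharsetException, org.apache.http.ParseException,
// org.apache.http.protocol.HTTP, org.apache.http.util.CharArrayBuffer

public static String enCodetoStringDo(final HttpEntity entity,
        Charset defaultCharset) throws IOException, ParseException {
    if (entity == null) {
        throw new IllegalArgumentException("HTTP entity may not be null");
    }
    InputStream instream = entity.getContent();
    if (instream == null) {
        return null;
    }
    try {
        if (entity.getContentLength() > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                    "HTTP entity too large to be buffered in memory");
        }
        int i = (int) entity.getContentLength();
        if (i < 0) {
            i = 4096; // length unknown: start with a 4 KB buffer
        }
        Charset charset = null;
        try {
            // Charset detection from the Content-Type header is commented out,
            // so the caller-supplied default charset is always used.
            // ContentType contentType = ContentType.get(entity);
            // if (contentType != null) {
            //     charset = contentType.getCharset();
            // }
        } catch (final UnsupportedCharsetException ex) {
            throw new UnsupportedEncodingException(ex.getMessage());
        }
        if (charset == null) {
            charset = defaultCharset;
        }
        if (charset == null) {
            charset = HTTP.DEF_CONTENT_CHARSET;
        }
        // Decode the byte stream into characters, 1024 chars at a time
        Reader reader = new InputStreamReader(instream, charset);
        CharArrayBuffer buffer = new CharArrayBuffer(i);
        char[] tmp = new char[1024];
        int l;
        while ((l = reader.read(tmp)) != -1) {
            buffer.append(tmp, 0, l);
        }
        return buffer.toString();
    } finally {
        instream.close();
    }
}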
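The method above falls back to a default charset because its Content-Type lookup is commented out. As an illustration of what that lookup does, here is a minimal JDK-only sketch that pulls the charset parameter out of a Content-Type header value. The class name CharsetSniffer and the method fromContentType are hypothetical and not part of HttpClient; the parsing is deliberately simple and does not cover every form the header may legally take.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetSniffer {

    /**
     * Returns the charset named in a Content-Type value such as
     * "text/html; charset=UTF-8", or the fallback if none is usable.
     */
    public static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String key = "charset=";
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" parameter
            if (p.regionMatches(true, 0, key, 0, key.length())) {
                String name = p.substring(key.length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback; // malformed or unknown charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset cs = fromContentType("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println(cs.name());
    }
}
```

If the header carries no charset, a page may still declare one in a meta tag inside the HTML itself, which this sketch does not inspect.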
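Once we have the content, we can simply write it to a local file.
The code from the earlier article on reading and writing txt files in Java (java读写txt) can be reused as is; just change the .txt extension to .html: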
// Requires: java.io.* and java.util.Calendar.
// DATE_FORMAT is a date formatter defined elsewhere in the original class (not shown).
public static void writeToFile(String fileName, String content) {
    String time = DATE_FORMAT.format(Calendar.getInstance().getTime());
    File dirFile = new File("e:\\" + time); // one folder per formatted date on drive E:
    try {
        if (!dirFile.exists() && !dirFile.isDirectory()) {
            boolean created = dirFile.mkdirs();
            if (created) {
                System.out.println(" ok: folder created ");
            } else {
                System.out.println(" err: failed to create folder ");
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    // Change ".txt" to ".html" here to save the page as a local HTML file
    String fullPath = dirFile + "/" + fileName + ".txt";
    write(fullPath, content);
}

/**
 * Writes content to a file, keeping whatever the file already contains.
 *
 * @param path    full path of the target file
 * @param content text to add after the existing content
 * @return true on success, false if an exception occurred
 */
public static boolean write(String path, String content) {
    String s;
    StringBuilder s1 = new StringBuilder();
    BufferedWriter output = null;
    try {
        File f = new File(path);
        if (!f.exists()) {
            System.out.println("File does not exist, creating it...");
            if (f.createNewFile()) {
                System.out.println("File created.");
            } else {
                System.out.println("Failed to create file.");
            }
        }
        // Read the existing content first (FileReader uses the platform default charset)
        BufferedReader input = new BufferedReader(new FileReader(f));
        while ((s = input.readLine()) != null) {
            s1.append(s).append("\n");
        }
        System.out.println("Existing file content: " + s1);
        input.close();
        s1.append(content);
        // Rewrite the file with the old content followed by the new content
        output = new BufferedWriter(new FileWriter(f));
        output.write(s1.toString());
        output.flush();
        return true;
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    } finally {
        if (output != null) {
            try {
                output.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
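For comparison, the same save step can be written more compactly with java.nio.file. This is a sketch under a few assumptions rather than a drop-in replacement: the class name HtmlSaver and the method save are invented for the example, the file is always written as UTF-8, and an existing file is overwritten instead of having the new content added after the old.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HtmlSaver {

    /** Saves content as dir/fileName.html and returns the path of the written file. */
    public static Path save(Path dir, String fileName, String content) throws IOException {
        Files.createDirectories(dir); // no-op if the directory already exists
        Path target = dir.resolve(fileName + ".html");
        // Default options create the file or truncate an existing one
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the dated folder used in the article
        Path dir = Files.createTempDirectory("pages");
        Path saved = save(dir, "index", "<html><body>hello</body></html>");
        System.out.println("saved to " + saved);
    }
}
```

Writing with an explicit charset avoids depending on the platform default, but the saved page only displays correctly if that charset matches the one the HTML declares for itself.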