Downloading an HTML page in Java: saving web page content as a local HTML file

Posted 2022-09-08 张小凡vip



In the earlier posts on fetching web pages with httpclient, we usually took the page source (content) and stored it in a database.

See these earlier posts for details:

HttpGet and HttpPost in the HTTPClient module

Common basic fetching classes for httpclient

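But what if, besides getting the page source, we also want to save the page locally as an HTML file?

It is actually simple. First, here is the code that requests the page and returns its content: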

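	import java.io.IOException;
	import org.apache.http.HttpEntity;
	import org.apache.http.HttpResponse;
	import org.apache.http.client.ClientProtocolException;
	import org.apache.http.client.entity.DeflateDecompressingEntity;
	import org.apache.http.client.entity.GzipDecompressingEntity;
	import org.apache.http.client.methods.HttpGet;
	import org.apache.http.impl.client.DefaultHttpClient;
	import org.apache.http.params.CoreConnectionPNames;

	private static String getUrlContent(DefaultHttpClient httpPostClient,
			String urlString) throws IOException, ClientProtocolException {
		// Set the timeouts before executing the request so they apply to it.
		httpPostClient.getParams().setParameter(
				CoreConnectionPNames.CONNECTION_TIMEOUT, 10000);// connect timeout: 10 s
		httpPostClient.getParams().setParameter(
				CoreConnectionPNames.SO_TIMEOUT, 8000);// socket read timeout: 8 s
		HttpGet httpGet = new HttpGet(urlString);// HttpGet implements HttpUriRequest
		HttpResponse httpGetResponse = httpPostClient.execute(httpGet);
		if (httpGetResponse.getStatusLine().getStatusCode() == 200) {
			HttpEntity httpEntity = httpGetResponse.getEntity();
			// Unwrap compressed responses according to the Content-Encoding header.
			if (httpEntity.getContentEncoding() != null) {
				if ("gzip".equalsIgnoreCase(httpEntity.getContentEncoding()
						.getValue())) {
					httpEntity = new GzipDecompressingEntity(httpEntity);
				} else if ("deflate".equalsIgnoreCase(httpEntity
						.getContentEncoding().getValue())) {
					httpEntity = new DeflateDecompressingEntity(httpEntity);
				}
			}
			// enCodetoString and encode come from the rest of the original class (not shown here).
			String result = enCodetoString(httpEntity, encode);// read the response body as a string
			// System.out.println(result);
			return result;
		}
		return "";
	}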

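	import java.io.IOException;
	import java.io.InputStream;
	import java.io.InputStreamReader;
	import java.io.Reader;
	import java.io.UnsupportedEncodingException;
	import java.nio.charset.Charset;
	import java.nio.charset.UnsupportedCharsetException;
	import org.apache.http.HttpEntity;
	import org.apache.http.ParseException;
	import org.apache.http.protocol.HTTP;
	import org.apache.http.util.CharArrayBuffer;

	public static String enCodetoStringDo(final HttpEntity entity,
			Charset defaultCharset) throws IOException, ParseException {
		if (entity == null) {
			throw new IllegalArgumentException("HTTP entity may not be null");
		}
		InputStream instream = entity.getContent();
		if (instream == null) {
			return null;
		}
		try {
			if (entity.getContentLength() > Integer.MAX_VALUE) {
				throw new IllegalArgumentException(
						"HTTP entity too large to be buffered in memory");
			}
			// Initial buffer size: the declared length, or 4096 if it is unknown.
			int i = (int) entity.getContentLength();
			if (i < 0) {
				i = 4096;
			}
			Charset charset = null;
			try {
				// Charset detection from the Content-Type header is disabled here,
				// so the caller-supplied default below is always used.
				// ContentType contentType = ContentType.get(entity);
				// if (contentType != null) {
				// charset = contentType.getCharset();
				// }
			} catch (final UnsupportedCharsetException ex) {
				throw new UnsupportedEncodingException(ex.getMessage());
			}
			if (charset == null) {
				charset = defaultCharset;
			}
			if (charset == null) {
				charset = HTTP.DEF_CONTENT_CHARSET;
			}
			// Decode the byte stream into characters, 1024 chars at a time.
			Reader reader = new InputStreamReader(instream, charset);
			CharArrayBuffer buffer = new CharArrayBuffer(i);
			char[] tmp = new char[1024];
			int l;
			while ((l = reader.read(tmp)) != -1) {
				buffer.append(tmp, 0, l);
			}
			return buffer.toString();
		} finally {
			instream.close();
		}
	}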
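
To make the two steps above easier to see in isolation (undo the Content-Encoding, then decode the bytes with a charset), here is a minimal sketch that uses only the JDK instead of httpclient. The class name BodyDecoder and its methods decode and gzip are hypothetical names made up for this illustration; they are not part of httpclient or of the code above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper for illustration only; not part of httpclient.
public class BodyDecoder {

    /** Undoes the Content-Encoding (if any), then decodes the bytes with the given charset. */
    public static String decode(byte[] body, String contentEncoding, Charset charset)
            throws IOException {
        InputStream in = new ByteArrayInputStream(body);
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            in = new GZIPInputStream(in);
        } else if ("deflate".equalsIgnoreCase(contentEncoding)) {
            in = new InflaterInputStream(in); // expects zlib-wrapped deflate data
        }
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] tmp = new char[1024];
            int n;
            while ((n = reader.read(tmp)) != -1) {
                sb.append(tmp, 0, n);
            }
        }
        return sb.toString();
    }

    /** Demo helper: gzip-compresses a byte array, standing in for a compressed response body. */
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>\u4f60\u597d</body></html>";
        byte[] wire = gzip(html.getBytes(StandardCharsets.UTF_8));
        // Round trip: compressed UTF-8 bytes back to the original string.
        System.out.println(html.equals(decode(wire, "gzip", StandardCharsets.UTF_8)));
    }
}
```

If the charset passed in does not match the one the page was actually encoded with, non-ASCII text such as Chinese comes out garbled, which is why the default charset argument matters in the method above.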

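Once we have the content, we can simply save it as a local file.

For the file-writing code, we can refer to the earlier post

Reading and writing txt files in Java

and just change the .txt suffix to .html.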

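	import java.io.BufferedReader;
	import java.io.BufferedWriter;
	import java.io.File;
	import java.io.FileReader;
	import java.io.FileWriter;
	import java.io.IOException;
	import java.util.Calendar;

	// DATE_FORMAT is a date formatter defined elsewhere in the original class (not shown here).
	public static void writeToFile(String fileName, String content) {
		String time = DATE_FORMAT.format(Calendar.getInstance().getTime());
		File dirFile = null;
		try {
			// One output directory per formatted date, under the e: drive.
			dirFile = new File("e:\\" + time);
			if (!(dirFile.exists()) && !(dirFile.isDirectory())) {
				boolean creadok = dirFile.mkdirs();
				if (creadok) {
					System.out.println(" ok: directory created ");
				} else {
					System.out.println(" err: failed to create directory ");
				}
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
		// Suffix changed from .txt to .html so the page is saved as HTML.
		String fullPath = dirFile + "/" + fileName + ".html";
		write(fullPath, content);
	}

	/**
	 * Appends content to the file at path, creating the file if needed.
	 *
	 * @param path    full path of the file
	 * @param content text to append
	 * @return true on success, false if an exception occurred
	 */
	public static boolean write(String path, String content) {
		String s;
		String s1 = "";
		BufferedWriter output = null;
		try {
			File f = new File(path);
			if (!f.exists()) {
				System.out.println("File does not exist, creating...");
				if (f.createNewFile()) {
					System.out.println("File created.");
				} else {
					System.out.println("Failed to create file.");
				}
			}
			// Read the existing content first so the new content is appended to it.
			BufferedReader input = new BufferedReader(new FileReader(f));
			while ((s = input.readLine()) != null) {
				s1 += s + "\n";
			}
			System.out.println("Existing file content: " + s1);
			input.close();
			s1 += content;
			output = new BufferedWriter(new FileWriter(f));
			output.write(s1);
			output.flush();
			return true;
		} catch (Exception e) {
			e.printStackTrace();
			return false;
		} finally {
			if (output != null) {
				try {
					output.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}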
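
As a shorter alternative, the same idea can be sketched with java.nio.file (Java 7 and later). This is not the helper from the earlier post: the class HtmlSaver and its method saveAsHtml are hypothetical names for this illustration, the file is written as UTF-8 (an assumption; the code above uses the platform default encoding), and an existing file is overwritten rather than appended to.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
public class HtmlSaver {

    /**
     * Writes content to dir/fileName.html as UTF-8.
     * Creates the directory if needed and overwrites an existing file.
     */
    public static Path saveAsHtml(Path dir, String fileName, String content)
            throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(fileName + ".html");
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A temp directory stands in for the real output directory.
        Path dir = Files.createTempDirectory("pages");
        Path saved = saveAsHtml(dir, "index", "<html><body>hello</body></html>");
        System.out.println("saved to " + saved);
    }
}
```

When saving real pages, it is worth writing the file in the same charset the page declares in its meta tag, otherwise the browser may display the local copy as garbled text.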
