Download binary file from Github using Java
【Posted】: 2012-11-06 15:53:00
【Question】: I am trying to download this file (http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar) with the following method, but it does not seem to work. I get an empty/corrupted file.
String link = "http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar";
String fileName = "ChampionHelper-4.jar";
URL url = new URL(link);
URLConnection c = url.openConnection();
c.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 1.2.30703)");
InputStream input;
input = c.getInputStream();
byte[] buffer = new byte[4096];
int n = -1;
OutputStream output = new FileOutputStream(new File(fileName));
while ((n = input.read(buffer)) != -1) {
    if (n > 0) {
        output.write(buffer, 0, n);
    }
}
output.close();
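However, with the same method I can successfully download the following file from my Dropbox (http://dl.dropbox.com/u/13226123/ChampionHelper-4.jar).
So Github somehow knows that I am not a normal user trying to download the file. I have already tried changing the user agent, but that did not help either.
So how should I download a file hosted on my Github account using Java?
Edit: I tried using apache commons-io for this, but I get the same result, an empty/corrupted file.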
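【Comments】:
I can download the file from github without any problem. My browser is Chrome v23 on Windows 7.
@Chris The problem is not whether the file can be downloaded through a browser. Re-read the question.
【Answer 1】: When you request this file, GitHub appears to send you through several levels of redirects, and this Stack Overflow article states that URLConnection does not automatically follow redirects that change protocol. Here is what I see in curl:
First request: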
curl -v http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
* About to connect() to github.com port 80 (#0)
* Trying 207.97.227.239... connected
* Connected to github.com (207.97.227.239) port 80 (#0)
> GET /downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: github.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Sun, 18 Nov 2012 15:56:36 GMT
< Content-Type: text/html
< Content-Length: 178
< Connection: close
< Location: https://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
<
<html> <head><title>301 Moved Permanently</title></head> <body bgcolor="white"> <center><h1>301 Moved Permanently</h1></center> <hr><center>nginx</center> </body> </html>
* Closing connection #0
curl against the URL from that Location header:
curl -v https://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
* About to connect() to github.com port 443 (#0)
* Trying 207.97.227.239... connected
* Connected to github.com (207.97.227.239) port 443 (#0)
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using RC4-SHA
* Server certificate:
* subject: businessCategory=Private Organization; 1.3.6.1.4.1.311.60.2.1.3=US; 1.3.6.1.4.1.311.60.2.1.2=California; serialNumber=C3268102; C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=github.com
* start date: 2011-05-27 00:00:00 GMT
* expire date: 2013-07-29 12:00:00 GMT
* subjectAltName: github.com matched
* issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert High Assurance EV CA-1
* SSL certificate verify ok.
> GET /downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: github.com
> Accept: */*
>
< HTTP/1.1 302 Found
< Server: nginx
< Date: Sun, 18 Nov 2012 15:58:56 GMT
< Content-Type: text/html; charset=utf-8
< Connection: keep-alive
< Status: 302 Found
< Strict-Transport-Security: max-age=2592000
< Cache-Control: no-cache
< X-Runtime: 48
< Location: http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
< X-Frame-Options: deny
< Content-Length: 149
<
* Connection #0 to host github.com left intact
* Closing connection #0
* SSLv3, TLS alert, Client hello (1):
<html><body>You are being <a href="http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar">redirected</a>.</body></html>
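The Location header in this response points at the actual file. You may want to use the Apache HTTP client to download it; it can be set up to follow these 301 and 302 redirects during a GET.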
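As an alternative to a third-party client, here is a minimal sketch that assumes Java 11 or later and uses the JDK's own `java.net.http.HttpClient` instead of Apache HttpClient. The class name `RedirectingDownload` and its two methods are hypothetical and exist only for illustration. The redirect policy is `Redirect.ALWAYS` because the chain in the trace above includes an https to http hop, which `Redirect.NORMAL` would not follow.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper, not part of the original answer.
final class RedirectingDownload {

    // Redirect.ALWAYS also follows https -> http hops; Redirect.NORMAL does not.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.ALWAYS)
                .build();
    }

    // Writes the body of the final response to target and returns its status code.
    static int download(String link, Path target) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(link)).GET().build();
        HttpResponse<Path> response =
                newClient().send(request, HttpResponse.BodyHandlers.ofFile(target));
        return response.statusCode();
    }
}
```

A call such as `RedirectingDownload.download(link, Paths.get("ChampionHelper-4.jar"))` would be the typical use. The body is written to the file whatever the final status is, so it is worth checking that the returned code is 200 before trusting the file.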
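【Comments】:
You can call setInstanceFollowRedirects on your HttpURLConnection instance to follow these redirects automatically.
The Stack Overflow article I linked to says that URLConnection will not follow redirects that change protocol. Did you write some code to test whether setInstanceFollowRedirects works?
You are right... it is necessary to add this: while (c.getResponseCode() > 300 && c.getResponseCode() < 400) c = (HttpURLConnection) (new URL(c.getHeaderField("Location"))).openConnection();
【Answer 2】:
This one does the job: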
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class Download {

    // The status line is stored under the null key; check it for a 301 or 302.
    private static boolean isRedirected( Map<String, List<String>> header ) {
        for( String hv : header.get( null )) {
            if( hv.contains( " 301 " )
             || hv.contains( " 302 " )) return true;
        }
        return false;
    }

    public static void main( String[] args ) throws Throwable {
        String link =
            "http://github.com/downloads/TheHolyWaffle/ChampionHelper/" +
            "ChampionHelper-4.jar";
        String fileName = "ChampionHelper-4.jar";
        URL url = new URL( link );
        HttpURLConnection http = (HttpURLConnection)url.openConnection();
        Map< String, List< String >> header = http.getHeaderFields();
        // Follow each Location header by hand until the response is no longer a redirect.
        while( isRedirected( header )) {
            link = header.get( "Location" ).get( 0 );
            url = new URL( link );
            http = (HttpURLConnection)url.openConnection();
            header = http.getHeaderFields();
        }
        InputStream input = http.getInputStream();
        byte[] buffer = new byte[4096];
        int n = -1;
        OutputStream output = new FileOutputStream( new File( fileName ));
        while ((n = input.read(buffer)) != -1) {
            output.write( buffer, 0, n );
        }
        output.close();
        input.close();
    }
}
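A variation on the same idea, offered only as a sketch: it reads the numeric status code instead of matching text in the status line, resolves a relative Location value against the current URL, and stops after a fixed number of hops. The class name `ManualRedirects`, the `MAX_HOPS` value and the set of status codes followed are assumptions made for this example, not part of the answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original answer.
final class ManualRedirects {

    static final int MAX_HOPS = 5; // arbitrary cap chosen for this sketch

    // The redirect status codes this sketch follows.
    static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    // Location may be relative, so resolve it against the URL that produced it.
    static URI resolve(String base, String location) {
        return URI.create(base).resolve(location);
    }

    static void download(String link, Path target) throws IOException {
        URI uri = URI.create(link);
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            HttpURLConnection http = (HttpURLConnection) uri.toURL().openConnection();
            http.setInstanceFollowRedirects(false); // every hop is handled in this loop
            int code = http.getResponseCode();
            if (isRedirect(code)) {
                String location = http.getHeaderField("Location");
                http.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location from " + uri);
                }
                uri = resolve(uri.toString(), location);
                continue;
            }
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP " + code + " from " + uri);
            }
            try (InputStream in = http.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return;
        }
        throw new IOException("Gave up after more than " + MAX_HOPS + " redirects");
    }
}
```

The hop limit guards against redirect loops, and failing on any non-200 final status keeps an HTML error page from being saved under the jar's name.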
【Comments】:
【Answer 3】: Get the direct download link of the raw binary file, e.g.
https://github.com/xerial/sqlite-jdbc/blob/master/src/main/resources/org/sqlite/native/Windows/x86_64/sqlitejdbc.dll?raw=true
by copying the View Raw link.
Then use the following code to download the file:
import java.io.*;
import java.net.*;
import java.nio.file.*;
import org.apache.commons.io.FilenameUtils;

public static void download(String downloadURL) throws IOException {
    URL website = new URL(downloadURL);
    String fileName = getFileName(downloadURL);
    try (InputStream inputStream = website.openStream()) {
        Files.copy(inputStream, Paths.get(fileName), StandardCopyOption.REPLACE_EXISTING);
    }
}

// URLDecoder.decode(String, String) declares a checked exception, so it is listed here.
public static String getFileName(String downloadURL) throws UnsupportedEncodingException {
    String baseName = FilenameUtils.getBaseName(downloadURL);
    String extension = FilenameUtils.getExtension(downloadURL);
    String fileName = baseName + "." + extension;
    // Drop the query string (e.g. "?raw=true") that follows the extension.
    int questionMarkIndex = fileName.indexOf("?");
    if (questionMarkIndex != -1) {
        fileName = fileName.substring(0, questionMarkIndex);
    }
    fileName = fileName.replaceAll("-", "");
    return URLDecoder.decode(fileName, "UTF-8");
}
For the FilenameUtils class you also need the Apache Commons IO Maven dependency:
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>LATEST</version>
</dependency>
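If pulling in Commons IO only for the file name is not desirable, the name can probably be derived with the JDK alone. The sketch below is an assumption-based alternative rather than the answer's method: `UrlFileNames` and `fileNameFromUrl` are hypothetical names, and unlike `getFileName` above it keeps hyphens in the name.

```java
import java.net.URI;

// Hypothetical helper, not part of the original answer.
final class UrlFileNames {

    // URI.getPath() returns the decoded path without the query string,
    // so "?raw=true" never reaches the file name.
    static String fileNameFromUrl(String downloadUrl) {
        String path = URI.create(downloadUrl).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

Note that `URI.create` throws `IllegalArgumentException` for strings that are not valid URIs (for example, ones containing unescaped spaces), so input coming from users may need validation first.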
【Comments】:
【Answer 4】: I found the solution.
Apparently http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar does not link directly to my file.
When I opened the resulting jar in a text editor, I found:
<html><body>You are being <a href="http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar">redirected</a>.</body></html>
So this means the direct link is: http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
With this link I can download the file with my method without any problem.
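The same kind of mistake can be caught programmatically instead of by opening the file in a text editor. A jar is a zip archive, and zip files normally begin with the two bytes `PK`, whereas the redirect page begins with `<html>`. The sketch below assumes Java 11 (for `readNBytes`); `JarSanityCheck` and its methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original answer.
final class JarSanityCheck {

    // True if the given leading bytes start with the zip signature prefix 'P' 'K'.
    static boolean startsWithZipMagic(byte[] head) {
        return head.length >= 2 && head[0] == 'P' && head[1] == 'K';
    }

    // Reads the first two bytes of the file and checks them.
    static boolean looksLikeJar(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return startsWithZipMagic(in.readNBytes(2));
        }
    }
}
```

Calling `looksLikeJar` right after the download gives an early warning that a redirect page or error page was saved instead of the archive. It is only a quick plausibility check, not a validation of the archive's contents.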
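【Comments】:
See my post; it handles the redirects without Apache or any third-party library.
【Answer 5】: I could get it to work with the link template from the question: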
http://github.com/downloads/Nodeclipse/eclipse-node-ide/CoffeeScriptSet.p2f
but not with this one:
http://cloud.github.com/downloads/Nodeclipse/eclipse-node-ide/CoffeeScriptSet.p2f
Here is what worked for me:
https://raw.github.com/Nodeclipse/eclipse-node-ide/master/EclipseNodeIDE-0.2.p2f
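For files committed to a repository, a link of this shape can be assembled from its parts. The sketch below only mirrors the URL pattern shown in this answer. `RawLinks`, `rawUrl` and the `RAW_HOST` constant are hypothetical, and the host is copied from the answer as an assumption, so adjust it if GitHub serves raw files from a different host.

```java
// Hypothetical helper, not part of the original answer.
final class RawLinks {

    // Host as it appears in the answer; treat it as an assumption.
    static final String RAW_HOST = "https://raw.github.com";

    // Builds <host>/<owner>/<repo>/<branch>/<path>, following the pattern in the answer.
    static String rawUrl(String owner, String repo, String branch, String path) {
        return RAW_HOST + "/" + owner + "/" + repo + "/" + branch + "/" + path;
    }
}
```

For example, `RawLinks.rawUrl("Nodeclipse", "eclipse-node-ide", "master", "EclipseNodeIDE-0.2.p2f")` reproduces the link above.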
【Comments】: