Download binary file from GitHub using Java

Posted: 2012-11-06 15:53:00

Question:

I am trying to download this file (http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar) with the following method, but it does not seem to work. I get an empty/corrupted file.

String link = "http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar";
String fileName = "ChampionHelper-4.jar";

URL url = new URL(link);
URLConnection c = url.openConnection();
c.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 1.2.30703)");

InputStream input;
input = c.getInputStream();
byte[] buffer = new byte[4096];
int n = -1;

OutputStream output = new FileOutputStream(new File(fileName));
while ((n = input.read(buffer)) != -1) {
    if (n > 0) {
        output.write(buffer, 0, n);
    }
}

output.close();

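But with the same approach I can successfully download the following file from my Dropbox (http://dl.dropbox.com/u/13226123/ChampionHelper-4.jar).

So GitHub somehow knows that I am not a regular user trying to download the file. I have already tried changing the user agent, but that did not help either.

So how should I download a file hosted on my GitHub account using Java?

Edit: I tried using Apache commons-io for this, but I got the same result: an empty/corrupted file.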

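Comments:

I can download the file from GitHub without any problems. My browser is Chrome v23 on Windows 7.

@Chris The problem is not about downloading the file through a browser. Re-read the question.

Answer 1: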

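When you request this file, GitHub appears to send you through several levels of redirects, and this Stack Overflow article states that URLConnection will not automatically follow redirects that change the protocol. Here is what I see with curl:

First request: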

curl -v http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
* About to connect() to github.com port 80 (#0)
*   Trying 207.97.227.239... connected
* Connected to github.com (207.97.227.239) port 80 (#0)
> GET /downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: github.com
> Accept: */*
>  
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Sun, 18 Nov 2012 15:56:36 GMT
< Content-Type: text/html
< Content-Length: 178
< Connection: close
< Location: https://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
<
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Closing connection #0

Curl of that Location header:

curl -v https://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
* About to connect() to github.com port 443 (#0)
*   Trying 207.97.227.239... connected
* Connected to github.com (207.97.227.239) port 443 (#0)
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using RC4-SHA
* Server certificate:
*    subject: businessCategory=Private Organization; 1.3.6.1.4.1.311.60.2.1.3=US; 1.3.6.1.4.1.311.60.2.1.2=California; serialNumber=C3268102; C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=github.com
*    start date: 2011-05-27 00:00:00 GMT
*    expire date: 2013-07-29 12:00:00 GMT
*    subjectAltName: github.com matched
*    issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert High Assurance EV CA-1
*    SSL certificate verify ok.
> GET /downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: github.com
> Accept: */*
> 
< HTTP/1.1 302 Found
< Server: nginx
< Date: Sun, 18 Nov 2012 15:58:56 GMT
< Content-Type: text/html; charset=utf-8
< Connection: keep-alive
< Status: 302 Found
< Strict-Transport-Security: max-age=2592000
< Cache-Control: no-cache
< X-Runtime: 48
< Location: http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
< X-Frame-Options: deny
< Content-Length: 149
< 
* Connection #0 to host github.com left intact
* Closing connection #0
* SSLv3, TLS alert, Client hello (1):
<html><body>You are being <a href="http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar">redirected</a>.</body></html>

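The Location header in this response points to the actual file. You may want to use Apache HTTP Client to download it; it can be configured to follow these 301 and 302 redirects during the GET.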
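The redirect chain shown in the curl output (http to https, then back to http on another host) can also be followed by hand with plain HttpURLConnection. The following is only a minimal sketch under some assumptions: the class name RedirectingDownload, the helper names and the MAX_REDIRECTS cap are made up for illustration, and the set of status codes treated as redirects is a choice, not something the thread specifies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RedirectingDownload {
    // Hypothetical cap so that a redirect loop cannot run forever.
    private static final int MAX_REDIRECTS = 5;

    /** True for the status codes this sketch treats as redirects. */
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    /** Resolves a Location value, which may be relative, against the current URL. */
    public static URL resolve(URL current, String location) throws MalformedURLException {
        return new URL(current, location);
    }

    public static void download(String link, Path target) throws IOException {
        URL url = new URL(link);
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection http = (HttpURLConnection) url.openConnection();
            // Handle every hop here, including switches between http and https.
            http.setInstanceFollowRedirects(false);
            int status = http.getResponseCode();
            if (isRedirect(status)) {
                String location = http.getHeaderField("Location");
                http.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header from " + url);
                }
                url = resolve(url, location);
                continue;
            }
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " from " + url);
            }
            // Only a 200 response body is written to disk.
            try (InputStream in = http.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return;
        }
        throw new IOException("More than " + MAX_REDIRECTS + " redirects for " + link);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RedirectingDownload <url> <target-file>
        download(args[0], Paths.get(args[1]));
    }
}
```

Refusing to write anything unless the final status is 200 avoids the symptom from the question, where a small HTML redirect page ends up saved under the .jar name.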

Comments:

You can call setInstanceFollowRedirects on your HttpURLConnection instance to follow these redirects automatically.

The Stack Overflow article I linked to states that URLConnection will not follow redirects that change the protocol. Did you write some code to test whether setInstanceFollowRedirects works?

You are right... It is necessary to add this: while (c.getResponseCode() > 300 && c.getResponseCode() < 400) c = (HttpURLConnection) (new URL(c.getHeaderField("Location"))).openConnection();

Answer 2:

This does the job:

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class Download {
   // The status line (e.g. "HTTP/1.1 301 Moved Permanently") is stored under the null key.
   private static boolean isRedirected( Map<String, List<String>> header ) {
      for( String hv : header.get( null )) {
         if(   hv.contains( " 301 " )
            || hv.contains( " 302 " )) return true;
      }
      return false;
   }

   public static void main( String[] args ) throws Throwable
   {
      String link =
         "http://github.com/downloads/TheHolyWaffle/ChampionHelper/" +
         "ChampionHelper-4.jar";
      String            fileName = "ChampionHelper-4.jar";
      URL               url  = new URL( link );
      HttpURLConnection http = (HttpURLConnection)url.openConnection();
      Map< String, List< String >> header = http.getHeaderFields();
      // Re-open the connection at each Location until the response is no longer a redirect.
      while( isRedirected( header )) {
         link = header.get( "Location" ).get( 0 );
         url    = new URL( link );
         http   = (HttpURLConnection)url.openConnection();
         header = http.getHeaderFields();
      }
      InputStream  input  = http.getInputStream();
      byte[]       buffer = new byte[4096];
      int          n      = -1;
      OutputStream output = new FileOutputStream( new File( fileName ));
      while ((n = input.read(buffer)) != -1) {
         output.write( buffer, 0, n );
      }
      output.close();
      input.close();
   }
}
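For comparison, the same redirect handling can be delegated to the JDK itself, assuming Java 11 or later is available (the thread predates it). The sketch below uses java.net.http.HttpClient with Redirect.ALWAYS, which also follows redirects from https back to http, as the chain shown earlier requires; Redirect.NORMAL would stop at that step. The class name HttpClientDownload and its method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HttpClientDownload {
    /** Builds a client that follows every redirect, including https to http. */
    public static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.ALWAYS)
                .build();
    }

    public static Path download(String link, Path target)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(link)).GET().build();
        // Note: ofFile writes the body to the target whatever the final status is,
        // so the status is checked afterwards and reported as an error if not 200.
        HttpResponse<Path> response =
                newClient().send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java HttpClientDownload <url> <target-file>
        download(args[0], Paths.get(args[1]));
    }
}
```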

Answer 3:

Get a direct download link to the raw binary file, such as https://github.com/xerial/sqlite-jdbc/blob/master/src/main/resources/org/sqlite/native/Windows/x86_64/sqlitejdbc.dll?raw=true, by copying the View Raw link.

Then use the following code to download the file:

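import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLDecoder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import org.apache.commons.io.FilenameUtils;

public static void download(String downloadURL) throws IOException
{
    URL website = new URL(downloadURL);
    String fileName = getFileName(downloadURL);

    try (InputStream inputStream = website.openStream())
    {
        Files.copy(inputStream, Paths.get(fileName), StandardCopyOption.REPLACE_EXISTING);
    }
}

// URLDecoder.decode(String, String) throws the checked UnsupportedEncodingException,
// so it has to be declared here (it is a subclass of IOException).
public static String getFileName(String downloadURL) throws UnsupportedEncodingException
{
    String baseName = FilenameUtils.getBaseName(downloadURL);
    String extension = FilenameUtils.getExtension(downloadURL);
    String fileName = baseName + "." + extension;

    // Drop a trailing query string such as "?raw=true".
    int questionMarkIndex = fileName.indexOf("?");
    if (questionMarkIndex != -1)
    {
        fileName = fileName.substring(0, questionMarkIndex);
    }

    // Hyphens are removed from the name before decoding.
    fileName = fileName.replaceAll("-", "");
    return URLDecoder.decode(fileName, "UTF-8");
}
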

For the FilenameUtils class you also need the Apache Commons IO Maven dependency:

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>LATEST</version>
</dependency>
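If Commons IO is only needed for the file name, the name can probably be derived with the JDK alone. This is a minimal sketch, not the answer's method: the class UrlFileName and its fromUrl method are hypothetical names, and unlike getFileName above it keeps hyphens in the name.

```java
import java.net.URI;

public class UrlFileName {
    /** Returns the last path segment of a URL, ignoring any query string or fragment. */
    public static String fromUrl(String downloadURL) {
        // getPath() returns the percent-decoded path without "?query" or "#fragment".
        String path = URI.create(downloadURL).getPath();
        if (path == null) {
            throw new IllegalArgumentException("No path in URL: " + downloadURL);
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("URL does not end in a file name: " + downloadURL);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(fromUrl(args[0]));
    }
}
```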

Answer 4:

I found the solution.

Apparently http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar does not link directly to my file.

When I looked at the resulting jar with a text editor, I found this:

<html><body>You are being <a href="http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar">redirected</a>.</body></html>

So this means the direct link is: http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar

With this link I can download the file with my method without any problems.
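The symptom described in this answer, an HTML redirect page saved under a .jar name, could be caught right after the download by checking the file signature. The sketch below assumes Java 9 or later (for readNBytes) and relies on the fact that a non-empty JAR is a ZIP archive starting with the bytes "PK" 0x03 0x04; the class JarSanityCheck and method looksLikeZip are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JarSanityCheck {
    /** True if the file starts with the ZIP local file header signature "PK\3\4". */
    public static boolean looksLikeZip(Path file) throws IOException {
        byte[] head = new byte[4];
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes keeps reading until 4 bytes are available or the stream ends.
            int read = in.readNBytes(head, 0, head.length);
            return read == 4
                && head[0] == 'P' && head[1] == 'K'
                && head[2] == 3 && head[3] == 4;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarSanityCheck <downloaded-file>
        System.out.println(looksLikeZip(Paths.get(args[0])));
    }
}
```

A file holding the "You are being redirected" page starts with "<htm" and fails this check, which points to an unfollowed redirect rather than a network error.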

Comments:

See my post; it handles the redirects without Apache or any third-party library.

Answer 5:

I could make it work for the link template in the question:

http://github.com/downloads/Nodeclipse/eclipse-node-ide/CoffeeScriptSet.p2f

but not for this one:

http://cloud.github.com/downloads/Nodeclipse/eclipse-node-ide/CoffeeScriptSet.p2f

Here is what worked for me:

https://raw.github.com/Nodeclipse/eclipse-node-ide/master/EclipseNodeIDE-0.2.p2f
