Download a file from archive.org

Posted 2023-03-06


Original title: Download file from archive.org
Asked: 2015-10-14 23:04:01

Question:

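I want to download a file from archive.org. The URL is correct, but I get a 0 KB file. When I use the same script to download the same file from my own server, it returns TRUE and the file downloads correctly.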

Here is the script. The download link is generated like this:

```php
// Builds the download link for one file; the tooltip "حفظ الملف" is Arabic for "Save file".
$saveit = '<a href="Files/direct_download.php?path='.$directLink.'/&file='.$fileName.'" id="'.$id.'" style="cursor: pointer;" target="_BLANK">';
$saveit .= '<img src="'.$path2icons.'Download32_32.png" class="embedDownload masterTooltip" title="حفظ الملف" align="absmiddle" />';
$saveit .= '</a>';
echo $saveit;
```
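The link above concatenates `$directLink` and `$fileName` into the query string without encoding, so a file name containing spaces, `&`, `#` or non-ASCII characters can reach direct_download.php truncated or altered. A minimal sketch, assuming PHP 7.4 or later and the same variables, that builds the query with `http_build_query()` and escapes the attribute values with `htmlspecialchars()`; the function name `build_download_link` is hypothetical:

```php
<?php
// Hypothetical helper: builds the same download link as the snippet above,
// but encodes the query string and escapes every HTML attribute value.
function build_download_link(string $directLink, string $fileName, string $id, string $path2icons): string
{
    // http_build_query() percent-encodes each value, so spaces, "&", "#" and
    // non-ASCII characters in the file name survive the trip to direct_download.php.
    $href = 'Files/direct_download.php?' . http_build_query([
        'path' => $directLink . '/',
        'file' => $fileName,
    ]);

    // Escape values placed inside HTML attributes ("&" becomes "&amp;", quotes are encoded).
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

    return '<a href="' . $esc($href) . '" id="' . $esc($id) . '" style="cursor: pointer;" target="_blank">'
        . '<img src="' . $esc($path2icons . 'Download32_32.png') . '" class="embedDownload masterTooltip"'
        . ' title="حفظ الملف" align="absmiddle" />'
        . '</a>';
}
```

Usage: `echo build_download_link($directLink, $fileName, $id, $path2icons);`. PHP decodes the parameters again when direct_download.php reads `$_GET`, so that script receives the original strings.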

direct_download.php:

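```php
<?php
// direct_download.php
$url = $_GET['path'];      // remote directory URL; the link above appends a trailing "/"
$fileName = $_GET['file']; // name of the file inside that directory

set_time_limit(0); // no execution time limit, so large files are not cut off

// Ask the browser not to cache the response and to treat it as a download
header("Pragma: public");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Cache-Control: private", false);
header("Content-Type: application/download");
header("Content-Disposition: filename=$fileName");

// Without CURLOPT_RETURNTRANSFER, curl_exec() writes the response body straight
// to the output. Redirects are not followed unless CURLOPT_FOLLOWLOCATION is set.
$ch = curl_init($url.$fileName);
curl_exec($ch);
curl_close($ch);
exit();
```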
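One likely cause, not confirmed in this thread: archive.org download URLs typically answer with an HTTP redirect to the storage server that holds the file, and curl does not follow redirects unless `CURLOPT_FOLLOWLOCATION` is enabled. The script would then relay the redirect's empty body, which matches the 0 KB result, while a file on the asker's own server is served directly. The following is a sketch of a revised direct_download.php under that assumption (PHP 8+). The helper names and the archive.org-only allow-list are hypothetical additions, not part of the original script; the allow-list is there because the original forwards any URL passed in `path`.

```php
<?php
// Sketch of a revised direct_download.php (PHP 8+). Assumption: the archive.org
// URL redirects to the server that stores the file, so redirects must be followed.

// Hypothetical allow-list: only relay http(s) URLs on archive.org hosts, so the
// script cannot be used as an open proxy for arbitrary URLs.
function is_allowed_url(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host = parse_url($url, PHP_URL_HOST);
    return in_array($scheme, ['http', 'https'], true)
        && is_string($host)
        && ($host === 'archive.org' || str_ends_with($host, '.archive.org'));
}

// Joins the directory URL and the file name, percent-encoding each path segment
// of the file name (spaces, "#", non-ASCII) while keeping "/" separators.
function build_upstream_url(string $path, string $file): string
{
    $segments = array_map('rawurlencode', explode('/', $file));
    return rtrim($path, '/') . '/' . implode('/', $segments);
}

// Content-Disposition value asking the browser to save the file; quotes and line
// breaks are stripped so the name cannot break out of the header.
function content_disposition(string $fileName): string
{
    $safe = str_replace(['"', "\r", "\n"], '', $fileName);
    return 'attachment; filename="' . $safe . '"';
}

// Relays the remote file to the client; returns true when curl_exec() succeeded.
function stream_remote_file(string $url, string $fileName): bool
{
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: ' . content_disposition($fileName));

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 301/302 redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // but not indefinitely
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // HTTP status >= 400 counts as failure
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0');
    // Without CURLOPT_RETURNTRANSFER the body goes straight to the output.
    // In PHP 8 the handle is freed automatically when $ch goes out of scope.
    return curl_exec($ch) === true;
}

// Only handle the request when running under a web server, not from the CLI.
if (PHP_SAPI !== 'cli') {
    $path = (string) ($_GET['path'] ?? '');
    $file = (string) ($_GET['file'] ?? '');
    $url = build_upstream_url($path, $file);
    if ($file === '' || !is_allowed_url($url)) {
        http_response_code(400);
        exit('Invalid download request');
    }
    set_time_limit(0);
    stream_remote_file($url, basename($file));
    exit;
}
```

Because the headers are sent before curl runs, a failed upstream request still produces an empty download in the browser; checking the URL first (for example with a HEAD request via `CURLOPT_NOBODY`) would let the script report the error instead.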

Thanks in advance.


Answer 1:

Try adding a user agent:

```php
curl_setopt($ch, CURLOPT_USERAGENT, "My User Agent");
```
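A minimal sketch of choosing the user-agent string, assuming PHP 7.4 or later: it forwards the visitor's own browser string from `$_SERVER['HTTP_USER_AGENT']` when the request carries one and otherwise falls back to a fixed string. The function name `pick_user_agent` and the fallback value are hypothetical, and whether archive.org actually rejects requests without a user agent is not established in this thread.

```php
<?php
// Hypothetical helper: returns the visitor's User-Agent header if it is present
// and non-blank, otherwise a fixed fallback string.
function pick_user_agent(array $server, string $fallback = 'Mozilla/5.0 (compatible; MyDownloader/1.0)'): string
{
    $ua = trim((string) ($server['HTTP_USER_AGENT'] ?? ''));
    return $ua !== '' ? $ua : $fallback;
}

// Usage inside direct_download.php, before curl_exec():
// curl_setopt($ch, CURLOPT_USERAGENT, pick_user_agent($_SERVER));
```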

Alternatively, try downloading the file with wget:

```php
system("wget ....");
```
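If the host allows shell commands (the comment below reports that system() is disabled on the asker's server), the wget route could look like this minimal sketch. It assumes GNU wget is installed; the function names are hypothetical. Both arguments are passed through `escapeshellarg()` because they would come from the request, and `--` stops wget from reading a URL that starts with `-` as an option. wget follows HTTP redirects by default, which may be why it can succeed where the plain curl handle returns an empty file.

```php
<?php
// Hypothetical helper: builds a wget command line with both arguments shell-quoted,
// so characters like ' ; & in the URL or path cannot inject extra commands.
function build_wget_command(string $url, string $destination): string
{
    return 'wget -q -O ' . escapeshellarg($destination) . ' -- ' . escapeshellarg($url);
}

// Runs wget and returns true when it exited with status 0. Returns false when
// exec() is unavailable, e.g. listed in disable_functions on shared hosting.
function wget_download(string $url, string $destination): bool
{
    if (!function_exists('exec')) {
        return false;
    }
    exec(build_wget_command($url, $destination), $output, $status);
    return $status === 0;
}
```

The file lands on the server's disk, so it still has to be sent to the browser afterwards, for example with readfile().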

Comments:

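How do I find out the user agent? Also, the system() function is disabled on the server.

You can, for example, use the visitor's own user agent from $_SERVER['HTTP_USER_AGENT'].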
