Python shell wrapper for wkhtmltopdf. Provide a text file of line-separated URLs as the command-line argument.
#!/usr/bin/env python
import os
import random
import string
import subprocess
import sys
import time

# Set the terminal window title.
sys.stdout.write("\x1b]2;" + "web2pdf Downloader" + "\x07")

# Expand "~" explicitly, since the commands below do not go through a shell.
folder_path = os.path.expanduser('~/Desktop/pulled_pdfs/')

if len(sys.argv) < 2:
    print('Please provide input file.')
    sys.exit(1)
else:
    url_file = sys.argv[1]


def pdf_print(link):
    """Render one URL to a randomly named PDF; return True on success."""
    # Create the output folder if it does not exist.
    if not os.path.isdir(folder_path):
        os.makedirs(folder_path)

    # Build the file path: nine random digits/uppercase letters + '.pdf'.
    chars = string.digits + string.ascii_uppercase
    prefix = ''.join(random.choice(chars) for _ in range(9))
    filename = os.path.join(folder_path, prefix + '.pdf')

    # Number of attempts per URL.
    trials = 7
    for trial in range(1, trials + 1):
        try:
            # Pass the arguments as a list so the URL is never parsed by a shell.
            status = subprocess.call(
                ['wkhtmltopdf', '--page-size', 'A4', link, filename])
        except OSError:
            # wkhtmltopdf is missing or not executable; retrying cannot help.
            print('Could not run wkhtmltopdf. Is it installed and on PATH?')
            return False
        if status == 0:
            # Stop at the first successful run instead of repeating it.
            return True
        print('Problem with wkhtmltopdf. Trying again')
    return False


with open(url_file, 'r') as f:
    # Strip first, then drop blank lines and duplicates.
    url_list = list(set(u.strip() for u in f if u.strip()))

total_pdfs = len(url_list)
print('\n\nDownloading {} web pages to pdf.\n'.format(total_pdfs))
start = time.time()

# Run the retrieval process and count the pages that actually rendered.
succeeded = sum(1 for item in url_list if pdf_print(item))

print('\n\nJob took {0:.1f} minutes to complete'.format((time.time() - start) / 60))
print('\nSuccessfully downloaded {} of {} files.'.format(succeeded, total_pdfs))
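
One possible refinement of the URL-list handling: a `set` removes duplicates but does not keep the order of the input file, so pages are fetched in an arbitrary order. The sketch below is only an illustration, assuming Python 3.7+ (where `dict` preserves insertion order); the helper name `clean_urls` is hypothetical and not part of the script above.

```python
def clean_urls(lines):
    """Strip whitespace, drop blank lines, and remove duplicates in input order."""
    stripped = (line.strip() for line in lines)
    # dict keys keep insertion order (Python 3.7+), so the first occurrence wins.
    return list(dict.fromkeys(u for u in stripped if u))


if __name__ == '__main__':
    # Sample lines as they would come from iterating over an open text file.
    sample = [
        'https://example.com/a\n',
        '\n',
        'https://example.com/b\n',
        'https://example.com/a  \n',
    ]
    print(clean_urls(sample))
```

Because the helper takes any iterable of lines, it can be fed an open file object directly, for example `clean_urls(open(url_file))`, and tested without touching the network or wkhtmltopdf.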