A Python shell wrapper for wkhtmltopdf: pass a text file of newline-separated URLs as the command-line argument.

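Preface: This article was compiled by the editors of 小常识网 (cha138.com). It introduces a Python shell wrapper for wkhtmltopdf that takes a text file of newline-separated URLs as its command-line argument. We hope it is a useful reference.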
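The input file holds one URL per line. The sketch below shows one way to normalize such a file before conversion. It assumes that blank lines should be skipped and that duplicates should be dropped in first-seen order. `parse_url_lines` is a hypothetical helper name and is not part of the script.

```python
from typing import Iterable, List


def parse_url_lines(lines: Iterable[str]) -> List[str]:
    """Return unique, non-blank URLs in the order they first appear."""
    # Strip before de-duplicating so "a\n" and "a" count as the same URL.
    stripped = (line.strip() for line in lines)
    # dict keys keep insertion order (Python 3.7+), giving an ordered dedupe.
    return list(dict.fromkeys(url for url in stripped if url))


if __name__ == "__main__":
    sample = [
        "https://example.com/a\n",
        "https://example.com/b\n",
        "\n",
        "https://example.com/a",  # last line of a file often lacks "\n"
    ]
    print(parse_url_lines(sample))
    # → ['https://example.com/a', 'https://example.com/b']
```

Calling `set()` directly on raw `readlines()` output keeps `'https://example.com/a\n'` and `'https://example.com/a'` apart. It also keeps an empty entry for a blank line and discards the file's ordering. Stripping first avoids all three problems.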

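#!/usr/bin/env python3
# Shell wrapper around the wkhtmltopdf command-line tool.
# Usage: python3 <this script> URL_FILE   (URL_FILE: one URL per line)

import os
import random
import string
import subprocess
import sys
import time

# Change the terminal window title (xterm-style escape sequence).
sys.stdout.write("\x1b]2;" + "web2pdf Downloader" + "\x07")

# expanduser() is needed because '~' is no longer expanded by a shell.
folder_path = os.path.expanduser('~/Desktop/pulled_pdfs/')

# Maximum number of wkhtmltopdf attempts per URL.
trials = 7

if len(sys.argv) < 2:
    print('Please provide input file.')
    sys.exit(1)
else:
    url_file = sys.argv[1]


def pdf_print(link):
    """Convert one URL to a PDF; return True if wkhtmltopdf succeeded."""
    # Create the folder if it does not exist (synchronously, so it is
    # guaranteed to exist before wkhtmltopdf writes into it).
    os.makedirs(folder_path, exist_ok=True)

    # Create file path + name: 9 random digits/uppercase letters.
    chars = string.digits + string.ascii_uppercase
    prefix = ''.join(random.choice(chars) for _ in range(9))
    filename = os.path.join(folder_path, prefix + '.pdf')

    # Argument list without shell=True: characters such as '&' or '?'
    # in the URL reach wkhtmltopdf unchanged.
    cmd = ['wkhtmltopdf', '--page-size', 'A4', link, filename]

    for trial in range(1, trials + 1):
        try:
            # subprocess.call returns the exit code; a failed conversion
            # does not raise, so the code must be checked explicitly.
            if subprocess.call(cmd) == 0:
                return True
        except OSError:
            # Raised when the wkhtmltopdf executable cannot be started.
            print('Could not run wkhtmltopdf. Is it installed and on PATH?')
            return False
        print('Problem with wkhtmltopdf (attempt {} of {}).'.format(trial, trials))
    return False


with open(url_file, 'r') as f:
    # Strip first, then drop blank lines and duplicates while keeping
    # the file's order (dict keys preserve insertion order).
    url_list = list(dict.fromkeys(line.strip() for line in f if line.strip()))

total_pdfs = len(url_list)

print('\n\nDownloading {} web pages to pdf.\n'.format(total_pdfs))
start = time.time()

# Running retrieval process, counting successful conversions:
downloaded = 0
for item in url_list:
    if pdf_print(item):
        downloaded += 1

print('\n\nJob took {0:.1f} minutes to complete'.format((time.time() - start) / 60))
print('\nSuccessfully downloaded {} of {} files.'.format(downloaded, total_pdfs))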
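`subprocess.call` reports a failed conversion through its return code rather than by raising `IOError`. A retry loop therefore has to inspect that code and stop at the first success. The sketch below assumes that exit status 0 means success. `build_command`, `run_with_retries`, and the `runner` parameter are hypothetical names. `runner` lets the loop be exercised without wkhtmltopdf installed.

```python
import subprocess
from typing import Callable, List, Sequence


def build_command(url: str, out_path: str) -> List[str]:
    # Argument list (no shell=True): characters such as '&' in the URL are
    # passed to wkhtmltopdf literally instead of being parsed by a shell.
    return ["wkhtmltopdf", "--page-size", "A4", url, out_path]


def run_with_retries(
    cmd: Sequence[str],
    trials: int = 7,
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run cmd up to `trials` times; return the 1-based successful attempt, or 0."""
    for attempt in range(1, trials + 1):
        if runner(cmd) == 0:  # exit status 0 = success, stop retrying
            return attempt
        print("Attempt {}/{} failed".format(attempt, trials))
    return 0


if __name__ == "__main__":
    # Fake runner: fails twice, then succeeds, to show the loop stops early.
    results = iter([1, 1, 0])
    cmd = build_command("https://example.com/?a=1&b=2", "/tmp/out.pdf")
    print(run_with_retries(cmd, trials=7, runner=lambda c: next(results)))
    # → prints two "failed" lines, then 3
```

`range(1, trials)` yields only `trials - 1` values, which is why the loop above uses `range(1, trials + 1)`. Returning on the first zero exit status keeps a page from being converted again after it has already succeeded.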

