Generating multiple PDF reports in Python

Posted 2021-04-02


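I can't seem to find any previous question, tutorial, or YouTube video that helps with my problem. The project creates 500 random roles, exports that information to a CSV file, and then fills in a fillable PDF form. Once I have it up and running, I can hand it over to HR to help them fill out forms. I was able to create one report, but for the life of me I can't figure out how to make the other 499. Every time I try, it overwrites the previous result.

My random role generator: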

import random
import sys

sys.stdout = open('roles.csv', 'a')

def role_generator():

    firstnames = open ('first_names.txt').read().splitlines()

    lastnames = open ('last_names.txt').read().splitlines()

    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

    for num in range(500):
        first = random.choice(firstnames)
        last = random.choice(lastnames)
        day = random.randint(1, 29)
        month = random.choice(months)
        year = random.randint(1960, 2001)
        idnumber = random.randint(1234567, 9999999)


        print(f'1, last name, {last}\n2, first name, {first}\n3, id number, {idnumber}\n4, date of birth, {day}-{month}-{year}\n')


role_generator()

My PDF filler:

import os

os.system('pdfforms inspect screening*.pdf')
os.system('pdfforms fill roles.csv '
          'screening.pdf '
          'screening_1.pdf')

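I'm very new to programming, so please break down any answer Barney style so that I can understand it. I'm using Python 3.6 on Ubuntu. All the code you see is what I've pieced together from my research so far.

Thanks!

Update:

At Vitor Baptista's request, this is how the program saves the CSV file: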

screening.pdf
1, last name, Hendrickson
2, first name, Jane
3, id number, 8190287
4, date of birth, 6-Feb-1991

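From what I've gathered, the name of the PDF file has to go in the first column of the first row of the CSV file. Then you need to mark where each entry goes in the PDF form. I did that with the inspect command above, which created a JSON file. I then looked through the JSON to find the number of each field so that I could label them correctly in the CSV.

Answer

This may contain some errors where paths are concerned, but it should work in some form: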

import os
import random

def make_filename(first,last,year,month,day):
    # make sure your names do not contain any character that's
    # impossible in a filename - if so, clean them first or
    # clean the file name after constructing it

    # Potter_Harry_1970_Jan_01.pdf
    return f'{last}_{first}_{year}_{month}_{day:02}.pdf'

def role_generator(): 
    # fixed the file reading to use with open
    # changed the id-generation to not have dupes
    # changed it to yield each single result as tuple (filename, text)
    with open ('first_names.txt') as f :
        firstnames = [x.strip() for x in f.read().splitlines() if x.strip()] 
    with open ('last_names.txt') as f:
        lastnames = [x.strip() for x in f.read().splitlines() if x.strip()]

    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    how_many = 500

    # changed because "idnumber = random.randint(1234567, 9999999)" may produce dupes 
    ids = random.sample(range(1234567, 10000000),k=how_many) # no dupes this way  

    for num in range(how_many):
        first = random.choice(firstnames)
        last = random.choice(lastnames)
        day = random.randint(1, 29)
        month = random.choice(months)
        year = random.randint(1960, 2001)
        idnumber = ids.pop()  # take one of the pre-sampled unique ids

        text = f'''screening.pdf
1, last name, {last}
2, first name, {first}
3, id number, {idnumber}
4, date of birth, {day}-{month}-{year}
'''

        yield (make_filename(first,last,year,month,day),text) 

# for each single result do:
for new_name, text in role_generator():
    # write one person as roles.csv
    with open("./roles.csv","w") as f:
        f.write(text)
    # fill one pdf - might need absolute path to template-pdf
    os.system('pdfforms inspect ./screening.pdf')
    # this also might need the absolute path
    os.system('pdfforms fill ./roles.csv')
    # this will rename the one pdf to the new_name also provided - you might
    # need to fix this to fit the paths
    os.rename('/home/PycharmProjects/untitled/filled/screening.pdf', 
              '/home/PycharmProjects/untitled/screening/' + new_name)

This way each individual PDF form is generated from its own roles.csv and then moved/renamed to a file named after the person.
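The comment in make_filename says names should be cleaned before they are used in a file name, but the answer does not show how. The following is a minimal sketch of one way to do that. The helper names clean_for_filename and make_safe_filename are hypothetical, and the rule of keeping only ASCII letters, digits, "-" and "." is an assumption; names with accented or non-Latin characters would need a more permissive pattern.

```python
import re

def clean_for_filename(part):
    """Replace every run of characters outside A-Z, a-z, 0-9, '.' and '-' with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9.-]+", "_", str(part).strip())
    # Drop leading/trailing dots and underscores; fall back to a placeholder if nothing is left.
    return cleaned.strip("._") or "unknown"

def make_safe_filename(first, last, year, month, day):
    """Same layout as make_filename above, e.g. Potter_Harry_1970_Jan_01.pdf."""
    parts = [clean_for_filename(p) for p in (last, first, year, month)]
    return "_".join(parts) + f"_{day:02}.pdf"

if __name__ == "__main__":
    print(make_safe_filename("Harry", "Potter", 1970, "Jan", 1))
    print(make_safe_filename("Mary Ann", "O'Neil/Jr.", 1985, "Feb", 9))
```

Swapping make_safe_filename in for make_filename would leave the rest of the loop unchanged.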

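Another answer

OK, I figured it out, and then some. It's most likely not the best way to do it, but for now it works:

I didn't change anything in the random role generator, so it still looks like this: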

import random
import sys

sys.stdout = open('roles.csv', 'a')

def role_generator():

    firstnames = open ('first_names.txt').read().splitlines()

    lastnames = open ('last_names.txt').read().splitlines()

    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

    for num in range(500):
        first = random.choice(firstnames)
        last = random.choice(lastnames)
        day = random.randint(1, 29)
        month = random.choice(months)
        year = random.randint(1960, 2001)
        idnumber = random.randint(1234567, 9999999)


        print(f'1, last name, {last}\n2, first name, {first}\n3, id number, {idnumber}\n4, date of birth, {day}-{month}-{year}\n')


role_generator()

I added terminal commands to the PDF filler:

import os

def rolegenerator():

    os.system('pdfforms inspect screening.pdf')
    os.system('pdfforms fill roles.csv')
    os.system('cp /home/PycharmProjects/untitled/filled/screening.pdf /home/PycharmProjects/untitled/screening/screening.pdf')

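You need to make sure pdfforms is installed on your system. I'm using PyCharm for my project, so I simply installed it through that program. pdfforms will "inspect" your PDF (make sure it is a fillable form) and create a "test" folder in your project directory. Look at the "test" PDF to find the label of each field. You need these numbers when creating the CSV file in the random role generator (look at the print line and compare it with the CSV sample above).

The next command fills in the PDF form using the CSV file given on the command line. In my case that is "roles.csv". This creates a "filled" directory containing the filled-in PDF for you to use.

I then decided to copy the filled-in PDF to another directory named "screening", which is the third command.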
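The three steps just described (inspect, fill, copy) could also be run through subprocess.run instead of os.system, so that a failing command raises an exception rather than passing silently. This is only a sketch: the two pdfforms invocations are the ones quoted in this answer, while the function names build_commands and fill_and_copy and their parameters are hypothetical.

```python
import shutil
import subprocess

def build_commands(template_pdf, csv_file):
    """Return the two pdfforms command lines used in this answer as argument lists."""
    return [
        ["pdfforms", "inspect", template_pdf],
        ["pdfforms", "fill", csv_file],
    ]

def fill_and_copy(template_pdf, csv_file, filled_pdf, destination):
    """Run both commands in order, then copy the filled PDF to its destination.

    check=True makes subprocess.run raise CalledProcessError when a command
    exits with a non-zero status, so a failed step stops the run.
    """
    for command in build_commands(template_pdf, csv_file):
        subprocess.run(command, check=True)
    shutil.copy(filled_pdf, destination)

if __name__ == "__main__":
    # Only prints the commands; calling fill_and_copy requires pdfforms to be installed.
    for command in build_commands("screening.pdf", "roles.csv"):
        print(" ".join(command))
```

Passing each command as a list avoids going through the shell, which matters if a file name ever contains spaces.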

I used the following code to save each file under a separate name:

def save_file():
    path = "/home/PycharmProjects/untitled/screening/screening.pdf/"
    newPath = "/home/PycharmProjects/untitled/screening"
    i = 1
    for root, dirs, files in os.walk(path):

        for name in files:
            base, extension = os.path.splitext(name)
            if not os.path.exists(os.path.join(newPath, base + extension)):
                oldfile = os.path.join(os.path.abspath(root), name)
                newfile = os.path.join(newPath, base + extension)
                os.rename(oldfile, newfile)
            else:
                oldfile = os.path.join(os.path.abspath(root), name)
                newfile = os.path.join(newPath, base + '_' + str(i) + extension)
                i += 1
                os.rename(oldfile, newfile)

save_file()

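This part still has some problems. It renames all the files in the directory every time, but I can still generate 500 random files. I would like the files to be labeled according to the person the information belongs to, but I couldn't figure that out. I got this part from "Incrementing number in file name when file exists". I don't have references for the other parts of the solution... sorry.

In the last part I added a loop with a time delay. I don't know why, but adding the delay makes the program run more smoothly without crashing. Probably because it is processing so much, the delay gives the system time to catch up: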
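For the labeling problem mentioned above, one possible approach is to move only the single freshly filled PDF and to build its name from the person's details, adding a counter only when that name is already taken. The sketch below assumes the caller still has the last and first name at hand; unique_path and move_filled_pdf are hypothetical helper names, not part of pdfforms.

```python
import os
import shutil

def unique_path(directory, filename):
    """Return a path in directory that does not exist yet, adding _1, _2, ... if needed."""
    base, extension = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{base}_{counter}{extension}")
        counter += 1
    return candidate

def move_filled_pdf(filled_pdf, target_dir, last, first):
    """Move one filled PDF into target_dir under a name built from the person's name."""
    os.makedirs(target_dir, exist_ok=True)
    destination = unique_path(target_dir, f"{last}_{first}.pdf")
    shutil.move(filled_pdf, destination)
    return destination
```

Because only one file is moved per call, files that are already in the target directory are never renamed again.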

import time

if __name__ == '__main__':
    for i in range(10):
        role_generator()
        time.sleep(.5)
        rolegenerator()
        time.sleep(.5)
        save_file()
        time.sleep(.5)
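One possible reason the delays help, offered as an assumption rather than a diagnosis: the generator redirects sys.stdout to a file opened with open('roles.csv', 'a') and never flushes or closes it, so some of the output may still be sitting in a buffer when pdfforms reads the file. A sketch that writes the CSV inside a with block, so the file is closed before anything else runs, could look like this. The function name write_roles_csv and its parameters are hypothetical; the CSV layout is the one shown in the question's update.

```python
def write_roles_csv(path, last, first, idnumber, birth_date, template="screening.pdf"):
    """Write one person's data in the CSV layout from the question and return the text."""
    text = (
        f"{template}\n"
        f"1, last name, {last}\n"
        f"2, first name, {first}\n"
        f"3, id number, {idnumber}\n"
        f"4, date of birth, {birth_date}\n"
    )
    # Mode "w" replaces any previous content; leaving the with block closes the
    # file, so the data is flushed before an external command reads it.
    with open(path, "w") as f:
        f.write(text)
    return text
```

If buffering is indeed the cause, the time.sleep calls should no longer be needed once the file is written this way.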

Again, it may not be the best solution, but it works.
