Python: download as pdf all emails from a label (gmail)

Posted: 2019-08-03 10:47:24

Question:

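I want to download more than 100 emails from gmail as pdf. Downloading them manually through the print option in gmail would take too long.

This python script retrieves the emails in the selected label. How can I convert these emails to pdf?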

# source  = https://developers.google.com/gmail/api/quickstart/python?authuser=2

from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request



SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def main():
    creds = None

    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    # Call the Gmail API 

    response = service.users().messages().list(userId="me", labelIds=["Label_53"]).execute()
    all_message_in_label = []
    if 'messages' in response:
        all_message_in_label.extend(response['messages'])

    # Follow nextPageToken until every page of results has been read
    while 'nextPageToken' in response:
        page_token = response['nextPageToken']
        response = service.users().messages().list(userId="me", labelIds=["Label_53"], pageToken=page_token).execute()
        all_message_in_label.extend(response.get('messages', []))


    if not all_message_in_label:
        print('No email LM found.')
    else:
        # get message from Id listed in all_message_in_label
        for emails in all_message_in_label: 
            message= service.users().messages().get(userId="me", id=emails["id"], format="raw", metadataHeaders=None).execute()



if __name__ == '__main__':
    main()

Answer 1:

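I did some digging on your problem and found a few links that may be useful:

For converting your messages to .eml format, see this link.

For converting from .eml to .pdf, see these links:

eml2pdf is a python github project that converts eml files to pdf, but I am not sure whether it works. You can check whether it does.

eml-to-pdf is another github project, which does not look very functional. It is written in javascript.

There is also pyPdf, which you can use to generate pdf files. With this one, though, you may need to convert the emails and format them yourself.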
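If you end up formatting the emails yourself, the first step is pulling the printable fields out of each message. The following is a minimal sketch using only the standard library `email` package; the function name `summarize_eml` and the sample message are made up for illustration, and laying the result out as a PDF page is left to whichever PDF library you choose.

```python
from email import message_from_bytes, policy


def summarize_eml(raw_bytes):
    """Return the headers and text body you would lay out on a PDF page."""
    # policy.default gives the modern API (get_body / get_content)
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    # Prefer the plain-text part, fall back to HTML if there is none
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "date": str(msg["Date"]),
        "body": body,
    }


# Hypothetical sample message, standing in for the contents of a .eml file
sample = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Invoice\r\n"
    b"Date: Sat, 03 Aug 2019 10:47:24 +0000\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello Bob\r\n"
)
fields = summarize_eml(sample)
print(fields["subject"], "|", fields["from"])
```

Attachments, inline images and HTML rendering are not handled here, which is most of the work a dedicated converter does for you.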

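For more information about the format of the message object, you can refer to the get method in the gmail api python documentation.

Here is a blog post that does what you are looking for using a different approach, though I am not entirely sure it still works.

I hope it helps. Good luck.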

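Comments:

Thanks for pointing me in the right direction! Yes, gmail-to-pdf (the blog post) still works, but not for all emails (see the issues on github), not for nested labels, and only for starred emails (the last two drawbacks need only simple fixes).

I'm glad it helped. @MagTun

Answer 2: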

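I tried the suggestions from Ali Nuri Seker's answer, but they did not work:
- eml2pdf: does not work on Windows
- eml-to-pdf: MIME type error
- pyPdf: too much work, as you need to build the whole layout yourself
- gmail-to-pdf: a bug in the code for some emails (see the bug mentioned on github)

What worked (same general idea as Ali Nuri Seker):

1. Save the emails as .eml using email.generator.Generator.
2. Convert the .eml files to pdf with eml-to-pdf-converter (not python-based, but an open-source GUI). You basically just drag and drop the folder containing the .eml files, click a button, and you get the pdfs. It even works with subfolders!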
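Step 1 can be tried offline before touching the API. The sketch below is a bytes-based variant of the same idea: it uses `email.message_from_bytes` and `email.generator.BytesGenerator` instead of the text-mode `Generator`, on the assumption that skipping the decode-to-string step is safer for messages that are not valid UTF-8. The helper name `save_raw_as_eml` and the fake payload are made up for illustration; in the real script the payload is the `raw` field of a message fetched with `format="raw"`.

```python
import base64
import email
import email.generator
import os
import tempfile


def save_raw_as_eml(raw_b64url, out_path):
    """Decode a base64url-encoded message and write it to out_path as .eml."""
    msg_bytes = base64.urlsafe_b64decode(raw_b64url.encode("ascii"))
    mime_msg = email.message_from_bytes(msg_bytes)
    # Binary mode avoids depending on the platform's default text encoding
    with open(out_path, "wb") as outfile:
        email.generator.BytesGenerator(outfile).flatten(mime_msg)
    return mime_msg


# Stand-in for message["raw"], built locally instead of fetched from gmail
fake_raw = base64.urlsafe_b64encode(
    b"From: alice@example.com\r\nSubject: Test\r\n\r\nBody text\r\n"
).decode("ascii")

out_path = os.path.join(tempfile.mkdtemp(), "example.eml")
saved = save_raw_as_eml(fake_raw, out_path)
print("saved", saved["Subject"], "to", out_path)
```

The resulting file can then be dropped into the converter from step 2.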

A more detailed script can be found here


Here is the first script, "save emails as .eml":

# source  = https://developers.google.com/gmail/api/quickstart/python?authuser=2

# In brief:
# this script connects to your gmail account and downloads the emails from a specified label as .eml files.
# you can then convert the .eml files to pdf: https://github.com/nickrussler/eml-to-pdf-converter

# set up
#  1) save this script in a folder
#  2) save the script "get labels id.py" in the same folder
#  3) go to https://developers.google.com/gmail/api/quickstart/python and click on "Enable the gmail API", then click on "Download client configuration" and save this .json file in the same folder as this script (the code below expects it to be named credentials.json)
#  4) the GMAIL API doesn't use label names but IDs, so you need to run the script "get labels id.py" and copy the ID of the label you need (the first time, you will be asked for permission on a consent screen; accept it with the account you want to download your emails from)
#  5) copy your label id below in CUSTOM VAR
#  6) run this script and your emails will be saved as .eml files in a subfolder "emails as eml"

# connect to gmail api 
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

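# decode the response from the Gmail api and save an email
import base64
import email
import email.generator  # explicit: "import email" alone does not guarantee the submodule is loaded

# for working dir and path for saving file
import os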

# CUSTOM VAR 
labelid = "Label_18"  # change your label id

# set working directory  https://***.com/a/1432949/3154274
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)
print("working dir set to ", dname)

# create folder to save email 
emailfolder = os.path.join(dname, "emails as eml")
if not os.path.exists(emailfolder):
    os.makedirs(emailfolder)


# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def main():

    # create the credentials the first time and save them in token.pickle
    creds = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server()
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    #create the service 
    service = build('gmail', 'v1', credentials=creds)


    # get the *list* of all emails in the label (if there are multiple pages, navigate through them)
    #*************************************
    #  resources for *listing* emails by label
    # https://developers.google.com/resources/api-libraries/documentation/gmail/v1/python/latest/index.html 
    # https://developers.google.com/resources/api-libraries/documentation/gmail/v1/python/latest/gmail_v1.users.messages.html#list
    # example of code for list: https://developers.google.com/gmail/api/v1/reference/users/messages/list?apix_params=%7B%22userId%22%3A%22me%22%2C%22includeSpamTrash%22%3Afalse%2C%22labelIds%22%3A%5B%22LM%22%5D%7D
    #*************************************

    response = service.users().messages().list(userId="me", labelIds=[labelid]).execute()
    all_message_in_label = []
    if 'messages' in response:
        all_message_in_label.extend(response['messages'])

    # follow nextPageToken until every page of results has been read
    while 'nextPageToken' in response:
        page_token = response['nextPageToken']
        response = service.users().messages().list(userId="me", labelIds=[labelid], pageToken=page_token).execute()
        all_message_in_label.extend(response.get('messages', []))


    # all_message_in_label looks like this 
            # for email in all_message_in_label:
                # print(email)
                #'id': '169735e289ba7310', 'threadId': '169735e289ba7310'
                #'id': '169735c76a4b93af', 'threadId': '169735c76a4b93af'    
    if not all_message_in_label:
        print('No email LM found.')
    else:
        # for each ID in all_message_in_label we *get* the message 

        #*************************************
        # resources for *get* email
        # https://developers.google.com/resources/api-libraries/documentation/gmail/v1/python/latest/gmail_v1.users.messages.html#get
        # code example for decode https://developers.google.com/gmail/api/v1/reference/users/messages/get 
        #  + decode for python 3 https://python-forum.io/Thread-TypeError-initial-value-must-be-str-or-None-not-bytes--12161
        #*************************************

        for emails in all_message_in_label:
            message = service.users().messages().get(userId="me", id=emails["id"], format="raw").execute()
            msg_str = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))

            try:
                mime_msg = email.message_from_string(msg_str.decode())

                # save the message as a .eml file named after its message ID
                outfile_name = os.path.join(emailfolder, f'{emails["id"]}.eml')

                with open(outfile_name, 'w') as outfile:
                    gen = email.generator.Generator(outfile)
                    gen.flatten(mime_msg)
                print("mail saved: ", emails["id"])

            except Exception as err:
                # report the failure and keep going with the remaining emails
                print("error in message ", message.get("snippet", emails["id"]), err)

if __name__ == '__main__':
    main()
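
The paging logic in the script above can be isolated and checked without a gmail account. This is only a sketch: `collect_all_messages` and `fetch_page` are hypothetical names, and the two fake pages imitate the `messages` / `nextPageToken` shape the script reads from each list response.

```python
def collect_all_messages(fetch_page):
    """Collect the message stubs from every page of a listing.

    fetch_page is a hypothetical callable: it takes a page token (None for
    the first page) and returns one response dict that may contain
    "messages" and "nextPageToken".
    """
    messages = []
    page_token = None
    while True:
        response = fetch_page(page_token)
        # A page can lack "messages" entirely, e.g. for an empty label
        messages.extend(response.get("messages", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return messages


# Fake two-page listing used instead of a real API call
fake_pages = {
    None: {"messages": [{"id": "a1"}, {"id": "a2"}], "nextPageToken": "p2"},
    "p2": {"messages": [{"id": "b1"}]},
}
all_messages = collect_all_messages(lambda token: fake_pages[token])
print([m["id"] for m in all_messages])
```

With the real service, `fetch_page` could wrap the same `service.users().messages().list(...).execute()` call used in the script, passing the token as `pageToken`.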

Here is the second script, "get labels id.py" (see step 4 of "set up" in the first script):

# source  = https://developers.google.com/gmail/api/quickstart/python?authuser=2

from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']


def main():
    creds = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server()
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    # Get list of all labels
    #  https://developers.google.com/resources/api-libraries/documentation/gmail/v1/python/latest/index.html
    results = service.users().labels().list(userId='me').execute()
    labels = results.get('labels', [])

    if not labels:
        print('No labels found.')
    else:
        print('Labels:')
        # print each label name followed by the ID the first script needs
        for label in labels:
            print(label['name'] + " " + label['id'])


if __name__ == '__main__':
    main()
    input("continue")
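
Instead of copying the ID by hand, the lookup could be done in code. This is a minimal sketch that assumes a list of dicts with the same `name` and `id` keys the script above prints; `find_label_id` and the sample labels are made up for illustration.

```python
def find_label_id(labels, name):
    """Return the ID of the label called name, or None if it is absent."""
    for label in labels:
        if label["name"] == name:
            return label["id"]
    return None


# Hypothetical sample, shaped like results.get('labels', []) in the script
fake_labels = [
    {"id": "INBOX", "name": "INBOX"},
    {"id": "Label_18", "name": "Invoices"},
]
label_id = find_label_id(fake_labels, "Invoices")
print(label_id)
```

As far as I know, nested labels are listed under their full path (for example "Parent/Child"), so that full name would be the one to pass in.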
