python imaplib: get Gmail inbox subject titles and sender names
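[Title]: python imaplib to get gmail inbox subjects titles and sender name [Posted]: 2011-11-11 00:16:52 [Question]: I am using Python's imaplib to connect to my Gmail account. I want to retrieve the top 15 messages (read or unread, it doesn't matter) and display just the subject and the sender's name (or address), but I don't know how to display the contents of the inbox.
Here is my code so far (the connection succeeds):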
import imaplib
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('mygmail@gmail.com', 'somecrazypassword')
mail.list()
mail.select('inbox')
#need to add some stuff in here
mail.logout()
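I believe this should be simple enough; I'm just not familiar enough with the commands of the imaplib library. Any help would be appreciated...
Update: thanks to Julian, I can now iterate over every message and retrieve its full content: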
typ, data = mail.search(None, 'ALL')
for num in data[0].split():
    typ, msg_data = mail.fetch(num, '(RFC822)')
    print('Message %s\n%s\n' % (num, msg_data[0][1]))
mail.close()
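But I only want the subject and the sender. Is there an imaplib command for those items, or do I have to parse the whole of data[0][1] to extract the subject and sender text?
Update: OK, I got the subject and sender part working, but iterating over (1, 15) shows me the oldest messages first. How can I change this? I tried: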
for i in range(len(data[0]) - 15, len(data[0])):
    print(data)
But this just gave me None for all 15 iterations... any ideas? I also tried mail.sort('REVERSE DATE', 'UTF-8', 'ALL'), but Gmail does not support the .sort() function.
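Without server-side SORT, one option is to reorder on the client. SEARCH returns IDs in ascending order, and sequence numbers follow the order in which messages were added to the mailbox, so the last entries are normally the newest. A minimal sketch, where newest_ids is a hypothetical helper name and the sample bytes stand in for data[0] from mail.search(None, 'ALL'):

```python
def newest_ids(search_data, n=15):
    """Return up to n message IDs from a SEARCH result, newest first."""
    if n <= 0:
        return []
    ids = search_data.split()
    # Take the tail of the ascending list, then reverse it.
    return ids[-n:][::-1]

sample = b'1 2 3 4 5'  # made-up SEARCH result
print(newest_ids(sample, 3))
```

Each returned ID can then be passed to mail.fetch() one at a time. An empty mailbox simply yields an empty list.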
Update: figured out a way:
# ...other code is the same as above, except that the email module must be imported
mail.select('inbox')
typ, data = mail.search(None, 'ALL')
ids = data[0]
id_list = ids.split()
# get the most recent email id
latest_email_id = int(id_list[-1])
# iterate through 15 messages in descending order starting with latest_email_id
# the '-1' dictates reverse looping order; max() keeps the loop from going below message 1
for i in range(latest_email_id, max(latest_email_id - 15, 0), -1):
    typ, data = mail.fetch(str(i), '(RFC822)')
    for response_part in data:
        if isinstance(response_part, tuple):
            # on Python 3, fetch returns bytes: use email.message_from_bytes instead
            msg = email.message_from_string(response_part[1])
            varSubject = msg['subject']
            varFrom = msg['from']
            # remove the brackets around the sender email address
            varFrom = varFrom.replace('<', '')
            varFrom = varFrom.replace('>', '')
            # add ellipsis (...) if subject length is greater than 35 characters
            if len(varSubject) > 35:
                varSubject = varSubject[0:32] + '...'
            print('[' + varFrom.split()[-1] + '] ' + varSubject)
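This gives me the subjects and sender addresses of the 15 most recent messages, in descending order as requested! Thanks to everyone who helped!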
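The manual bracket stripping in the loop above could also be left to the standard library: email.utils.parseaddr splits a From header into a display name and an address. A minimal sketch, where format_line is a hypothetical helper name and the sample header is made up:

```python
from email.utils import parseaddr

def format_line(from_header, subject, width=35):
    """Build '[address] subject', truncating long subjects with '...'."""
    name, address = parseaddr(from_header)
    if len(subject) > width:
        subject = subject[:width - 3] + '...'
    return '[' + address + '] ' + subject

print(format_line('John Doe <john@example.com>', 'Hello'))
```

This also copes with From headers that contain only a bare address, where there are no brackets to remove.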
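[Question comments]:
The example in the Python docs works fine for me: docs.python.org/library/imaplib#imap4-example
Yes, you're right, it does retrieve the full content of every message just fine. I only want the subject and sender address. Then I can make the for loop run from 1 to 15.
Here is another link to the Python docs: docs.python.org/library/email.html ;)
[Answer 1]: Adding to all the answers above.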
import imaplib
import email

if __name__ == '__main__':
    email_user = "email@domain.com"
    email_pass = "********"
    mail = imaplib.IMAP4_SSL("hostname", 993)
    mail.login(email_user, email_pass)
    # open INBOX read-only so that fetching does not mark messages as read
    mail.select('INBOX', readonly=True)
    typ, data = mail.search(None, 'ALL')
    mail_ids = data[0].decode('utf-8')
    id_list = mail_ids.split()
    for i in id_list:
        typ, msg_data = mail.fetch(str(i), '(RFC822)')
        for response_part in msg_data:
            if isinstance(response_part, tuple):
                msg = email.message_from_bytes(response_part[1])
                # either header may be missing, so fall back to an empty string
                print(str(msg['from'] or '') + "\t" + str(msg['subject'] or ''))
This will give you the sender and subject of each email.
[Comments]:
[Answer 2]: BODY[] (with an empty section) fetches the whole message and marks it as read.
BODY[<parts>] fetches just those parts.
BODY.PEEK[<parts>] fetches the same parts, but does not mark the message as read.
<parts> can be HEADER, TEXT, HEADER.FIELDS (<list of fields>), or HEADER.FIELDS.NOT (<list of fields>).
Here is what I use: typ, data = connection.fetch(message_num_s, b'(BODY.PEEK[HEADER.FIELDS (SUBJECT FROM)])')
def safe_encode(seq):
    """Yield each item of seq (or seq itself, if it is a single value) as bytes."""
    if not isinstance(seq, (list, tuple)):
        seq = [seq]
    for i in seq:
        if isinstance(i, (int, float)):
            yield str(i).encode()
        elif isinstance(i, str):
            yield i.encode()
        elif isinstance(i, bytes):
            yield i
        else:
            raise ValueError

def fetch_fields(connection, message_num, field_s):
    """Fetch just the fields we care about. Parse them into a dict"""
    if isinstance(field_s, (list, tuple)):
        field_s = b' '.join(safe_encode(field_s))
    else:
        field_s = tuple(safe_encode(field_s))[0]
    message_num = tuple(safe_encode(message_num))[0]
    typ, data = connection.fetch(message_num, b'(BODY.PEEK[HEADER.FIELDS (%s)])' % (field_s.upper()))
    if typ != 'OK':
        return typ, data  # change this to an exception if you'd rather
    items = {}
    lastkey = None
    for line in data[0][1].splitlines():
        if b':' in line:
            lastkey, value = line.strip().split(b':', 1)
            lastkey = lastkey.capitalize()
            # not all servers capitalize the same, and some just leave it
            # as however it arrived from some other mail server.
            items[lastkey] = value
        elif lastkey is not None:
            # subject was so long it ran onto the next line; luckily it didn't
            # have a ':' in it, so it's easy to recognize.
            items[lastkey] += line
            # print(items[lastkey])
    return typ, items
To drop this into your code sample, replace the mail.fetch() call with fetch_fields(mail, i, 'SUBJECT FROM') or fetch_fields(mail, i, ('SUBJECT', 'FROM')).
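As an alternative to splitting lines by hand, the header block returned by BODY.PEEK[HEADER.FIELDS (...)] can be handed to the standard library's email parser, which already understands folded (multi-line) headers. A minimal sketch; parse_header_block is a hypothetical helper name, and the sample bytes are made up to stand in for data[0][1] from the fetch call:

```python
import email

def parse_header_block(raw_headers):
    """Parse a raw header block (bytes) into a dict keyed by lower-cased name."""
    msg = email.message_from_bytes(raw_headers)
    # Unfold continuation lines by collapsing runs of whitespace.
    return {name.lower(): ' '.join(str(value).split())
            for name, value in msg.items()}

sample = (b'Subject: a very long subject that the server\r\n'
          b' folded onto a second line\r\n'
          b'From: John Doe <john@example.com>\r\n\r\n')
fields = parse_header_block(sample)
print(fields['subject'])
print(fields['from'])
```

Lower-casing the keys sidesteps the capitalization differences between servers; headers that contain a ':' on a continuation line are also handled, since the parser relies on leading whitespace to detect folding.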
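[Comments]:
All the other answers fetch the entire message, e.g. mail.fetch(i, '(RFC822)'), which is both expensive and slow. I believe this is the only answer that actually makes use of IMAPv4rev1.
[Answer 3]: Here is my solution for getting useful information out of an email: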
import datetime
import email
import email.header
import email.utils
import imaplib

EMAIL_ACCOUNT = "your@gmail.com"
PASSWORD = "your password"

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(EMAIL_ACCOUNT, PASSWORD)
mail.list()
mail.select('inbox')
result, data = mail.uid('search', None, "UNSEEN")  # (ALL/UNSEEN)
uids = data[0].split()
for x, latest_email_uid in enumerate(uids):
    result, email_data = mail.uid('fetch', latest_email_uid, '(RFC822)')
    # result, email_data = conn.store(num,'-FLAGS','\\Seen')
    # this might work to set flag to seen, if it doesn't already
    raw_email = email_data[0][1]
    raw_email_string = raw_email.decode('utf-8')
    email_message = email.message_from_string(raw_email_string)
    # Header Details
    local_message_date = ""  # stays empty if the Date header cannot be parsed
    date_tuple = email.utils.parsedate_tz(email_message['Date'])
    if date_tuple:
        local_date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
        local_message_date = local_date.strftime("%a, %d %b %Y %H:%M:%S")
    email_from = str(email.header.make_header(email.header.decode_header(email_message['From'])))
    email_to = str(email.header.make_header(email.header.decode_header(email_message['To'])))
    subject = str(email.header.make_header(email.header.decode_header(email_message['Subject'])))
    # Body details: write each text/plain part to a file
    for part in email_message.walk():
        if part.get_content_type() == "text/plain":
            body = part.get_payload(decode=True)
            file_name = "email_" + str(x) + ".txt"
            with open(file_name, 'w') as output_file:
                output_file.write("From: %s\nTo: %s\nDate: %s\nSubject: %s\n\nBody: \n\n%s" % (email_from, email_to, local_message_date, subject, body.decode('utf-8')))
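The Date handling above converts the header to local time in two steps. On Python 3.3 and later, email.utils.parsedate_to_datetime returns a timezone-aware datetime directly, which keeps the sender's UTC offset. A minimal sketch, where format_date is a hypothetical helper name and the sample header is made up; an unparseable header raises an exception that this sketch does not handle:

```python
from email.utils import parsedate_to_datetime

def format_date(date_header):
    """Format an RFC 2822 Date header, keeping the sender's UTC offset."""
    parsed = parsedate_to_datetime(date_header)
    # Numeric fields only, so the result does not depend on the locale.
    return parsed.strftime('%Y-%m-%d %H:%M:%S %z')

print(format_date('Fri, 11 Nov 2011 00:16:52 +0100'))
```

Calling .astimezone() on the parsed value would convert it to the machine's local time zone, matching what the code above prints.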
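[Comments]:
[Answer 4]: I was looking for a simple ready-made script that lists the last few inbox messages over IMAP without sorting through all the mail. The information here is useful, but it is DIY and misses a few aspects. First, IMAP4.select returns the message count. Second, decoding the subject header is not straightforward.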
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import imaplib
import email
from email.header import decode_header
import html

def decodeHeader(value):
    if value.startswith('"=?'):
        value = value.replace('"', '')
    value, encoding = decode_header(value)[0]
    if isinstance(value, bytes):
        value = value.decode(encoding or 'ascii', 'replace')
    # unescape xml entities
    return html.unescape(value)

def listLastInbox(top=4):
    mailbox = imaplib.IMAP4_SSL('imap.gmail.com')
    mailbox.login('mygmail@gmail.com', 'somecrazypassword')
    selected = mailbox.select('INBOX')
    assert selected[0] == 'OK'
    messageCount = int(selected[1][0])
    # max() keeps the loop from going below message 1 in a small inbox
    for i in range(messageCount, max(messageCount - top, 0), -1):
        response = mailbox.fetch(str(i), '(RFC822)')[1]
        for part in response:
            if isinstance(part, tuple):
                message = email.message_from_bytes(part[1])
                yield {h: decodeHeader(message[h]) for h in ('subject', 'from', 'date')}
    mailbox.logout()

if __name__ == '__main__':
    for message in listLastInbox():
        print('-' * 40)
        for h, v in message.items():
            print('{0:8s}: {1}'.format(h.upper(), v))
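The decodeHeader function above looks only at the first chunk that decode_header returns. When a header mixes several encoded words, one possible approach is to let email.header.make_header reassemble all chunks. A minimal sketch, where decode_mime_header is a hypothetical helper name and the sample value is made up:

```python
from email.header import decode_header, make_header

def decode_mime_header(value):
    """Decode every RFC 2047 encoded word in a header into one str."""
    return str(make_header(decode_header(value)))

print(decode_mime_header('=?utf-8?q?Hello_World?='))
```

Plain ASCII headers pass through unchanged, so the helper can be applied to every header value without checking first whether it is encoded.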
[Comments]:
[Answer 5]: For those looking for how to check mail and parse the headers, this is what I use:
import datetime
import email
import email.utils

# m is assumed to be an already logged-in imaplib.IMAP4 connection
def parse_header(str_after, checkli_name, mailbox):
    # typ, data = m.search(None,'SENTON', str_after)
    print(mailbox)
    m.select(mailbox)
    date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")
    # date = (datetime.date.today().strftime("%d-%b-%Y"))
    # date = "23-Jul-2012"
    print(date)
    result, data = m.uid('search', None, '(SENTON %s)' % date)
    print(data)
    doneli = []
    for latest_email_uid in data[0].split():
        print(latest_email_uid)
        result, data = m.uid('fetch', latest_email_uid, '(RFC822)')
        raw_email = data[0][1]
        # on Python 3, fetch returns bytes: use email.message_from_bytes instead
        email_message = email.message_from_string(raw_email)
        print(email_message['To'])
        print(email_message['Subject'])
        print(email.utils.parseaddr(email_message['From']))
        print(email_message.items())  # print all headers
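One caveat with building the SENTON date through strftime("%d-%b-%Y"): %b follows the current locale, while IMAP expects English month abbreviations. A minimal locale-independent sketch, where imap_date and senton_criterion are hypothetical helper names:

```python
import datetime

MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

def imap_date(day):
    """Format a date as DD-Mon-YYYY using English month names."""
    return '%02d-%s-%d' % (day.day, MONTHS[day.month - 1], day.year)

def senton_criterion(day):
    """Build the search criterion for messages sent on the given day."""
    return '(SENTON %s)' % imap_date(day)

print(senton_criterion(datetime.date(2012, 7, 23)))
```

The resulting string can be passed as the last argument of m.uid('search', None, ...), in place of the '(SENTON %s)' % date expression.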
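[Comments]:
AttributeError: 'module' object has no attribute 'message_from_string'. I am importing email, guaranteed.
@ChaseRoberts You need to use from email import email. I guess you used import email, which means you were trying to access message_from_string at the wrong level.
[Answer 6]: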
c.select('INBOX', readonly=True)
for i in range(1, 30):
    typ, msg_data = c.fetch(str(i), '(RFC822)')
    for response_part in msg_data:
        if isinstance(response_part, tuple):
            msg = email.message_from_string(response_part[1])
            for header in ['subject', 'to', 'from']:
                print('%-8s: %s' % (header.upper(), msg[header]))
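This should give you an idea of how to retrieve the subject and the From header.
[Comments]:
What is email? Are you referring to my "mail" variable? Where does message_from_string() come from? I get an error stating AttributeError("Unknown IMAP4 command: '%s'" % attr) AttributeError: Unknown IMAP4 command: 'message_from_string'
Never mind, figured it out: I hadn't included the email module. Thanks
Won't this code raise an exception when there are fewer than 30 emails? If the email ID (str(i) in this case) doesn't exist, c.fetch() will raise an exception.
@chutsu Quite possibly; most likely you will just get a message back from the server saying there is no such ID. It depends on your library.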
@chutsu Good to hear and good to know. From experience, though, an RFC is more of a guideline than the actual implementation. RFCs contain a lot of "SHOULD..." and "MAY...", and developers shipping things to production say "meh, that's not so important". Either way, the OP should strive to follow the RFC; just keep in mind that others might not :)
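To sidestep the question raised in the comments about inboxes with fewer than 30 messages, the loop bounds can be derived from the message count that select() returns rather than hard-coded. A minimal sketch; sequence_range is a hypothetical helper name, and the sample list mimics the data part of a select() response:

```python
def sequence_range(select_data, limit=30):
    """Return the sequence numbers to fetch, newest first, at most `limit`."""
    count = int(select_data[0])  # select() reports the message count as bytes
    lowest = max(count - limit, 0)
    return [str(i) for i in range(count, lowest, -1)]

print(sequence_range([b'5'], limit=3))
```

With typ, data = c.select('INBOX', readonly=True), the loop becomes for i in sequence_range(data): and never asks the server for a message that does not exist.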