python: A Python script for extracting email addresses from text files. You can pass it multiple files. It prints the email addresses to stdout, on

Foreword: This article was compiled by the editors of 小常识网 (cha138.com). It presents a Python script that extracts email addresses from one or more text files and prints them to stdout. We hope it is a useful reference.

#!/usr/bin/env python
#
# Extracts email addresses from one or more plain text files.
#
# Notes:
# - Does not save to file (pipe the output to a file if you want it saved).
# - Does not check for duplicates (which can easily be done in the terminal).
#
# (c) 2013  Dennis Ideler <ideler.dennis@gmail.com>

from optparse import OptionParser
import os.path
import re

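# Raw strings keep regex escapes such as \. and \s from being read as
# (invalid) Python string escapes.
regex = re.compile((r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))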

def file_to_str(filename):
    """Returns the contents of filename as a string."""
    with open(filename) as f:
        return f.read().lower() # Case is lowered to prevent regex mismatches.

def get_emails(s):
    """Returns an iterator of matched emails found in string s."""
    # Skipping matches that start with '//' because the regular expression
    # mistakenly matches patterns like 'http://foo@bar.com' as '//foo@bar.com'.
    # findall() returns one tuple of groups per match; group 0 is the full address.
    return (email[0] for email in re.findall(regex, s)
            if not email[0].startswith('//'))

if __name__ == '__main__':
    parser = OptionParser(usage="Usage: python %prog [FILE]...")
    # No options added yet. Add them here if you ever need them.
    options, args = parser.parse_args()

    if not args:
        parser.print_usage()
        raise SystemExit(1)  # Exit with a non-zero status; needs no extra import.

    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                print(email)
        else:
            print('"{}" is not a file.'.format(arg))
            parser.print_usage()
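
The notes at the top of the script say that duplicates are not removed, and the pattern also accepts obfuscated forms such as "user at example dot com". The following is a minimal Python 3 sketch, not part of the original script, of how both could be handled in Python rather than in the terminal. The names `EMAIL_RE`, `normalize`, and `unique_emails` are hypothetical helpers introduced here for illustration; the pattern itself is copied from the script.

```python
import re

# Same pattern as in the script, split across raw strings for readability.
EMAIL_RE = re.compile(
    r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*"
    r"(@|\sat\s)"
    r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|\sdot\s))+"
    r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
)


def normalize(match):
    """Hypothetical helper: rewrite 'user at example dot com' as 'user@example.com'."""
    match = re.sub(r"\sat\s", "@", match)
    return re.sub(r"\sdot\s", ".", match)


def unique_emails(text):
    """Hypothetical helper: normalized matches in first-seen order, no duplicates."""
    seen = set()
    result = []
    # The text is lowercased first, as file_to_str() does in the script.
    for groups in EMAIL_RE.findall(text.lower()):
        email = groups[0]  # Group 0 is the full address.
        if email.startswith("//"):
            continue  # Same URL filter as get_emails() in the script.
        email = normalize(email)
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


if __name__ == "__main__":
    sample = "Write to Alice@Example.com or alice at example dot com; bob@test.org."
    for address in unique_emails(sample):
        print(address)
```

Because the two spellings of the first address normalize to the same string, only one copy is kept. Note that the `//` filter drops the whole match, so an address embedded in a URL such as `http://foo@bar.com` is not reported at all.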
