Python Regular Expressions, find Email Domain in Address

Posted 2023-02-22

【Title】: Python Regular Expressions, find Email Domain in Address 【Posted】: 2011-08-03 13:08:54 【Question】:

I know I'm an idiot, but I can't pull the domain out of this email address:

'blahblah@gmail.com'

My desired output:

'@gmail.com'

My current output:

.

(it's just a period character)

Here is my code:

import re
test_string = 'blahblah@gmail.com'
domain = re.search('@*?\.', test_string)
print domain.group()

This is what I think my regular expression says ('@*?\.', test_string):

 ' # begin to define the pattern I'm looking for (also tell python this is a string)

  @ # find all patterns beginning with the at symbol ("@")

  * # find all characters after ampersand

  ? # find the last character before the period

  \ # breakout (don't use the next character as a wild card, use it as a string character)

  . # find the "." character

  ' # end definition of the pattern I'm looking for (also tell python this is a string)

  , test_string # run the preceding search on the variable "test_string," i.e., 'blahblah@gmail.com'

I based this on the definitions here:

http://docs.activestate.com/komodo/4.4/regex-intro.html

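Also, I searched, but the other answers were a bit too difficult for me to get my head around.

As usual, any help is greatly appreciated. Thanks.

My setup, if it matters:

Windows 7 Pro (64 bit)

Python 2.6 (64 bit)

PS. *** question: my posts do not include new lines unless I hit "return" twice between them. For example (these were all on separate lines when I was posting):

@ - find all patterns beginning with the at symbol ("@") * - find all characters after ampersand ? - find the last character before the period \ - breakout (don't use the next character as a wild card, use it as a string character) . - find the "." character , test_string - run the preceding search on the variable "test_string," i.e., 'blahblah@gmail.com'

That is why I put a blank line between each line above. What am I doing wrong? Thanks.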

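【Question comments】:

To answer your PS (which belongs on Meta): Stack Overflow uses Markdown. From the formatting notes: "line breaks: add 2 spaces at the end". It will also accept HTML.

A simple solution is "@.*", though it may be too greedy for your taste.

【Answer 1】:

OK, so why not use split? (or partition)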

"@"+'blahblah@gmail.com'.split("@")[-1]

Or you can use other string methods, such as find

>>> s="bal@gmail.com"
>>> s[ s.find("@") : ]
'@gmail.com'
>>>

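If you are extracting email addresses from other text

# Print "@domain" for every whitespace-separated word that contains "@"
with open("file") as f:
    for line in f:
        for word in line.split():
            if "@" in word:
                print("@" + word.split("@")[-1])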
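The answer mentions partition without showing it. A minimal sketch of that route follows; the helper names `domain_with_at` and `domains_in_text` are hypothetical, and `rpartition` is used here (rather than `partition`) so that the split happens at the last "@", mirroring `split("@")[-1]`.

```python
def domain_with_at(address):
    """Return '@domain' for an address, or '' if there is no '@'."""
    # rpartition splits on the LAST '@' and always returns a 3-tuple;
    # when '@' is absent, the separator slot is the empty string.
    _local, at, domain = address.rpartition("@")
    return at + domain if at else ""


def domains_in_text(text):
    """Collect '@domain' from every whitespace-separated word containing '@'."""
    return [domain_with_at(word) for word in text.split() if "@" in word]


print(domain_with_at("blahblah@gmail.com"))
print(domains_in_text("mail bal@gmail.com or foo@example.org today"))
```

Note that splitting on whitespace keeps any punctuation attached to a word, so an address followed directly by a comma would carry the comma into the result.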

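【Comments】:

Thanks for the reply. Why regex rather than regular string methods? I have 40 megs of strings containing email addresses mixed with junk text that I am trying to extract. I'm an amateur programmer, and I try to keep things simple and use regex in a way I can understand, so I didn't go into depth here. Sorry if that was confusing.

【Answer 2】:

Using regular expressions: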

>>> re.search('@.*', test_string).group()
'@gmail.com'

Another way:

>>> '@' + test_string.split('@')[1]
'@gmail.com'
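To see why the pattern in the question returns only a period while '@.*' returns the domain, it may help to run the patterns side by side. This is a small illustration, not part of the original answer; the variable names are made up for the example.

```python
import re

test_string = 'blahblah@gmail.com'

# '@*?\.' : zero or more '@' (as few as possible), then a literal period.
# Zero '@' characters is allowed, so the match is just the '.' itself.
lazy_at = re.search(r'@*?\.', test_string).group()

# '@.*' : exactly one '@', then any characters up to the end of the line.
at_then_any = re.search(r'@.*', test_string).group()

# '@*' on its own can match the empty string right at position 0.
only_star = re.search(r'@*', test_string).group()

print(repr(lazy_at))      # '.'
print(repr(at_then_any))  # '@gmail.com'
print(repr(only_star))    # ''
```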

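【Comments】:

Ah. I knew I needed another '.'. Thanks!! (Not sure why, though.)

@AquaT33nFan: "@*" means zero or more occurrences of "@". "@.*" means one occurrence of "@" followed by zero or more occurrences of any character (except newline). In other words, the * here is the Kleene star, not a wildcard.

【Answer 3】:

Here is something I thought might be helpful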

import re
s = 'My name is Conrad, and blahblah@gmail.com is my email.'
domain = re.search(r"@[\w.]+", s)
print(domain.group())

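Output

@gmail.com

How the regex works:

@ - scan until you see this character

[\w.] - a set of characters to potentially match: \w covers the alphanumeric characters (and the underscore), and the trailing period . adds a literal period to that set.

+ - one or more characters from the preceding set.

Because this regex matches the period character and every alphanumeric character after the @, it will match an email domain even in the middle of a sentence.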
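Two edge cases of this character set are worth seeing in action: a hyphen is not in the set, and a sentence-ending period is. The sketch below uses `re.findall` on a made-up sentence; the second pattern is a hypothetical stricter variant offered only for comparison, not something from the answer.

```python
import re

text = "Write to a@my-host.org or b@gmail.com."

# Pattern from the answer: stops at '-', and swallows the final '.'.
simple = re.findall(r"@[\w.]+", text)

# Hypothetical stricter variant: labels made of word characters or hyphens,
# joined by single periods, so a trailing sentence period is left out.
stricter = re.findall(r"@[\w-]+(?:\.[\w-]+)+", text)

print(simple)    # ['@my', '@gmail.com.']
print(stricter)  # ['@my-host.org', '@gmail.com']
```

Neither pattern validates an address; both only pick out text that looks like a domain after an "@".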

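【Comments】:

【Answer 4】:

Just wanted to point out that chrisaycock's method would match invalid email addresses of the form

herp@

To make sure you only match a possibly valid email address that has a domain, you need to alter it slightly

Using regular expressions: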

>>> re.search('@.+', test_string).group()
'@gmail.com'
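The difference between the two quantifiers shows up on the invalid input itself. A short check, with variable names chosen only for this illustration:

```python
import re

no_domain = 'herp@'

star = re.search(r'@.*', no_domain)  # '.*' accepts zero characters after '@'
plus = re.search(r'@.+', no_domain)  # '.+' needs at least one character

print(star.group())  # @
print(plus)          # None

# search() returns None when nothing matches, so check before calling group().
match = re.search(r'@.+', 'blahblah@gmail.com')
domain = match.group() if match else None
print(domain)        # @gmail.com
```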

【Comments】:

【Answer 5】:

With the regex below you can extract domains ending in .com or .in.

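import re
s = 'my first email is user1@gmail.com second email is user2@yahoo.in and third email is user3@outlook.com'
# (?:in|com) is an alternation; [.in|.com|] would be a character class matching a single character
print(re.findall(r'@\S+\.(?:in|com)\b', s))

Output

['@gmail.com', '@yahoo.in', '@outlook.com']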

【Comments】:

【Answer 6】:

Here is another method, using the index function:

email_addr = 'blahblah@gmail.com'

# Find the location of @ sign
index = email_addr.index("@")

# extract the domain portion starting from the index
email_domain = email_addr[index:]

print(email_domain)
#------------------
# Output:
# @gmail.com
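One thing to keep in mind with this approach is what happens when the string has no "@" at all: `str.index` raises `ValueError`, while `str.find` returns -1. A minimal sketch of handling both, with a hypothetical helper name `domain_or_none`:

```python
def domain_or_none(email_addr):
    """Return '@domain', or None when the string has no '@'."""
    try:
        index = email_addr.index("@")  # raises ValueError if '@' is absent
    except ValueError:
        return None
    return email_addr[index:]


print(domain_or_none("blahblah@gmail.com"))  # @gmail.com
print(domain_or_none("no-at-sign"))          # None

# str.find() returns -1 instead of raising, so slicing with its result
# silently yields the last character when there is no '@'.
s = "no-at-sign"
last_char = s[s.find("@"):]
print(last_char)                             # n
```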

【Comments】:

【Answer 7】:

You can try using urllib

from urllib import parse
email = 'myemail@mydomain.com'
domain = parse.splituser(email)[1]

The output will be

'mydomain.com'

【Comments】:

The splituser function is deprecated. bugs.python.org/issue35891
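Given the deprecation, one possible standard-library substitute is `email.utils.parseaddr` combined with `str.rpartition`. This is a swapped-in technique, not the answer's method, and the helper name `domain_of` is hypothetical.

```python
from email.utils import parseaddr


def domain_of(address):
    """Return the part after the last '@' (without the '@'), or ''."""
    # parseaddr strips an optional display name such as 'Name <addr>'.
    _name, addr = parseaddr(address)
    return addr.rpartition("@")[2] if "@" in addr else ""


print(domain_of("myemail@mydomain.com"))           # mydomain.com
print(domain_of("Conrad <myemail@mydomain.com>"))  # mydomain.com
```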