How do I split a combo list in a large text file?

Posted 2023-03-11

【Title】How do I split a combo list in a large text file? 【Asked】2019-05-02 02:15:44 【Question】:

My problem is that I have a very large database of emails and passwords, and I need to load it into a MySQL database.

The .txt file is formatted like this:

emailnumberone@gmail.com:password1
emailnumbertwo@gmail.com:password2
emailnumberthree@gmail.com:password3
emailnumberfour@gmail.com:password4
emailnumberfive@gmail.com:password5

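My idea is to write a loop that reads each line into a variable, searches for the ":", takes the text before it and sends it to the database, then does the same with the part of the line after it. How can I do this?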

【Comments】:

What is your question? Splitting? Database access? Storage? File reading?
Splitting. @PatrickArtner

【Answer 1】:

A short program with some error handling:

Create a demo data file:

t = """
emailnumberone@gmail.com:password1
emailnumbertwo@gmail.com:password2
emailnumberthree@gmail.com:password3
emailnumberfour@gmail.com:password4
emailnumberfive@gmail.com:password5
k
: """

with open("f.txt","w") as f: f.write(t)

Parse the data and store it:

def store_in_db(email,pw):
    # replace with db access code 
    # see    http://bobby-tables.com/python
    # for parametrized db code in python (or the API of your choice)
    print("stored: ", email, pw)


with open("f.txt") as r:
    for line in r:
        line = line.rstrip("\n")  # drop the newline so it is not stored with the password
        if line.strip():  # weed out empty lines
            try:
                # even if the password contains ':', only split at the first ':'
                email, pw = line.split(":", 1)
                if email.strip() and pw.strip():  # only if both are filled
                    store_in_db(email, pw)
                else:
                    raise ValueError("Something is empty: '" + line + "'")

            except ValueError as ex:  # failed unpacking also raises ValueError
                print("Error: ", line, ex)

Output:

stored:  emailnumberone@gmail.com password1
stored:  emailnumbertwo@gmail.com password2
stored:  emailnumberthree@gmail.com password3
stored:  emailnumberfour@gmail.com password4
stored:  emailnumberfive@gmail.com password5
Error:  k not enough values to unpack (expected 2, got 1)
Error:  :  Something is empty: ': '
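Edit: According to "What characters are allowed in an email address?", a ':' can be part of the first part (the local part) of an email address if it is quoted.

In theory this would allow input such as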

`"Cool:Emailadress@google.com:coolish_password"`

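This code would split that input at the wrong ':'. See the answer from Talip Tolga Sans for a way to break up the split differently and avoid this problem.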
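A minimal sketch of that failure mode, assuming the address is written with a properly quoted local part. The sample string below is made up for illustration and is not taken from the original data:

```python
# Hypothetical input: the local part "Cool:Emailadress" is quoted,
# so it is allowed to contain ':'.
line = '"Cool:Emailadress"@google.com:coolish_password'

# Splitting at the first ':' cuts through the quoted local part.
email, pw = line.split(":", 1)

print(repr(email))  # the "email" is only the start of the local part
print(repr(pw))     # the "password" contains the rest of the address
```

Nothing raises here, so the error handling in the loop would not catch it; the row would simply be stored with the wrong values.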

【Comments】:

【Answer 2】:

This can be done with the simple split() method of Python strings.

>>> a = 'emailnumberone@gmail.com:password1'
>>> b = a.split(':')
>>> b
['emailnumberone@gmail.com', 'password1']

To handle the complex-password case raised by @PatrickArtner, where the password itself contains ':', you can do this:

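a = 'emailnumberone@gmail.com:com:plex:PassWord:fail'

atLocation = a.find('@')                                # index of the '@'
realSeperator = atLocation + a[atLocation:].find(':')   # first ':' after the '@'
emailName = a[0:atLocation]                             # part before the '@'
emailDomain = a[atLocation:realSeperator]               # '@' plus the domain
email = emailName + emailDomain
password = a[realSeperator + 1:]                        # everything after the separator

print(email, password)
# prints: emailnumberone@gmail.com com:plex:PassWord:fail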

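str.find() returns the index of the first occurrence of the given substring in the string. The name field of an email address can contain ':' but not '@' (setting aside quoted local parts, which may contain both). So locating the '@' first and then the ':' after it gives you the correct separator position. After that, splitting the string is a piece of cake.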
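The same idea can be wrapped in a small function. This is only a sketch: `split_combo` is a hypothetical name, and it relies on the optional start argument of `str.find()` instead of slicing. It also returns `None` when there is no '@' or no ':' after it, a case in which `find()` returns -1 and the offsets in the snippet above would silently be wrong.

```python
def split_combo(line):
    """Split 'email:password' at the first ':' that follows the first '@'.

    Returns (email, password), or None when the line has no '@'
    or no ':' after the '@'.
    """
    line = line.rstrip("\r\n")
    at_location = line.find("@")
    if at_location == -1:
        return None
    # Start searching at the '@', so a ':' before it is ignored.
    separator = line.find(":", at_location)
    if separator == -1:
        return None
    return line[:separator], line[separator + 1:]


print(split_combo("emailnumberone@gmail.com:com:plex:PassWord:fail\n"))
print(split_combo("k"))
```

Returning `None` instead of raising leaves it to the caller to decide whether a malformed line is logged, counted, or skipped.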

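【Comments】:

For 'emailnumberone@gmail.com:com:plex:PassWord:fail', why not b = a.split(':',1)? docs.python.org/3/library/stdtypes.html#str.split - I am more worried about passwords that contain ':'. I am not sure emails can really contain ':'; if quoted, it seems they can: ***.com/questions/2049502/…

From the wiki page, this concerns the local part, so I first make sure the ':' is not in the local part: "space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);"

Your solution is more elegant than my simple but less accurate split(":",1), which does not allow any ':' in the local part even when quoted. Yours got my upvote.

【Answer 3】: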

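Open the file as a context manager (with open(...)), iterate over its lines with a for loop, then use a regular-expression match (the re module), or simply split on ":", and use sqlite3 to insert your values into the DB.

So, for the file: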

with open("file.txt", "r") as f:
    for line in f:
        pass #manipulation
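As a sketch of what the regular-expression route could look like in place of the `pass` above: the pattern and the names `COMBO_RE` and `parse_line` are illustrative assumptions, not part of the original answer. The pattern assumes a local part that is either a simple double-quoted string or free of '@', ':' and '"', a domain without ':', and a non-empty password that may itself contain ':'. It does not try to cover escaped quotes or the full email grammar.

```python
import re

# Assumed line shape: local part (optionally a double-quoted string), '@',
# a domain without ':', then ':' and a non-empty password.
COMBO_RE = re.compile(r'^(?P<email>(?:"[^"]*"|[^@:"]+)@[^:]+):(?P<password>.+)$')


def parse_line(line):
    """Return (email, password), or None if the line does not match."""
    match = COMBO_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return match.group("email"), match.group("password")


print(parse_line("emailnumberone@gmail.com:password1\n"))
print(parse_line("k"))
```

Inside the loop, `parse_line(line)` would be called once per line, with `None` results skipped or logged.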

Sqlite3 docs: https://docs.python.org/2/library/sqlite3.html
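A minimal sketch of the insert step with sqlite3, under some assumptions: the table name `combos`, the function names, and the batch size are all made up for illustration, and the question's target is MySQL, where the driver and its placeholder style (commonly `%s`) would differ. Rows are inserted in batches so that a very large file never has to sit in memory at once.

```python
import sqlite3
from itertools import islice


def parse_lines(lines):
    """Yield (email, password) for lines that split cleanly at the first ':'."""
    for line in lines:
        email, sep, password = line.rstrip("\r\n").partition(":")
        if sep and email.strip() and password.strip():
            yield email, password


def load_combos(lines, conn, batch_size=10_000):
    """Insert parsed rows in batches; returns the number of rows inserted."""
    conn.execute("CREATE TABLE IF NOT EXISTS combos (email TEXT, password TEXT)")
    rows = parse_lines(lines)
    inserted = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        # '?' placeholders let sqlite3 bind the values; no string formatting.
        conn.executemany("INSERT INTO combos VALUES (?, ?)", batch)
        conn.commit()
        inserted += len(batch)
    return inserted


conn = sqlite3.connect(":memory:")  # in-memory DB, for demonstration only
demo = ["emailnumberone@gmail.com:password1\n", "k\n", ": "]
print(load_combos(demo, conn))
```

Since a file object is itself an iterable of lines, `load_combos(f, conn)` can be called inside the `with open("file.txt", "r") as f:` block shown above.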

【Comments】:
