TypeError: expected string or bytes-like object for list in list

Posted 2023-03-11


Asked: 2022-01-21 12:36:57

Question:

I have a list in a column of a dataframe:

emaildf['email'][0] = ["abc@gmail.com","abc@yahoo.com","abc@abc.com"]

I want to iterate over each row (call it i) and check whether each object in i (call it j) contains a substring, for example:

for i in emaildf['email']:
    for j in i:
        do_something(j)

This is my code:

Private_Email = []
for index,row in emaildf.iterrows():
    for i in row['email']:
        if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:
            Private_Email.append(row['email'])
        else:
            Private_Email.append('No Gmail/Hotmail/MSN/Yahoo domains found.')
emaildf['Private_Email'] = Private_Email

This is the error I get:

----> 4         if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:
TypeError: expected string or bytes-like object

Note, for the input:

re.findall("gmail|hotmail|yahoo|msn", "abc@gmail.com")

the output is:

['gmail']

That is why I check the length of the list.

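Comments:

You probably mean for j in i, not for j in emaildf['email'][i], because i is the data, not a list index.

Yes, thanks, changed.

Which line of the code gives you the TypeError now?

Line 4. I have edited the question.

Not sure why this question was closed as "needs debugging details"; they are at the top of the question.

Note that you should avoid using re.findall on a dataframe, because that does not use vectorized Pandas functions.

Answer 1: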

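You get the TypeError at:

----> 4         if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:

because row['email'] is a list, not a string. re.findall expects a string, so it cannot be applied to the whole list.

That said, your specific problem can apparently be solved without iterating over the dataframe rows. Try: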
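To make the cause concrete, here is a minimal sketch that first reproduces the error by passing the whole list to the regex, then matches each address individually. It keeps the loop-based approach from the question; the sample frame and the names `pattern`, `addresses`, and `private_email` are illustrative assumptions, not part of the original code.

```python
import re
import pandas as pd

# Hypothetical sample frame mirroring the question's data
emaildf = pd.DataFrame({"email": [["abc@gmail.com", "abc@yahoo.com", "abc@abc.com"]]})
pattern = re.compile("gmail|hotmail|yahoo|msn")

# Passing the whole list reproduces the error from the question
try:
    pattern.findall(emaildf["email"][0])
    raised = False
except TypeError:
    raised = True

# Matching each string inside the list avoids it
private_email = []
for addresses in emaildf["email"]:
    matches = [addr for addr in addresses if pattern.search(addr)]
    private_email.append(matches if matches else "No Gmail/Hotmail/MSN/Yahoo domains found.")

# Wrap in a Series so each row receives exactly one object
emaildf["Private_Email"] = pd.Series(private_email, index=emaildf.index)
```

This version appends one result per row, so the new column has the same length as the dataframe.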

import numpy as np
import pandas as pd

# One email per row; empty lists become NaN, so na=False keeps the mask boolean
emails = emaildf['email'].explode()
emails = pd.Series(np.where(emails.str.contains("gmail|hotmail|yahoo|msn", na=False), emails, np.nan), index=emails.index)
emails = emails.groupby(emails.index).apply(lambda x: [y for y in x if pd.notna(y)]).apply(lambda x: x if len(x)>1 else (x[0] if len(x)==1 else np.nan))
emaildf['Private_Email'] = np.where(pd.notna(emails), emails, 'No Gmail/Hotmail/MSN/Yahoo domains found.')
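As a rough illustration of the explode step, the sketch below uses a hypothetical two-row frame (`demo`) in which the second row holds an empty list. It shows that `explode` turns the empty list into NaN, and that passing `na=False` to `str.contains` yields a purely boolean mask that can be used for filtering. The grouping with `groupby(level=0).agg(list)` is one possible way to collect matches back per row, not necessarily the only one.

```python
import pandas as pd

# Hypothetical data: one row with two addresses, one row with an empty list
demo = pd.DataFrame({"email": [["abc@gmail.com", "abc@abc.com"], []]})

# explode gives one address per row and repeats the original row label;
# the empty list becomes a single NaN entry
exploded = demo["email"].explode()

# na=False maps the NaN entry to False, so the mask contains only booleans
mask = exploded.str.contains("gmail|hotmail|yahoo|msn", na=False)

# Collect the matching addresses back into one list per original row
private = exploded[mask].groupby(level=0).agg(list)

# Rows without any match are absent from `private`; reindex restores them as NaN
aligned = private.reindex(demo.index)
```

Rows that had no matching address come back as NaN after `reindex`, which can then be replaced with a fallback message if desired.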

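Comments:

Thanks, I tried that before. For this line, emails[emails.str.contains("gmail|hotmail|yahoo|msn")].groupby(emaildf.index).agg(list), I get this error: ValueError: Cannot mask with non-boolean array containing NA / NaN values

I get this error: ValueError: cannot reindex from a duplicate axis.

Thanks a lot for your help. My data looks like this: my dataframe has a column named "email", and each row of that column holds a list of two or more emails. I want to check whether each object in that list contains "gmail|yahoo|msn|hotmail".

Yes, out of the 1703 rows in my df, 9 rows contain an empty list.

Answer 2: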

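If you only want to check that a substring is present in a string, you do not need a regular expression.

You can keep the search keywords in a list and use .apply to go through every value of the list in the email column, filtering out any value that contains none of the keywords.

See this Python code: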

import pandas as pd
# The column data must be passed as a dict: {column_name: values}
emaildf = pd.DataFrame({'email': [["abc@gmail.com","abc@yahoo.com","abc@abc.com"]]})
keywords = ["gmail", "hotmail", "yahoo", "msn"]
# Keep only the addresses that contain at least one keyword
emaildf['Private_Email'] = emaildf['email'].apply(lambda row: [x for x in row if any(key in x for key in keywords)])
# => >>> emaildf['Private_Email']
#    0    [abc@gmail.com, abc@yahoo.com]
#    Name: Private_Email, dtype: object

The .apply(lambda row: [x for x in row if any(key in x for key in keywords)]) part iterates over each string (x) in the row value and keeps x if any key is present in it.
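The question also asks for a fallback message when no address matches, and the comments mention rows with empty lists. A minimal sketch of that variant follows; the helper name `private_addresses`, the constant `FALLBACK`, and the three-row sample frame are assumptions made for illustration.

```python
import pandas as pd

FALLBACK = "No Gmail/Hotmail/MSN/Yahoo domains found."
keywords = ["gmail", "hotmail", "yahoo", "msn"]

def private_addresses(addresses):
    """Return the addresses containing any keyword, or FALLBACK if none do."""
    matches = [a for a in addresses if any(key in a for key in keywords)]
    # An empty input list also ends up here with no matches
    return matches if matches else FALLBACK

# Hypothetical data: a mixed row, a row without matches, and an empty list
emaildf = pd.DataFrame(
    {"email": [["abc@gmail.com", "abc@abc.com"], ["abc@abc.com"], []]}
)
emaildf["Private_Email"] = emaildf["email"].apply(private_addresses)
```

Note that a plain substring test matches anywhere in the address, including the part before the @, so an address such as one whose user name contains "msn" would also be kept. Comparing against the domain part only would be stricter.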
