Loop through a column to find first occurrence of special character for each location

Posted 2023-02-23

【Title】: Loop through a column to find first occurrence of special character for each location 【Posted】: 2021-10-10 02:49:16 【Question】:

The dataframe has two columns:

location experience
a1       tech
a2       loyalty
a2       ‡€asd
a5       Ù…Ù
a5       completed
a6       
a7       --
a8       happy
a8       best
a9       for sure
a9       notgood
b1       amazing:
b1       /§!vision
b5       referral

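How can I loop through the locations and, if the experience value in the first row for a location starts with a special character, delete every row for that location? If the second or any later experience row for a location starts with a special character, I don't need to delete anything.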

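Example 1: b1 amazing:

b1 /§!vision

Here the experience value in the first row starts with a letter, so I don't need to delete any rows whose location is b1.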

Example 2: a5 Ù…Ù

a5 completed

Here the first experience value starts with a special character, so I need to delete every row whose location is a5.

The expected output is:

location experience
a1       tech
a2       loyalty
a2       ‡€asd
a6 
a8       happy
a8       best
a9       for sure
a9       notgood
b1       amazing:
b1       /§!vision
b5       referral

【Comments】:

【Answer 1】:

You can use pandas groupby followed by first to get the first experience value for each location. Then use str.match to check whether that value starts with a letter.

import pandas as pd

df = pd.read_csv('sample.csv')
print(df)

location = df.groupby('location')

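# first non-null experience value of each location
loc_first = location.first()

# locations whose first experience value does not start with a letter or
# whitespace; whitespace is allowed and na=True treats a missing value as
# acceptable, so the blank 'a6' row is kept
exp_match = loc_first[~loc_first.experience.str.match(r'^[a-zA-Z\s]', na=True)]

# drop every row belonging to those locations
df = df[~df.location.isin(exp_match.index)]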
print(df)

Output of df:

   location experience
0        a1       tech
1        a2    loyalty
2        a2      ‡€asd
5        a6
7        a8      happy
8        a8       best
9        a9   for sure
10       a9    notgood
11       b1   amazing:
12       b1  /§!vision
13       b5   referral
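
One caveat worth noting: groupby(...).first() returns the first non-null value in each group, which is not always the literal first row. The following is a minimal sketch of a variant that takes the literal first row per location with drop_duplicates. It assumes the a6 experience is missing (None) rather than whitespace, builds the question's sample data inline instead of reading sample.csv, and uses illustrative variable names (first_rows, starts_ok, bad_locations) that are not part of the original answer.

```python
import pandas as pd

# Sample data from the question, built inline; the a6 experience is
# assumed to be missing (None) rather than a whitespace string.
df = pd.DataFrame({
    "location": ["a1", "a2", "a2", "a5", "a5", "a6", "a7",
                 "a8", "a8", "a9", "a9", "b1", "b1", "b5"],
    "experience": ["tech", "loyalty", "‡€asd", "Ù…Ù", "completed", None, "--",
                   "happy", "best", "for sure", "notgood",
                   "amazing:", "/§!vision", "referral"],
})

# Literal first row of each location (drop_duplicates keeps the first occurrence).
first_rows = df.drop_duplicates("location")

# True where the first value starts with a letter or whitespace;
# na=True treats a missing first value as acceptable (an assumption).
starts_ok = first_rows["experience"].str.match(r"[A-Za-z\s]", na=True)

# Locations whose first value starts with a special character.
bad_locations = first_rows.loc[~starts_ok, "location"]

# Keep only rows whose location is not in that set.
result = df[~df["location"].isin(bad_locations)]
print(result)
```

With this sample, only a5 and a7 are dropped, which matches the expected output in the question.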

【Comments】:

【Answer 2】:

Try it like this:

import pandas as pd

df = pd.read_csv('test.csv', engine='python')
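# replace missing values with empty strings so the string test returns booleans
df.fillna(value='', inplace=True)
# keep only the rows whose experience value is purely alphanumeric
print(df[df.experience.str.isalnum()])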
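
When adapting this answer, it may help to see how str.isalnum behaves. It tests each row independently and is False for values containing spaces or punctuation, and for empty strings, so it filters individual rows rather than whole locations. A small sketch using a few values from the question (the Series s and the name mask are illustrative only):

```python
import pandas as pd

# A few experience values from the question, plus an empty string
# standing in for the blank a6 row after fillna('').
s = pd.Series(["tech", "for sure", "amazing:", "", "--"])

# isalnum is True only when every character is a letter or digit
# and the string is non-empty.
mask = s.str.isalnum()
print(mask.tolist())  # [True, False, False, False, False]
```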

【Comments】:
