pandas: get the exact corresponding value with the corresponding index based on a value in another column

Posted 2023-03-12


Question (posted 2021-09-15 00:35:31):

I have a column of strings (sentences) and a column of lists of strings (POS tags), like this:

import pandas as pd
df = pd.DataFrame({'text': ['the weather is nice though', 'How are you today', 'the beautiful girl and the nice boy'],
                   'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'], ['QUA', 'VERB', 'PRON', 'ADV'], ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']]})

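I want to compare the two columns and create a third one: wherever the 'pos' list contains 'ADJ', find the corresponding word in the 'text' column (in the first row, that is 'nice') and return it together with its index as a dictionary. The third column should look like this: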

third_column:

0    {'nice': 3}
1    {}
2    {'beautiful': 1, 'nice': 5}
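A minimal sketch of the desired result, assuming each sentence is tokenized by splitting on whitespace and each tag list has exactly one tag per word (true for the data above). This is not the asker's code; it simply pairs words with tags via zip inside a dict comprehension. With 0-based positions, 'nice' in the third sentence lands at index 5 (the, beautiful, girl, and, the, nice).

```python
import pandas as pd

# The DataFrame from the question
df = pd.DataFrame({
    'text': ['the weather is nice though',
             'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV'],
            ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})

# For each sentence, pair every whitespace-separated word with its tag and
# its 0-based position, keeping only the words tagged 'ADJ'.
df['third_column'] = [
    {word: i for i, (word, tag) in enumerate(zip(text.split(), tags)) if tag == 'ADJ'}
    for text, tags in zip(df['text'], df['pos'])
]

print(df['third_column'].tolist())
```

Note that zip stops at the shorter sequence, so a sentence with more words than tags (or vice versa) would be silently truncated rather than raising an error.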

So far I have tried the following:

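df['Third_column'] = ' '
# join each tag list into one space-separated string, e.g. 'DET NOUN VERB ADJ ADV'
df['liststring'] = [' '.join(map(str, l)) for l in df['pos']]
# select the sentences whose tags include 'ADJ'
df.loc[df['liststring'].str.contains('ADJ'), 'text']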

but I don't know how to go on from there to get the exact words and their indices.


Answer 1:

I would do something along these lines:

Put the words and the POS tags into single, synchronized columns:

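# split each sentence into a list of tokens
df['text'] = df.text.str.split()
# explode each list column; exploded rows keep their original index label
df = df.apply(pd.Series.explode)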

      text   pos
0      the   DET
0  weather  NOUN
0       is  VERB
0     nice   ADJ
0   though   ADV

(Note: having lists, dicts and other sequences as cell values is usually a sign that you need to restructure your data.)
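On pandas 1.3 or later, DataFrame.explode also accepts a list of columns, which explodes text and pos together in one call instead of applying pd.Series.explode column by column. A short sketch using the first two sentences of the question; long_df is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['the weather is nice though', 'How are you today'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV']],
})

# Turn each sentence into a list of tokens so both columns hold lists
df['text'] = df['text'].str.split()

# pandas >= 1.3: explode both list columns in lockstep. Every row must have
# the same number of words and tags, otherwise pandas raises a ValueError.
long_df = df.explode(['text', 'pos'])
print(long_df)
```

Unlike zip, the multi-column explode refuses mismatched lengths, which makes tokenization problems visible instead of hiding them.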

Reset the index, keeping the original index as 'sent_id', and add each token's position within its sentence:

df['sent_id'] = df.index
df = df.reset_index(drop=True)
df['tok_id'] = df.groupby('sent_id').cumcount()

      text   pos  sent_id  tok_id
0      the   DET        0       0
1  weather  NOUN        0       1
2       is  VERB        0       2
3     nice   ADJ        0       3
4   though   ADV        0       4
5      How   QUA        1       0
6      are  VERB        1       1
7      you  PRON        1       2

Finally, select all the 'ADJ' rows:

df[df.pos.eq('ADJ')]

         text  pos  sent_id  tok_id
3        nice  ADJ        0       3
10  beautiful  ADJ        2       1
14       nice  ADJ        2       5
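The steps above stop at a filtered table of adjective rows. To get back to the one-dict-per-sentence column the question asks for, those rows can be regrouped by sent_id. A self-contained sketch of the whole pipeline, assuming pandas 1.3+ for the multi-column explode; long_df, adj and per_sentence are illustrative names, and third_column is the question's target column:

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['the weather is nice though',
             'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV'],
            ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})

# Long format: one row per token, original row label kept as sent_id
long_df = df.assign(text=df['text'].str.split()).explode(['text', 'pos'])
long_df['sent_id'] = long_df.index
long_df = long_df.reset_index(drop=True)
long_df['tok_id'] = long_df.groupby('sent_id').cumcount()

# Keep only the adjective rows
adj = long_df[long_df['pos'].eq('ADJ')]

# Regroup: {word: tok_id} for each sentence that has adjectives
per_sentence = {
    sent_id: dict(zip(group['text'], group['tok_id'].tolist()))
    for sent_id, group in adj.groupby('sent_id')
}

# Sentences without adjectives get an empty dict
df['third_column'] = [per_sentence.get(i, {}) for i in df.index]
print(df['third_column'].tolist())
# → [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
```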


Answer 2:

What you describe is exactly what pandas.DataFrame.apply does.

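Whenever you want to compute a column (or row) from the values in other columns (or rows) of a DataFrame, this is the approach to consider.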

import pandas as pd


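def extract_words(row):
    # map each adjective to its 0-based position in the sentence
    word_pos = {}
    text_split = row.text.split()
    for i, p in enumerate(row.pos):
        if p == 'ADJ':
            word_pos[text_split[i]] = i
    return word_pos


df = ...  # the DataFrame from the question
# axis=1 passes each row to extract_words as a Series
df['Third_column'] = df.apply(extract_words, axis=1)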
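One caveat with returning {word: index}: if the same adjective occurs twice in a sentence, the later position overwrites the earlier one. If that matters, a list of (word, index) pairs keeps every occurrence. A sketch along the same lines as extract_words; extract_adj_positions, its tag parameter and the example sentence are hypothetical:

```python
import pandas as pd


def extract_adj_positions(row, tag='ADJ'):
    """Return (word, index) pairs for every token tagged `tag`.

    Unlike a {word: index} dict, a list keeps repeated words.
    """
    words = row['text'].split()
    return [(words[i], i) for i, p in enumerate(row['pos']) if p == tag]


# Hypothetical example in which the same adjective appears twice
df = pd.DataFrame({
    'text': ['a nice day and a nice night'],
    'pos': [['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})

# apply with axis=1 calls the function once per row
df['adj_positions'] = df.apply(extract_adj_positions, axis=1)
print(df['adj_positions'].iloc[0])
```

The dict-based extract_words would report only {'nice': 5} for this sentence, while the list keeps both positions 1 and 5.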

