Python: Value returned by function not getting updated in pandas dataframe

Posted: 2021-06-06 12:47:21

Question:

I have a fruits dataframe with columns (Name, Color) and a sentence dataframe with a single column (Sentence).

Fruits dataframe

          Name   Color
0        Apple     Red
1        Mango  Yellow
2       Grapes   Green
3   Strawberry    Pink

Sentence dataframe

                      Sentence
0  I like Apple, Mango, Grapes
1            I like ripe Mango
2             Grapes are juicy
3           Oranges are citric

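I need to compare every row of the fruits dataframe with every row of the sentence dataframe. If a fruit name appears in a sentence as an exact word, its color should be inserted immediately before the fruit name in that sentence.

This is what I did using dataframe.apply():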

import pandas as pd
import regex as re

# create fruit dataframe 
fruit_data = [['Apple', 'Red'], ['Mango', 'Yellow'], ['Grapes', 'Green']] 
fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color']) 
print(fruit_df)

# create sentence dataframe 
sentence = ['I like Apple, Mango, Grapes', 'I like ripe Mango', 'Grapes are juicy'] 
sentence_df = pd.DataFrame(sentence, columns = ['Sentence']) 
print(sentence_df)


def search(desc, name, color):

    if re.findall(r"\b" + name + r"\b", desc):

        # for loop is used because fruit can appear more than once in sentence
        all_indexes = []
        for match in re.finditer(r"\b" + name + r"\b", desc):
            all_indexes.append(match.start())

        arr = list(desc)
        for idx in sorted(all_indexes, reverse=True):
            arr.insert(idx, color + " ")

        new_desc = ''.join(arr)
        return new_desc

def compare(name, color):
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))
    

fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
print ("The final result is: ")
print(sentence_df['Result'])

The result I get is:

                      Sentence     Result
0  I like Apple, Mango, Grapes       None
1            I like ripe Mango       None
2             Grapes are juicy       None
3           Oranges are citric       None

Expected result:

                      Sentence                                        Result
0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
1            I like ripe Mango                      I like ripe Yellow Mango
2             Grapes are juicy                        Green Grapes are juicy
3           Oranges are citric       

I also tried iterating over fruit_df with itertuples(), but the result is still the same:

for row in fruit_df.itertuples():
   result = sentence_df['Sentence'].apply(lambda x: search(x, getattr(row, 'Name'), getattr(row, 'Color')))
   print(result)

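I don't understand why the value returned by the search function is not stored in the new column. Is this the right approach, or am I missing something?

Answer 1: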

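The problem is that you call compare once for every row of the fruits dataframe, but every call starts from the same input: the original Sentence column.

I added some debug prints to the compare function to see what happens: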

def compare(name, color):
    print(name, color)
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))
    print(sentence_df['Result'])

This gives:

Apple Red
0    I like Red Apple, Mango, Grapes
1                               None
2                               None
Name: Result, dtype: object
Mango Yellow
0    I like Apple, Yellow Mango, Grapes
1              I like ripe Yellow Mango
2                                  None
Name: Result, dtype: object
Grapes Green
0    I like Apple, Mango, Green Grapes
1                                 None
2               Green Grapes are juicy
Name: Result, dtype: object

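So you do add the color when the fruit is present, but search returns None when it is absent. Each pass also starts again from the original column and overwrites Result, so only the output of the last pass is kept.

How to fix it: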
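The overwrite can be reproduced without pandas. The following is a minimal sketch, not the question's exact code: `search` is condensed to a `re.sub` call (an assumption made for brevity), but it keeps the same missing return on the no-match path.

```python
import re

def search(desc, name, color):
    # Same shape as the question's function: nothing is returned on the
    # "no match" path, so Python returns None implicitly there.
    pattern = r"\b" + re.escape(name) + r"\b"
    if re.search(pattern, desc):
        return re.sub(pattern, color + " " + name, desc)

sentences = ["I like Apple, Mango, Grapes", "I like ripe Mango", "Grapes are juicy"]
fruits = [("Apple", "Red"), ("Mango", "Yellow"), ("Grapes", "Green")]

result = None
for name, color in fruits:
    # Every pass reads the original sentences and overwrites the previous result.
    result = [search(s, name, color) for s in sentences]

print(result)
```

Only the Grapes pass survives, which matches the last block of the debug output.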

    First, add the missing return desc to search, so that it never returns None:

     def search(desc, name, color):
    
         if re.findall(r"\b" + name + r"\b", desc):
                 ...                 
                 new_desc = ''.join(arr)
                 return new_desc
         return desc
    

    Then initialize sentence_df['Result'] before applying compare, and use that column as the input:

     def compare(name, color):
         sentence_df['Result'] = sentence_df['Result'].apply(lambda x: search(x, name, color))
    
     sentence_df['Result'] = sentence_df['Sentence']
     fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
    

This finally gives the expected result:

The final result is: 
0    I like Red Apple, Yellow Mango, Green Grapes
1                        I like ripe Yellow Mango
2                          Green Grapes are juicy
Name: Result, dtype: object
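Putting both fixes together, a self-contained version might look like the sketch below. It makes a few assumptions that are not in the answer: it imports the standard-library `re` rather than the `regex` package, escapes the fruit name with `re.escape`, uses string slicing instead of a character list, loops with `itertuples` instead of `fruit_df.apply`, and includes the fourth sentence from the question's tables.

```python
import re
import pandas as pd

fruit_df = pd.DataFrame(
    [["Apple", "Red"], ["Mango", "Yellow"], ["Grapes", "Green"]],
    columns=["Name", "Color"],
)
sentence_df = pd.DataFrame(
    {"Sentence": ["I like Apple, Mango, Grapes", "I like ripe Mango",
                  "Grapes are juicy", "Oranges are citric"]}
)

def search(desc, name, color):
    pattern = r"\b" + re.escape(name) + r"\b"
    # Collect the start index of every whole-word occurrence of the fruit name.
    starts = [m.start() for m in re.finditer(pattern, desc)]
    # Insert from the right so the earlier indexes stay valid.
    for idx in sorted(starts, reverse=True):
        desc = desc[:idx] + color + " " + desc[idx:]
    # Always return a string; with no match, desc comes back unchanged.
    return desc

# Start from the original sentences, then let each fruit update the same column.
sentence_df["Result"] = sentence_df["Sentence"]
for name, color in fruit_df[["Name", "Color"]].itertuples(index=False):
    sentence_df["Result"] = sentence_df["Result"].apply(
        lambda s: search(s, name, color)
    )

print(sentence_df["Result"].tolist())
```

Because each pass now reads Result rather than Sentence, the changes accumulate, and a sentence with no fruit in it is carried through unchanged.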

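Comments:

Great explanation! Thanks for the solution! Initializing the Result column did the trick.

Answer 2:

We can build a mapping Series from the fruits dataframe, then pass it to Series.replace to replace each fruit name found in the Sentence column with its replacement from the mapping (Color + fruit name). Here fruits and sentence are the two dataframes from the question: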

fruit = r'\b' + fruits['Name'] + r'\b'
fruit_replacement = list(fruits['Color'] + ' ' + fruits['Name'])

mapping = pd.Series(fruit_replacement, index=fruit)
sentence['Result'] = sentence['Sentence'].replace(mapping, regex=True)

>>> sentence
                      Sentence                                        Result
0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
1            I like ripe Mango                      I like ripe Yellow Mango
2             Grapes are juicy                        Green Grapes are juicy
3           Oranges are citric                            Oranges are citric
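For reference, here is the same idea as a self-contained sketch. The dataframe names `fruits` and `sentence` follow this answer; the sample data comes from the question, and the `re.escape` call is an added precaution (an assumption, not part of the answer) in case a name contains regex metacharacters.

```python
import re
import pandas as pd

fruits = pd.DataFrame(
    [["Apple", "Red"], ["Mango", "Yellow"], ["Grapes", "Green"]],
    columns=["Name", "Color"],
)
sentence = pd.DataFrame(
    {"Sentence": ["I like Apple, Mango, Grapes", "I like ripe Mango",
                  "Grapes are juicy", "Oranges are citric"]}
)

# Index: one whole-word regex per fruit; values: the "Color Name" replacement.
patterns = r"\b" + fruits["Name"].map(re.escape) + r"\b"
replacements = list(fruits["Color"] + " " + fruits["Name"])
mapping = pd.Series(replacements, index=patterns)

# With regex=True, each index entry of the mapping is treated as a pattern.
sentence["Result"] = sentence["Sentence"].replace(mapping, regex=True)

print(sentence["Result"].tolist())
```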

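Comments:

Thanks for the solution! This approach takes less time than my current one.

@Animeartist Happy coding!

Answer 3:

Create a mapping dictionary, then replace.

Try:

di = {fr: f"{co} {fr}" for fr, co in fruit_df.values}
res = sentence_df.replace(di, regex=True)

res: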

    Sentence
0   I like Red Apple, Yellow Mango, Green Grapes
1   I like ripe Yellow Mango
2   Green Grapes are juicy
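One caveat: the dictionary keys above carry no word boundaries, so a name would also match inside a longer word such as "Mangoes". The sketch below is a different technique from this answer: a single alternation pattern passed to Series.str.replace with a callable replacement. The names `color_of`, `pattern` and `add_color` are hypothetical and exist only for this illustration, and the "Mangoes are sweet" sentence is added sample data.

```python
import re
import pandas as pd

fruit_df = pd.DataFrame(
    [["Apple", "Red"], ["Mango", "Yellow"], ["Grapes", "Green"]],
    columns=["Name", "Color"],
)
sentence_df = pd.DataFrame(
    {"Sentence": ["I like Apple, Mango, Grapes", "I like ripe Mango",
                  "Grapes are juicy", "Mangoes are sweet"]}
)

# Lookup from fruit name to color (hypothetical helper name).
color_of = dict(zip(fruit_df["Name"], fruit_df["Color"]))

# One pattern that matches any fruit name, but only as a whole word.
pattern = r"\b(" + "|".join(re.escape(name) for name in color_of) + r")\b"

def add_color(match):
    # Called once per match; prepend the color of the matched fruit.
    name = match.group(1)
    return f"{color_of[name]} {name}"

res = sentence_df["Sentence"].str.replace(pattern, add_color, regex=True)

print(res.tolist())
```

Each sentence is scanned once regardless of how many fruits there are, and "Mangoes" is left alone because no word boundary follows "Mango" there.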

Comments:

Thanks for the solution.
