How can I predict for each row in a dataframe by iterating through the rows?

Posted 2023-03-12

【Title】: How can I predict for each row in the dataframe by iterating through the rows? 【Posted】: 2021-07-24 19:57:15 【Question】:

I built a BERT model, and I now have a block of code that correctly classifies the rows of the text column one at a time. The pandas dataframe looks like this:

    text
0   working add oil
1   @KristianaNKOTB you're welcome
2   is going to bed, work in the morning boo but t...
3   @sparky_habbo - uni &amp; assignments happened...
4   Can't wait to have chinese food! Still disappo...

The code that classifies a single row of the text column is as follows:

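# Take the first row of the text column, wrapped in a list (a batch of one).
text = [df.text[0]]

# Tokenize, add the BERT special tokens, and convert tokens to ids.
pred_tokens = map(tokenizer.tokenize, text)
pred_tokens = map(lambda tok: ["[CLS]"] + tok + ["[SEP]"], pred_tokens)
pred_token_ids = list(map(tokenizer.convert_tokens_to_ids, pred_tokens))

# Right-pad every id list with 0 up to data.max_seq_len.
pred_token_ids = map(lambda tids: tids + [0] * (data.max_seq_len - len(tids)), pred_token_ids)
pred_token_ids = np.array(list(pred_token_ids))

# Pick the class with the highest score.
predictions = model.predict(pred_token_ids).argmax(axis=-1)

df = pd.DataFrame(predictions, columns=['emotion'])
df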

For example, to classify df.text[0], i.e. 'working add oil', as 1 or 0, I run this code and get the following result:

    emotion
0   1
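
The padding step in the code above can be looked at in isolation. The sketch below is only an illustration: `MAX_SEQ_LEN` is a hypothetical stand-in for `data.max_seq_len`, the token ids are made up, and `pad_or_truncate` is a helper name introduced here, not part of the question. It shows that the question's lambda leaves rows longer than the limit untouched (a negative multiplier produces an empty list), while a variant that slices first always returns a fixed length.

```python
# Hypothetical stand-in for data.max_seq_len from the question.
MAX_SEQ_LEN = 8


def pad_or_truncate(token_ids, max_len=MAX_SEQ_LEN, pad_id=0):
    """Cut token_ids to max_len items, then right-pad with pad_id to exactly max_len."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


# The padding lambda as written in the question (no truncation).
question_pad = lambda tids: tids + [0] * (MAX_SEQ_LEN - len(tids))

short_ids = [101, 2551, 5587, 3514, 102]  # made-up ids, shorter than MAX_SEQ_LEN
long_ids = list(range(1, 13))             # made-up ids, longer than MAX_SEQ_LEN

print(len(question_pad(short_ids)), len(question_pad(long_ids)))
print(len(pad_or_truncate(short_ids)), len(pad_or_truncate(long_ids)))
```

Rows of unequal length cannot be stacked into a rectangular array, so some form of truncation is usually needed before `np.array`. Note that plain slicing can also cut off the trailing `[SEP]` id; whether that matters depends on the model.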

But now I ...

【Answer 1】:

The code below demonstrates a procedure for predicting on the text in a dataframe and saving the result.

Input data:

import pandas as pd
df = pd.DataFrame({"text": ['working add oil', "@KristianaNKOTB you're welcome", "is going to bed, work in the morning boo but t..."]})

Define a function. You can adapt it to your program: comment out my placeholder code and uncomment yours.

import random

def predict_emotion(input_text):
    # Wrap the string in a list so map() sees one document, not single characters.
    text = [input_text]

    ''' Uncomment this block and remove the placeholder return below.
    pred_tokens = map(tokenizer.tokenize, text)
    pred_tokens = map(lambda tok: ["[CLS]"] + tok + ["[SEP]"], pred_tokens)
    pred_token_ids = list(map(tokenizer.convert_tokens_to_ids, pred_tokens))

    # Right-pad every id list with 0 up to data.max_seq_len.
    pred_token_ids = map(lambda tids: tids + [0] * (data.max_seq_len - len(tids)), pred_token_ids)
    pred_token_ids = np.array(list(pred_token_ids))

    predictions = model.predict(pred_token_ids).argmax(axis=-1)
    return int(predictions[0])  # single label for the single input row
    '''
    # Placeholder: a random label so the example runs without the model.
    return_int = random.randint(1, 8)
    print(f"text:{input_text},emotion:{return_int}")
    return return_int

Call the function on the input text of each row.

df['emotion']=df['text'].apply(predict_emotion)
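
As an alternative to calling the model once per row, the whole column can be encoded into a single array and passed to the model in one call. The following is a minimal sketch under stated assumptions: `toy_token_ids` and `toy_predict` are hypothetical stand-ins for the real tokenizer pipeline and for `model.predict`, `MAX_SEQ_LEN` stands in for `data.max_seq_len`, and the `[CLS]`/`[SEP]` handling is left out for brevity.

```python
import numpy as np
import pandas as pd

MAX_SEQ_LEN = 16  # hypothetical stand-in for data.max_seq_len


def toy_token_ids(text):
    """Hypothetical tokenizer stand-in: one made-up id per whitespace-separated word."""
    return [len(word) for word in text.split()]


def encode_batch(texts, max_len=MAX_SEQ_LEN):
    """Turn an iterable of strings into a (n_rows, max_len) integer array."""
    rows = []
    for text in texts:
        ids = toy_token_ids(text)[:max_len]            # truncate long rows
        rows.append(ids + [0] * (max_len - len(ids)))  # right-pad short rows
    return np.array(rows)


def toy_predict(batch):
    """Hypothetical model stand-in: returns scores of shape (n_rows, 2)."""
    odd = (batch.sum(axis=1) % 2).astype(float)
    return np.stack([1.0 - odd, odd], axis=1)


df = pd.DataFrame({"text": ["working add oil", "@KristianaNKOTB you're welcome"]})

batch = encode_batch(df["text"])                    # one row of ids per dataframe row
df["emotion"] = toy_predict(batch).argmax(axis=-1)  # one label per dataframe row
print(df)
```

With a real model, `toy_predict(batch)` would be replaced by `model.predict(batch)`. A single batched call typically avoids the per-call overhead of invoking the model separately for every row, at the cost of holding the whole encoded column in memory.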

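Output:

【Discussion】:

emotion must be 1 or 0, since this is a classification task. I tested this code on just 3 random rows, and the output looks like this, with many 1s and 0s: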

    text                                                emotion
1   you're welcome                                      [0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
5   Prepping for auditions this afternoon. From wh...   [0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, ...
7   off to face my exam now                             [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, ...

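Image link: drive.google.com/file/d/1Q8uB5vZyjEWEVfyIrRircY5S4jbVb781/… These are the steps I have already done: drive.google.com/file/d/1Osh1rKnjLlZ60yHzwAMQgA7XS1YJ9EUJ/…

This is a problem with the model: it has 14 outputs, so it is definitely not a binary classification model. This answer is about running the prediction, and that works fine. Please ask a new question about the model.

I have posted one: ***.com/questions/67356712/…

Thanks, I will take a look at it too. Please accept this answer. Thanks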
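
One possible reading of the list-valued output above, offered as an assumption that the thread itself does not confirm: the first row, "you're welcome", is 14 characters long and its list has 14 entries. If a bare string (rather than a one-element list) reaches `map(tokenizer.tokenize, text)`, `map` walks the string character by character, so the model receives one input per character. The sketch below uses `toy_tokenize`, a hypothetical stand-in for `tokenizer.tokenize`, to show the difference.

```python
def toy_tokenize(text):
    """Hypothetical stand-in for tokenizer.tokenize: split on whitespace."""
    return text.split()


sample = "you're welcome"

# map() over a bare string visits one character at a time.
per_character = list(map(toy_tokenize, sample))

# map() over a one-element list visits the whole string once.
per_document = list(map(toy_tokenize, [sample]))

print(len(per_character))  # 14
print(len(per_document))   # 1
```

Under this reading, wrapping the input as `[input_text]` inside the function and returning the first element of the prediction array would give one label per row.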