Regex word replacement in a dataset showing no change in the result

Posted 2023-03-28


【Title】: Regex word replacement in a dataset showing no change in the result 【Published】: 2022-01-11 09:41:09 【Question】:

I have a dataset with this structure:

print(type(test_small_testval))
print((test_small_testval.features))

<class 'datasets.arrow_dataset.Dataset'>
'premise': Value(dtype='string', id=None), 'hypothesis': Value(dtype='string', id=None), 'label': ClassLabel(num_classes=3, names=['entailment', 'neutral', 'contradiction'], names_file=None, id=None)

I can access the dataset's hypothesis column like this:

for i in range(len(test_small_testval)):
    print(test_small_testval['hypothesis'][i])

For example, the first two elements look like this:

print(test_small_testval['hypothesis'][0:2])
['The owner threw the toy', 'The dog walked across the fallen log.']

In this 'hypothesis' column, I want to iterate over each string and make replacements like this:

import re

for i in range(len(test_small_testval)):
    test_small_testval['hypothesis'][i] = re.sub(r'\bshe\b', r'them', test_small_testval['hypothesis'][i])
    test_small_testval['hypothesis'][i] = re.sub(r'\bhe\b', r'them', test_small_testval['hypothesis'][i])
    test_small_testval['hypothesis'][i] = re.sub(r'\bher\b', r'them', test_small_testval['hypothesis'][i])
    test_small_testval['hypothesis'][i] = re.sub(r'\bhim\b', r'them', test_small_testval['hypothesis'][i])
    test_small_testval['hypothesis'][i] = re.sub(r'\bdog\b', r'animal', test_small_testval['hypothesis'][i])
    test_small_testval['hypothesis'][i] = re.sub(r'\bcat\b', r'animal', test_small_testval['hypothesis'][i])

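It does not seem to replace the words in that column. I expected each assignment to overwrite the original string with the modified one, and the approach looks fine to me, yet the column is unchanged afterwards.

Any pointers?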
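
One likely explanation, offered as a sketch rather than a confirmed diagnosis: in `datasets` releases from around the time of the question, indexing a `Dataset` by column name decodes the underlying Arrow data into a new Python list on every access, so an item assignment only changes that temporary list. The class `ColumnCopyDataset` below is a hypothetical stand-in that imitates this copy-on-access behavior; it is not the library's implementation.

```python
import re


class ColumnCopyDataset:
    """Hypothetical stand-in: returns a fresh list on every column access."""

    def __init__(self, columns):
        # Store the values immutably, loosely mirroring an Arrow-backed table.
        self._columns = {name: tuple(values) for name, values in columns.items()}

    def __getitem__(self, name):
        # A new list is built on each access; callers never get the stored data.
        return list(self._columns[name])


data = ColumnCopyDataset({
    "hypothesis": [
        "The owner threw the toy",
        "The dog walked across the fallen log.",
    ]
})

# The assignment lands on a temporary list that is discarded right away.
data["hypothesis"][1] = re.sub(r"\bdog\b", "animal", data["hypothesis"][1])

unchanged = data["hypothesis"][1]
print(unchanged)  # The dog walked across the fallen log.
```

If this is indeed the cause, the library's `Dataset.map` method, which builds a new dataset from a per-row function that returns the updated columns, is the usual way to rewrite a column, with the result assigned back to the variable.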


【Answer 1】:

I converted everything to a pandas DataFrame, and the same approach worked fine there.
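
A minimal sketch of what the pandas route might look like, assuming pandas is installed. The DataFrame is built from literal strings so the example is self-contained; the third sentence is made-up sample data, and grouping the patterns into two alternations is a simplification of the six separate `re.sub` calls in the question, not the answerer's code.

```python
import pandas as pd

# Two sentences from the question plus one invented sentence for illustration.
df = pd.DataFrame({
    "hypothesis": [
        "The owner threw the toy",
        "The dog walked across the fallen log.",
        "she gave him the cat",
    ]
})

# Assigning the result back to the column makes the change persist.
df["hypothesis"] = (
    df["hypothesis"]
    .str.replace(r"\b(?:she|he|her|him)\b", "them", regex=True)
    .str.replace(r"\b(?:dog|cat)\b", "animal", regex=True)
)

result = df["hypothesis"].tolist()
print(result)
```

The patterns are case-sensitive, as in the question, so a capitalized "She" would be left untouched. For a real dataset, `Dataset.to_pandas()` and `Dataset.from_pandas()` can be used to convert in each direction.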
