Find index of word in sentence with information from phrase
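Asked: 2021-12-18 14:16:44

Question:

I need the index of `word` in `sentence`, but sometimes the word occurs more than once. The information in `phrase` would help, as would the previous or next row in the `word` column.

Basically, I just need to identify which occurrence of the word in the utterance is meant. For example, if `word` is "seaside", I want to know which "seaside" in the sentence it is. The extra information in `phrase` can help identify it, and so can the order in which the rows appear in the dataframe.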
What I have now looks like this:
file_id | phrase | word | sentence | word_indices |
---|---|---|---|---|
A | I am | I | I am a happy bird. I sing every day. I eat worms. | [0, 5, 9] |
B | the seaside is | the | she is by the seaside. The seaside is packed. | [3, 5] |
B | the seaside is | seaside | she is by the seaside. The seaside is packed. | [4, 6] |
B | the seaside is | is | she is by the seaside. The seaside is packed. | [1, 7] |
C | nobody knows | nobody | nobody knows what is going on. She can find nobody | [0, 9] |
C | find nobody | nobody | nobody knows what is going on. She can find nobody | [0, 9] |
D | it is such a sunny day | sunny | it is such a sunny day ah I am so happy when it's sunny such a sunny day is the best | [4, 13, 16] |
What I want to get is the content of the `target` column:
file_id | phrase | word | sentence | word_indices | target |
---|---|---|---|---|---|
A | I am | I | I am a happy bird. I sing every day. I eat worms. | [0, 5, 9] | [0] |
B | the seaside is | the | she is by the seaside. The seaside is packed. | [3, 5] | [5] |
B | the seaside is | seaside | she is by the seaside. The seaside is packed. | [4, 6] | [6] |
B | the seaside is | is | she is by the seaside. The seaside is packed. | [1, 7] | [7] |
C | nobody knows | nobody | nobody knows what is going on. She can find nobody | [0, 9] | [0] |
C | find nobody | nobody | nobody knows what is going on. She can find nobody | [0, 9] | [9] |
D | it is such a sunny day | sunny | it is such a sunny day ah I am so happy when it's sunny such a sunny day is the best | [4, 13, 16] | [4] |
I found a similar question here: Find index of words in matched text. Unfortunately, that one is about Java, and I need an answer in Python.

Thanks a lot!
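Comments:

Can you give a more precise definition? I assume that if `word` is not unique in the sentence, the algorithm should look for `phrase` and return the index of the word within the first occurrence of that phrase. Is that right? What should happen if `phrase` occurs more than once? And what if `word` occurs several times but `phrase` does not occur at all?

Thanks for your comment. Yes, the questions you ask are my questions too. Basically, I just need to identify which occurrence of the word in the utterance is meant. For example, if `word` is "seaside", I want to know which "seaside" in the sentence it is. The extra information in `phrase` can help identify it, and so can the order in which the rows appear in the dataframe.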
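Answer 1:

I would split this into two steps: count the words that precede the phrase in the sentence, then find the index of the word within the phrase, and add the two numbers. Something like this: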
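```python
def get_index_of_word_in_sentence(word, phrase, sentence):
    # Compare case-insensitively so "the seaside is" also matches "The seaside is".
    sentence, phrase, word = sentence.lower(), phrase.lower(), word.lower()
    # Character offset of the first occurrence of the phrase in the sentence.
    index1 = sentence.index(phrase)
    # Number of words that precede the phrase.
    word_num1 = len(sentence[:index1].split())
    # Position of the word inside the phrase.
    word_num2 = phrase.split().index(word)
    return word_num1 + word_num2

df["target"] = df.apply(lambda x: get_index_of_word_in_sentence(x["word"], x["phrase"], x["sentence"]), axis=1)
```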
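The function above searches for the phrase as a character substring and returns a single integer, and it raises `ValueError` when the phrase is missing. As a rough sketch of one alternative, the variant below matches whole tokens and returns a list, which is closer to the `[5]` style of the `target` column. The names `tokenize` and `find_target_indices` are hypothetical, and the tokenization (whitespace split, lowercasing, stripping surrounding punctuation) is an assumption that happens to reproduce the sample table, not something the question specifies.

```python
import string


def tokenize(text):
    # Assumption: words are whitespace-separated; lowercase them and strip
    # surrounding punctuation so "seaside." and "The" match "seaside" and "the".
    return [tok.strip(string.punctuation).lower() for tok in text.split()]


def find_target_indices(word, phrase, sentence):
    """Return the word index of `word` for every token-level match of `phrase`."""
    sent_toks = tokenize(sentence)
    phrase_toks = tokenize(phrase)
    word_tok = word.strip(string.punctuation).lower()
    if word_tok not in phrase_toks:
        return []
    # Position of the word inside the phrase (first occurrence).
    offset = phrase_toks.index(word_tok)
    n = len(phrase_toks)
    # Slide a window of len(phrase) tokens over the sentence.
    return [
        start + offset
        for start in range(len(sent_toks) - n + 1)
        if sent_toks[start:start + n] == phrase_toks
    ]
```

With this sketch, a phrase that occurs several times yields several indices and a phrase that never occurs yields an empty list, so the caller can decide how to handle the edge cases raised in the comments. Because punctuation is stripped, a phrase can also match across a sentence boundary. It could be applied row by row with `df.apply(lambda r: find_target_indices(r["word"], r["phrase"], r["sentence"]), axis=1)`.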
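Comments:

Thanks for your answer! It looks like this computes character indices in the string? I get the following result: [0, 23, 27, 35, 0, 44, 13]. That does not seem to be what I need...

Ah yes, I misread the question. I have revised my answer.

It worked! Thank you!

@SuperDuperMario, can you tell me how you got the word indices?

@DanielWyatt How could I modify your function to get the word indices if I do not have them (basically, I need to fill in the word_indices column)? Thanks!!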
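The thread does not say how the `word_indices` column was produced. One possible way to fill it, sketched below, is to split the sentence on whitespace and collect the positions of every token equal to the word. The helper name `get_word_indices` is hypothetical, and the case-insensitive comparison with surrounding punctuation stripped is an assumption chosen because it reproduces the values in the sample table.

```python
import string


def get_word_indices(word, sentence):
    # Assumption: tokens are whitespace-separated; compare them lowercased
    # with surrounding punctuation removed ("seaside." counts as "seaside").
    target = word.strip(string.punctuation).lower()
    return [
        i
        for i, tok in enumerate(sentence.split())
        if tok.strip(string.punctuation).lower() == target
    ]
```

Applied per row, for example with `df.apply(lambda r: get_word_indices(r["word"], r["sentence"]), axis=1)`, this would give one list of candidate positions for each `word` and `sentence` pair.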