Custom NER training with spaCy 3 throws ValueError
Posted: 2021-05-24 10:28:34

I am trying to add custom NER labels with spaCy 3. I found a tutorial written for an older version and adapted it to spaCy 3. This is the entire code I am using:
import random
import spacy
from spacy.training import Example

LABEL = 'ANIMAL'
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("Do they bite?", {'entities': []}),
    ("horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("horses pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("they pretend to care about your feelings, those horses", {'entities': [(48, 54, LABEL)]}),
    ("horses?", {'entities': [(0, 6, LABEL)]})
]

nlp = spacy.load('en_core_web_sm')  # load existing spaCy model
ner = nlp.get_pipe('ner')
ner.add_label(LABEL)
print(ner.move_names)  # Here I see that the new label was added

optimizer = nlp.create_optimizer()

# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):  # only train NER
    for itn in range(20):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            doc = nlp(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
        print(losses)

# test the trained model
# add some dummy sentences with many NERs
test_text = 'Do you like horses?'
doc = nlp(test_text)
print("Entities in '%s'" % test_text)
for ent in doc.ents:
    print(ent.label_, " -- ", ent.text)
This code raises a ValueError, but only after 2 iterations (note the first 2 lines of output):
{'ner': 9.862242701536594}
{'ner': 8.169456698315201}
Traceback (most recent call last):
File ".\custom_ner_training.py", line 46, in <module>
nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
File "C:\ogr\moje\python\spacy_pg\myvenv\lib\site-packages\spacy\language.py", line 1106, in update
proc.update(examples, sgd=None, losses=losses, **component_cfg[name])
File "spacy\pipeline\transition_parser.pyx", line 366, in spacy.pipeline.transition_parser.Parser.update
File "spacy\pipeline\transition_parser.pyx", line 478, in spacy.pipeline.transition_parser.Parser.get_batch_loss
File "spacy\pipeline\_parser_internals\ner.pyx", line 310, in spacy.pipeline._parser_internals.ner.BiluoPushDown.set_costs
ValueError
By printing ner.move_names, I can see that the ANIMAL label was added.
When I change the value to LABEL = 'PERSON', the code runs successfully and recognizes horses as PERSON in new data. That is why I assume the code itself is not wrong.
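Am I missing something? What am I doing wrong? Can anyone reproduce this?
Note: this is my first question here. I hope I have provided all the information; if not, please let me know in the comments.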
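Before suspecting the training loop, it can help to confirm that every annotation in TRAIN_DATA covers the intended text. spaCy's training entity tuples are character offsets with an exclusive end, so text[start:end] should be exactly the entity string. The sketch below is a plain-Python check; check_offsets is a hypothetical helper written for illustration, not part of spaCy. Inside spaCy, Doc.char_span returns None when offsets do not line up with token boundaries, which is the stricter check.

```python
LABEL = "ANIMAL"
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    ("Do they bite?", {"entities": []}),
    ("horses are too tall and they pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    ("they pretend to care about your feelings, those horses", {"entities": [(48, 54, LABEL)]}),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]


def check_offsets(data):
    """Hypothetical helper: validate (start, end, label) character offsets.

    Returns one (text, start, end, label, covered) row per annotated span and
    raises ValueError for out-of-range offsets or spans that include
    leading/trailing whitespace.
    """
    rows = []
    for text, annotations in data:
        for start, end, label in annotations["entities"]:
            # end is exclusive, so it may equal len(text) but not exceed it
            if not (0 <= start < end <= len(text)):
                raise ValueError(f"offsets ({start}, {end}) out of range for {text!r}")
            covered = text[start:end]
            if covered != covered.strip():
                raise ValueError(f"span {covered!r} has surrounding whitespace in {text!r}")
            rows.append((text, start, end, label, covered))
    return rows


for text, start, end, label, covered in check_offsets(TRAIN_DATA):
    print(f"{label:8} {start:3}-{end:<3} {covered!r}")
```

For the data in this question every span slices out "horses" or "Horses", so the annotations themselves look consistent.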
Answer 1: You need to change the following line in the for loop
doc = nlp(text)
to
doc = nlp.make_doc(text)
With that change the code should work and produce the following output:
{'ner': 9.60289144264557}
{'ner': 8.875474230820478}
{'ner': 6.370401408220459}
{'ner': 6.687456469517201}
...
{'ner': 1.3796682589133492e-05}
{'ner': 1.7709562613218738e-05}
Entities in 'Do you like horses?'
ANIMAL -- horses
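To put the answer's one-line change in context, here is a minimal sketch of the corrected loop, assuming spaCy 3.x. To keep it runnable without downloading en_core_web_sm, it swaps in a blank English pipeline (spacy.blank("en")) with a fresh ner component set up by nlp.initialize(); that swap is an assumption, not what the question does, and make_examples is a hypothetical helper. The relevant change is the one from this answer: the predicted Doc is built with nlp.make_doc(text), which only runs the tokenizer, rather than nlp(text), which also runs the pipeline and fills the Doc with the model's current predictions. Why exactly the latter ends in a ValueError inside BiluoPushDown.set_costs is not explained in this thread.

```python
import random

import spacy
from spacy.training import Example

LABEL = "ANIMAL"
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    ("Do they bite?", {"entities": []}),
    ("horses are too tall and they pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    ("they pretend to care about your feelings, those horses", {"entities": [(48, 54, LABEL)]}),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]

# Assumption: a blank English pipeline stands in for en_core_web_sm so this
# sketch runs without a model download.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label(LABEL)


def make_examples(data):
    # Hypothetical helper. nlp.make_doc only tokenizes, so the predicted Doc
    # carries no entities; the gold entities come from the annotations dict.
    return [Example.from_dict(nlp.make_doc(text), annots) for text, annots in data]


# For a blank pipeline, initialize() sets up the weights and returns an optimizer.
optimizer = nlp.initialize(lambda: make_examples(TRAIN_DATA))

for itn in range(20):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for example in make_examples(TRAIN_DATA):
        nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
    print(losses)
```

To fine-tune en_core_web_sm as in the question, keep spacy.load('en_core_web_sm') and nlp.create_optimizer() instead of the blank pipeline and nlp.initialize(); according to this answer, replacing nlp(text) with nlp.make_doc(text) is the only change needed there.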
Comments:
This change does not work for me with spaCy 3.0.3.
I am using the exact same version: Name: spacy - Version: 3.0.3
I had to add spacy-lookups-data to my requirements. Your solution works for me now.