IOS/Objective-C: NSLinguisticTagger for Recognizing Named Entities

【Posted】: 2019-02-16 07:59:40 【Question】:

Apple provides a recipe for using the tagger to recognize named entities in Swift, but not one for Objective-C.

Here is the Swift example they provide:

let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."
let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
tagger.string = text
let range = NSRange(location:0, length: text.utf16.count)
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]
tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, stop in
    if let tag = tag, tags.contains(tag) {
        let name = (text as NSString).substring(with: tokenRange)
        print("\(name): \(tag)")
    }
}

I have translated it with help from here, but I don't know how to specify the tags, e.g. [.personalName, .placeName, .organizationName]. Is this just an array of the tag types you are enumerating?

NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
                              initWithTagSchemes:[NSArray arrayWithObjects:NSLinguisticTagSchemeNameType, nil]
                              options:(NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerJoinNames)];
[tagger setString:text];

[tagger enumerateTagsInRange:NSMakeRange(0, [text length])
                      scheme:NSLinguisticTagSchemeNameType
                     options:(NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation| NSLinguisticTaggerJoinNames)
                  usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
                      NSString *token = [text substringWithRange:tokenRange];
                      NSString *name =[tagger tagAtIndex:tokenRange.location scheme:NSLinguisticTagSchemeNameType tokenRange:NULL sentenceRange:NULL];

                      if (name == nil) {
                          name = token;
                      }

                      NSLog(@"tagger results:%@, %@", token, name);
                  }];

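Any advice on how to specify the tags in Objective-C would be appreciated.

【Comments】:

NSArray *tags = @[NSLinguisticTagPersonalName, ...]... if (tag && [tags containsObject:tag]) { NSString *name = [text substringWithRange:tokenRange]; NSLog(@"%@: %@", name, tag); }? As far as I can tell, tags is just an array you compare each result against, not a filter you pass into the tagger. I don't know that API, but that seems to be what the Swift code does.

【Answer 1】: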

Original Swift code:

let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."
let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
tagger.string = text
let range = NSRange(location:0, length: text.utf16.count)
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]
tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, stop in
    if let tag = tag, tags.contains(tag) {
        let name = (text as NSString).substring(with: tokenRange)
        print("\(name): \(tag)")
    }
}

Output:

American Red Cross: NSLinguisticTag(_rawValue: OrganizationName)
Washington: NSLinguisticTag(_rawValue: PlaceName)
Clara Barton: NSLinguisticTag(_rawValue: PersonalName)

Objective-C version:

NSString* text = @"The American Red Cross was established in Washington, D.C., by Clara Barton.";
NSLinguisticTagger* tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeNameType] options:0];
tagger.string = text;
NSRange range = NSMakeRange(0, text.length);
NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerJoinNames;
NSArray* tags = @[NSLinguisticTagPersonalName, NSLinguisticTagPlaceName, NSLinguisticTagOrganizationName];
[tagger enumerateTagsInRange:range unit:NSLinguisticTaggerUnitWord scheme:NSLinguisticTagSchemeNameType options:options usingBlock:^(NSLinguisticTag  _Nullable tag, NSRange tokenRange, BOOL * _Nonnull stop) {
    if ([tags containsObject:tag]) {
        NSString* name = [text substringWithRange:tokenRange];
        NSLog(@"%@: %@", name, tag);
    }
}];

Output:

2018-09-12 09:51:00.323378-0700 App[2408:109005] American Red Cross: OrganizationName
2018-09-12 09:51:00.323755-0700 App[2408:109005] Washington: PlaceName
2018-09-12 09:51:00.323901-0700 App[2408:109005] Clara Barton: PersonalName
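
The attempt in the question uses the older enumerateTagsInRange:scheme:options:usingBlock: method, which takes no unit: argument and also hands the block a sentence range. As a rough sketch (not part of the original answer, and wrapped in a main function only so it stands alone), the same array comparison should work there too. The block already receives the tag for each token, so the extra tagAtIndex: call from the question does not seem to be needed:

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSString *text = @"The American Red Cross was established in Washington, D.C., by Clara Barton.";
        NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitPunctuation |
                                            NSLinguisticTaggerOmitWhitespace |
                                            NSLinguisticTaggerJoinNames;

        // The tags to keep. This array is only compared against each result;
        // it is never passed to the tagger itself.
        NSArray<NSLinguisticTag> *tags = @[NSLinguisticTagPersonalName,
                                           NSLinguisticTagPlaceName,
                                           NSLinguisticTagOrganizationName];

        NSLinguisticTagger *tagger =
            [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeNameType]
                                                   options:0];
        tagger.string = text;

        // Unit-less variant, as used in the question: the block gets the tag,
        // the token range, and the enclosing sentence range.
        [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                              scheme:NSLinguisticTagSchemeNameType
                             options:options
                          usingBlock:^(NSLinguisticTag _Nullable tag, NSRange tokenRange,
                                       NSRange sentenceRange, BOOL * _Nonnull stop) {
            // Skip tokens with no tag or with a tag we are not interested in.
            if (tag != nil && [tags containsObject:tag]) {
                NSLog(@"%@: %@", [text substringWithRange:tokenRange], tag);
            }
        }];
    }
    return 0;
}
```

The explicit nil check is a defensive choice here, since the tag parameter is declared nullable.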
