使用pyltp提取文本中的地址
Posted bincoding
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了使用pyltp提取文本中的地址相关的知识,希望对你有一定的参考价值。
首先安装pyltp
pytlp项目首页单例类(第一次调用时加载模型)
class Singleton(object):
def __new__(cls, *args, **kwargs):
if not hasattr(cls, ‘_the_instance‘):
cls._the_instance = object.__new__(cls, *args, **kwargs)
return cls._the_instance
- 使用pyltp提取地址
import os
from pyltp import Segmentor, Postagger, NamedEntityRecognizer
from main.models.Singleton import Singleton
class address_extract_model(Singleton):
print(‘load ltp model start...‘)
pwd = os.getcwd()
project_path = os.path.abspath(os.path.dirname(pwd) + os.path.sep + ".")
LTP_DATA_DIR = project_path + ‘AlarmClassificationmainltpmodel‘ # ltp模型目录的路径
cws_model_path = os.path.join(LTP_DATA_DIR, ‘cws.model‘)
pos_model_path = os.path.join(LTP_DATA_DIR, ‘pos.model‘) # 词性标注模型路径,模型名称为`pos.model`
ner_model_path = os.path.join(LTP_DATA_DIR, ‘ner.model‘) # 命名实体识别模型路径,模型名称为`ner.model`
print(‘path‘ + cws_model_path)
segmentor = Segmentor() # 初始化实例
segmentor.load(cws_model_path) # 加载模型
postagger = Postagger() # 初始化实例
postagger.load(pos_model_path) # 加载模型
recognizer = NamedEntityRecognizer() # 初始化实例
recognizer.load(ner_model_path) # 加载模型
def get_model(self):
return self.segmentor, self.postagger, self.recognizer
def get_address_prediction(alarm_content):
model = address_extract_model()
segmentor, postagger, recognizer = model.get_model()
words = segmentor.segment(alarm_content) # 分词
postags = postagger.postag(words) # 词性标注
netags = recognizer.recognize(words, postags) # 命名实体识别
result = ‘‘
for i in range(0, len(netags)):
print(words[i] + ‘: ‘ + netags[i])
# 地名标签为 ns
if ‘s‘ in netags[i]:
result += words[i] + ‘,‘
if len(result) < 1:
result = ‘No address!‘
print(result)
return result
def get_address(alarm_content):
print("start get_address...")
result = "Exception"
try:
result = get_address_prediction(alarm_content)
except Exception as ex:
print(ex)
print("Output is " + result)
return result
# segmentor.release() # 释放模型
# postagger.release()
# recognizer.release()
- 运行效果
- 项目源码 ( 命名实体提取代码位于main/ltp, 模型文件需要到pyltp下载 )
https://github.com/haibincoder/AlarmClassification
以上是关于使用pyltp提取文本中的地址的主要内容,如果未能解决你的问题,请参考以下文章