Installing the IK Analyzer for ElasticSearch 2.2.1
Posted by 洽洽老大
Installation
- First, download the version 1.8.1 source from the ik project on GitHub; you can download it directly as a zip file or fetch it via git.
- Unzip the file elasticsearch-analysis-ik-1.8.1.zip: in the download directory run unzip elasticsearch-analysis-ik-1.8.1.zip -d ik
- Change into the ik directory: cd ik
- Compile and package with Maven (Maven must be installed): mvn package
- After packaging, elasticsearch-analysis-ik-1.8.1.zip appears in the target/releases directory
- Unzip that archive and copy its contents into the ES_HOME/plugins/ik directory on every Elasticsearch node
- Restart every node
Note: to install a different version, see https://github.com/medcl/elasticsearch-analysis-ik and pick the matching release from the branches.
Testing
Create an index
curl -XPUT http://localhost:9200/iktest
Configure the mapping
curl -XPOST http://host:9200/iktest/fulltext/_mapping -d'
{
    "fulltext": {
        "_all": {
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word",
                "include_in_all": "true",
                "boost": 8
            }
        }
    }
}'
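Because the braces in hand-typed request bodies are easy to get wrong, the mapping body can be validated locally before sending it. A small sketch (same field names as in the curl command above):

```python
import json

# The mapping body from the curl command above, parsed locally
# to confirm it is well-formed JSON before sending it to Elasticsearch.
mapping = json.loads("""
{
    "fulltext": {
        "_all": {
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word",
                "include_in_all": "true",
                "boost": 8
            }
        }
    }
}
""")

content = mapping["fulltext"]["properties"]["content"]
print(content["analyzer"])     # index-time analyzer: ik_max_word
print(content["term_vector"])  # positions+offsets enable fast highlighting
```

The `term_vector: with_positions_offsets` setting on `content` is what makes the highlighted search results later in this article efficient.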
ik_max_word: performs the finest-grained segmentation. For example, "中华人民共和国国歌" (National Anthem of the People's Republic of China) is split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌", exhausting every possible combination.
ik_smart: performs the coarsest-grained segmentation. For example, "中华人民共和国国歌" is split into "中华人民共和国, 国歌".
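The contrast between the two modes can be illustrated with a toy dictionary-based segmenter. This is a simplified sketch using a tiny hand-made dictionary, not the actual IK implementation: the max-word style emits every dictionary word found at any position, while the smart style greedily takes the longest match from left to right.

```python
# Toy sketch of the two IK segmentation styles (NOT the real IK algorithm).
# A tiny hand-made dictionary stands in for IK's built-in one.
DICT = {"中华人民共和国", "中华人民", "中华", "华人", "人民共和国",
        "人民", "共和国", "共和", "国歌"}

def max_word(text):
    """Fine-grained: emit every dictionary word starting at any offset."""
    tokens = []
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            if text[i:j] in DICT:
                tokens.append(text[i:j])
    return tokens

def smart(text):
    """Coarse-grained: greedy longest match, left to right."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in DICT:
                tokens.append(text[i:j])
                i = j
                break
        else:
            i += 1  # no dictionary word starts here; skip one character
    return tokens

print(max_word("中华人民共和国国歌"))
print(smart("中华人民共和国国歌"))
```

With this toy dictionary, `smart` yields exactly the two coarse tokens from the article, while `max_word` produces the overlapping fine-grained tokens (minus the single characters and "国国", which the real IK dictionary also contains).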
Index some documents
curl -XPOST http://host:9200/iktest/fulltext/1 -d'
{"content": "美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://host:9200/iktest/fulltext/2 -d'
{"content": "公安部:各地校车将享最高路权"}
'
curl -XPOST http://host:9200/iktest/fulltext/3 -d'
{"content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
'
curl -XPOST http://host:9200/iktest/fulltext/4 -d'
{"content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
Query
curl -XPOST http://localhost:9200/iktest/fulltext/_search -d'
{
    "query": { "term": { "content": "中国" } },
    "highlight": {
        "pre_tags": ["<tag1>", "<tag2>"],
        "post_tags": ["</tag1>", "</tag2>"],
        "fields": {
            "content": {}
        }
    }
}'
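When building such requests from application code, it is safer to assemble the body as a data structure and serialize it than to concatenate strings. A minimal sketch that produces the same term-plus-highlight body as the curl command above:

```python
import json

# Build the same term query + highlight body used in the curl example.
body = {
    "query": {"term": {"content": "中国"}},
    "highlight": {
        "pre_tags": ["<tag1>", "<tag2>"],
        "post_tags": ["</tag1>", "</tag2>"],
        "fields": {"content": {}},
    },
}

# ensure_ascii=False keeps the Chinese query term readable in the payload.
payload = json.dumps(body, ensure_ascii=False)
print(payload)
```

The serialized `payload` is what an HTTP client would send as the `_search` request body.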
The result is:
{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 1.5,
        "hits": [
            {
                "_index": "iktest",
                "_type": "fulltext",
                "_id": "4",
                "_score": 1.5,
                "_source": {
                    "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
                },
                "highlight": {
                    "content": [
                        "<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
                    ]
                }
            },
            {
                "_index": "iktest",
                "_type": "fulltext",
                "_id": "3",
                "_score": 0.53699243,
                "_source": {
                    "content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
                },
                "highlight": {
                    "content": [
                        "中韩渔警冲突调查:韩警平均每天扣1艘<tag1>中国</tag1>渔船"
                    ]
                }
            }
        ]
    }
}
Inspecting the segmentation result
curl 'http://host:9200/iktest/_analyze?analyzer=ik&pretty=true' -d '
{
    "text": "别说话,我想静静"
}'
Result:
{
    "tokens": [
        {
            "token": "别说",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "说话",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "我",
            "start_offset": 4,
            "end_offset": 5,
            "type": "CN_CHAR",
            "position": 2
        },
        {
            "token": "想",
            "start_offset": 5,
            "end_offset": 6,
            "type": "CN_CHAR",
            "position": 3
        },
        {
            "token": "静静",
            "start_offset": 6,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "静",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "静",
            "start_offset": 7,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 6
        }
    ]
}
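The start_offset and end_offset values are character offsets into the original text, so each token can be recovered by slicing the input string. A quick check against the tokens above:

```python
# start_offset/end_offset are character offsets into the original string,
# so text[start:end] recovers each token. Note the overlapping tokens
# (别说/说话, 静静/静/静): this is the fine-grained max-word behavior.
text = "别说话,我想静静"

tokens = [
    ("别说", 0, 2), ("说话", 1, 3), ("我", 4, 5), ("想", 5, 6),
    ("静静", 6, 8), ("静", 6, 7), ("静", 7, 8),
]

for tok, start, end in tokens:
    assert text[start:end] == tok, (tok, start, end)
print("all offsets match")
```

Note that the comma at offset 3 produces no token: punctuation is dropped, which is why offsets 3 to 4 are skipped in the output.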