Using Elasticsearch Analyzers and Installing the IK Analyzer

Posted 2021-09-05 斯普润布特

Installing and Using Elasticsearch Analyzers

Built-in ES analyzers

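standard: the default analyzer; splits text on word boundaries and lowercases English text
simple: splits on any non-letter character and lowercases English text
whitespace: splits on whitespace only
stop: like simple, but also removes common stop words such as the, is, a, an
keyword: no tokenization; the whole text is kept as a single token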
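
To make the differences concrete, here is a rough local approximation of four of these analyzers in Python. It is only a sketch of the behaviour described in the list, not the Lucene implementation ES actually runs; the function names are made up for illustration, and the stop-word set contains only the four words mentioned above.

```python
import re

# Only the four stop words the list above mentions; the real default list is longer.
STOP_WORDS = {"the", "is", "a", "an"}

def whitespace(text):
    """Split on runs of whitespace; case is left untouched."""
    return text.split()

def simple(text):
    """Split on anything that is not a letter, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def stop(text):
    """Same as simple(), minus the stop words."""
    return [t for t in simple(text) if t not in STOP_WORDS]

def keyword(text):
    """No splitting: the whole input is one token."""
    return [text]

sample = "The Super-Man is 2 fast"
for analyze in (whitespace, simple, stop, keyword):
    print(analyze.__name__, analyze(sample))
```

The standard analyzer is left out on purpose: its word-boundary rules are more involved than a one-line regular expression, so the _analyze API is the reliable way to see what it does.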

Testing analyzer output

Testing with an explicitly chosen analyzer

GET    http://192.168.213.154:9200/_analyze
POST   http://192.168.213.154:9200/_analyze

{
    "text":"The Super Man",
    "analyzer":"standard"
}

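Testing the analysis result for a field in an index (the analyzer is taken from the field's mapping unless the request names one explicitly)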

GET    http://192.168.213.154:9200/index_test_1/_analyze
POST   http://192.168.213.154:9200/index_test_1/_analyze

{
    "field": "name",
    "text": "Why so powerful"
}
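
The two requests above can also be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names build_analyze_request and token_strings are hypothetical, and the actual network call is left commented out because it needs a reachable cluster.

```python
import json
import urllib.request

def build_analyze_request(base_url, text, analyzer=None, index=None, field=None):
    """Build (but do not send) a POST to the _analyze endpoint."""
    path = f"/{index}/_analyze" if index else "/_analyze"
    body = {"text": text}
    if analyzer:
        body["analyzer"] = analyzer
    if field:
        body["field"] = field
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def token_strings(response_body):
    """Pull just the token text out of an _analyze response."""
    return [t["token"] for t in json.loads(response_body)["tokens"]]

req = build_analyze_request("http://192.168.213.154:9200", "The Super Man", analyzer="standard")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending requires a reachable cluster, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(token_strings(resp.read()))
```

Passing index="index_test_1" and field="name" instead of analyzer gives the field-based variant of the request.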

Using an analyzer in an index

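An analyzer can be assigned to a field when the mapping is defined together with the index at creation time.
It can also be assigned later, when a mapping is added to an existing index.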

PUT     http://192.168.213.154:9200/index_test_1
{
    "mappings": {
        "properties": {
            "realname": {
                "type": "text",
                "index": true
            },
            "username": {
                "type": "keyword",
                "index": false
            }
        }
    }
}


---------------------------------------------------------------------


POST    http://192.168.213.154:9200/index_test_1/_mapping
{
    "properties": {
        "name": {
            "type": "text",
            "analyzer": "stop"
        }
    }
}
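
For comparison, the request bodies for both routes can be generated from the same field-to-analyzer table. This is a small sketch assuming the typeless mapping format of ES 7.x; the helper names are invented for the example.

```python
import json

def text_field(analyzer):
    """Mapping for an analyzed text field."""
    return {"type": "text", "analyzer": analyzer}

def create_index_body(fields):
    """Body for PUT /<index>: fields maps field name -> analyzer name."""
    return {"mappings": {"properties": {name: text_field(a) for name, a in fields.items()}}}

def put_mapping_body(fields):
    """Body for the _mapping endpoint of an index that already exists."""
    return {"properties": {name: text_field(a) for name, a in fields.items()}}

print(json.dumps(create_index_body({"realname": "standard"}), indent=4))
print(json.dumps(put_mapping_body({"name": "stop"}), indent=4))
```

The only structural difference is the outer "mappings" wrapper, which is needed at index creation but not when calling _mapping.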

Installing the Chinese IK analyzer

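Download the IK analyzer from its GitHub download page, choosing the release that matches your ES version
Go to the elasticsearch installation directory
Create an ik folder under ./plugins
- cd /usr/local/elasticsearch/plugins/
- mkdir ik
Unzip the downloaded archive into that folder
- cd ik
- unzip /home/chenyb/software/elasticsearch-analysis-ik-7.4.2.zip
- Remember to give the ES user ownership of the ik folder
  - chown -R es:es /usr/local/elasticsearch/plugins/ik
Restart ES for the plugin to take effect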
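
The unzip step, together with the version check it depends on, can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the version is read from the archive name on the assumption that it follows the elasticsearch-analysis-ik-<version>.zip pattern seen above.

```python
import re
import zipfile
from pathlib import Path

def plugin_version(zip_name):
    """Extract '7.4.2' from 'elasticsearch-analysis-ik-7.4.2.zip' (assumed naming scheme)."""
    match = re.search(r"-(\d+\.\d+\.\d+)\.zip$", zip_name)
    return match.group(1) if match else None

def install_ik(zip_path, plugins_dir, es_version):
    """Unzip the plugin into <plugins_dir>/ik after checking the version matches."""
    zip_path = Path(zip_path)
    found = plugin_version(zip_path.name)
    if found != es_version:
        raise ValueError(f"plugin version {found} does not match ES {es_version}")
    target = Path(plugins_dir) / "ik"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target

print(plugin_version("elasticsearch-analysis-ik-7.4.2.zip"))
```

Changing ownership of the folder and restarting ES still have to be done afterwards, as in the manual steps.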

IK analyzers

ik_max_word: fine-grained segmentation; the output below is for the text 中华人民共和国

{
    "tokens": [
        {
            "token": "中华人民共和国",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "中华人民",
            "start_offset": 0,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "中华",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "华人",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "人民共和国",
            "start_offset": 2,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "人民",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "共和国",
            "start_offset": 4,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "共和",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "国",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_CHAR",
            "position": 8
        }
    ]
}

ik_smart: coarse-grained segmentation; the output below is for the text 中华人民共和国

{
    "tokens": [
        {
            "token": "中华人民共和国",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        }
    ]
}
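
The difference between the two modes shows up in the offsets: ik_max_word emits tokens whose spans overlap, while ik_smart emits a single non-overlapping segmentation. The sketch below checks this on a trimmed copy of the responses shown above; the helper names are made up for the example.

```python
import json

TEXT = "中华人民共和国"

# Three of the ik_max_word tokens shown above, trimmed to the fields used here.
MAX_WORD = json.loads("""
{"tokens": [
  {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7},
  {"token": "华人", "start_offset": 1, "end_offset": 3},
  {"token": "共和国", "start_offset": 4, "end_offset": 7}
]}
""")

# The single ik_smart token shown above.
SMART = {"tokens": [{"token": "中华人民共和国", "start_offset": 0, "end_offset": 7}]}

def spans(response):
    """Return (token, start, end) triples from an _analyze response."""
    return [(t["token"], t["start_offset"], t["end_offset"]) for t in response["tokens"]]

def overlapping(response):
    """True if any token starts before the previous token has ended."""
    ordered = sorted(spans(response), key=lambda s: (s[1], s[2]))
    return any(b[1] < a[2] for a, b in zip(ordered, ordered[1:]))

for token, start, end in spans(MAX_WORD):
    # For this text the offsets line up with Python string indices.
    assert TEXT[start:end] == token

print("ik_max_word overlaps:", overlapping(MAX_WORD))
print("ik_smart overlaps:", overlapping(SMART))
```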

Custom dictionaries

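Edit the configuration file
- vim ./elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
- Dictionary paths may be absolute or relative; separate multiple entries with ";"
- There is no naming requirement for dictionary files; the .dic suffix is only a convention
- Restart the ES service for the change to take effect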
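
As a sketch of what goes into that file, the following Python writes a dictionary file and renders the matching configuration entry. Two things are assumptions rather than statements from this article: that a dictionary is a UTF-8 text file with one word per line, and that the entry key for extension dictionaries is ext_dict. Check both against the IKAnalyzer.cfg.xml shipped with your plugin version. The function names are hypothetical.

```python
from pathlib import Path
from xml.sax.saxutils import escape

def write_dictionary(path, words):
    """Write one word per line, UTF-8, skipping blanks and duplicates."""
    seen = []
    for word in (w.strip() for w in words):
        if word and word not in seen:
            seen.append(word)
    Path(path).write_text("\n".join(seen) + "\n", encoding="utf-8")
    return seen

def ext_dict_entry(dict_paths):
    """Render the config entry; 'ext_dict' is the key name assumed here."""
    return f'<entry key="ext_dict">{escape(";".join(dict_paths))}</entry>'

print(ext_dict_entry(["custom.dic", "extra/names.dic"]))
```

The file names custom.dic and extra/names.dic are placeholders; the point is only that several paths are joined with ";" inside a single entry.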
