Elasticsearch 5.6.8: Chinese Analyzer

Posted by jiqing9006

When installing the analyzer plugin, make sure its version matches your Elasticsearch version exactly!

Download: https://github.com/medcl/elasticsearch-analysis-ik

To keep the two versions in line, I deliberately downgraded Elasticsearch.
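The version requirement can be checked mechanically before installing. The following is a minimal sketch, not part of the plugin itself: the helper names `plugin_version` and `versions_match` and the archive file name are assumptions made for illustration, and the check simply compares the x.y.z string found in the archive name with the Elasticsearch version.

```python
import re


def plugin_version(zip_name):
    """Extract the first x.y.z version found in an archive name, or None."""
    match = re.search(r"(\d+\.\d+\.\d+)", zip_name)
    return match.group(1) if match else None


def versions_match(es_version, zip_name):
    """True only when the archive's version equals the Elasticsearch version."""
    return plugin_version(zip_name) == es_version


# Hypothetical archive name, used only for illustration.
print(versions_match("5.6.8", "elasticsearch-analysis-ik-5.6.8.zip"))
```

A mismatch (for example a 6.x node with a 5.6.8 archive) returns False, which is the situation the downgrade above avoids.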

ik_smart

GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国国歌"
}

Response:

{
  "tokens": [
    {
      "token": "中华人民共和国",
      "start_offset": 0,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "国歌",
      "start_offset": 7,
      "end_offset": 9,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
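The same request can be issued from a script instead of the console. This is a rough sketch using only the Python standard library; the base URL `http://localhost:9200` and the function names are assumptions, and POST is used because the `_analyze` endpoint accepts it as well as GET.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer, base_url="http://localhost:9200"):
    """Build (but do not send) an _analyze request for the given analyzer."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_analyze?pretty",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, analyzer, base_url="http://localhost:9200"):
    """Send the request and return the list of token objects."""
    req = build_analyze_request(text, analyzer, base_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["tokens"]
```

With a node running and the plugin installed, `analyze("中华人民共和国国歌", "ik_smart")` should return token objects in the shape shown in the response above.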

ik_max_word

GET _analyze?pretty
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}

Response:

{
  "tokens": [
    {
      "token": "中华人民共和国",
      "start_offset": 0,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "中华人民",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "中华",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "华人",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "人民共和国",
      "start_offset": 2,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "人民",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "共和国",
      "start_offset": 4,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 6
    },
    {
      "token": "共和",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 7
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 7,
      "type": "CN_CHAR",
      "position": 8
    },
    {
      "token": "国歌",
      "start_offset": 7,
      "end_offset": 9,
      "type": "CN_WORD",
      "position": 9
    }
  ]
}
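The two outputs differ in a way that is easy to verify from the offsets: the ik_smart tokens do not overlap, while the ik_max_word tokens do. The sketch below copies the (start_offset, end_offset) pairs from the two responses; the helper name `has_overlap` is an assumption for illustration.

```python
def has_overlap(spans):
    """True if any two (start, end) spans share at least one character."""
    ordered = sorted(spans)
    # After sorting by start, any overlap shows up between some adjacent pair.
    return any(nxt[0] < cur[1] for cur, nxt in zip(ordered, ordered[1:]))


# Offsets taken from the ik_smart response.
ik_smart = [(0, 7), (7, 9)]

# Offsets taken from the ik_max_word response.
ik_max_word = [(0, 7), (0, 4), (0, 2), (1, 3), (2, 7),
               (2, 4), (4, 7), (4, 6), (6, 7), (7, 9)]

print(has_overlap(ik_smart))     # False
print(has_overlap(ik_max_word))  # True
```

For this input, ik_smart yields a single coarse segmentation, whereas ik_max_word emits every dictionary word it finds, including ones nested inside longer words.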
