Elasticsearch 6.x indices APIs (index APIs)

Posted 2021-02-15


1. Index APIs

The indices APIs manage individual indices, index settings, index aliases, field mappings, and index templates.

index management

1.1 create index

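Creates an index. Settings and field mappings can be supplied but are optional. The create step itself can even be skipped: Elasticsearch creates the index automatically when a document is first indexed into it. Example:

curl -X PUT "localhost:9200/test" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "type1" : {
            "properties" : {
                "field1" : { "type" : "text" }
            }
        }
    }
}
'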

1.2 delete index

Deletes an index, given its name or a wildcard expression. Aliases cannot be used. Using * or _all deletes every index. Example:

curl -X DELETE "localhost:9200/twitter"

1.3 get index

Retrieves information about an index. Example:

curl -X GET "localhost:9200/twitter"

1.4 indices exists

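Checks whether an index or alias exists. The response status is 200 if it exists and 404 if it does not. With curl, use -I (--head) rather than -X HEAD, which makes curl wait for a response body that never arrives. Example:

curl -I "localhost:9200/twitter"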

1.5 open/close index api

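An index can be closed and reopened. A closed index is unavailable: read and write operations on it are blocked. Reopen the index when it is needed again. Example:

curl -X POST "localhost:9200/my_index/_close"
curl -X POST "localhost:9200/my_index/_open"

Closed indices still occupy a significant amount of disk space, which can cause problems in the environment. Setting cluster.indices.close.enable to false disables the ability to close indices.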

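1.6 shrink index

The shrink index API shrinks an existing index into a new index with fewer primary shards.

The number of primary shards requested for the target index must be a factor of the number of primary shards in the source index. For example, 8 shards can be shrunk to 4, 2, or 1 (this keeps document-to-shard hashing consistent). Shrinking works as follows:
    First, a target index is created with the same definition as the source index, but with fewer primary shards.
    Then, the segments of the source index are hard-linked into the target index (they are copied if the file system does not support hard links).
    Finally, the target index is recovered as though it were a closed index that had just been reopened.
    Example:

curl -X POST "localhost:9200/my_source_index/_shrink/my_target_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  },
  "aliases": {
    "my_search_indices": {}
  }
}
'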
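The factor rule can be checked before a shrink is attempted. The sketch below is only an illustration: shrink_targets and prepare_shrink are hypothetical helper names, ES_URL is an assumed environment variable, and the node name passed to prepare_shrink is a placeholder. prepare_shrink follows the usual preparation for a shrink (relocate a copy of every shard to one node and block writes); it needs a running cluster and is only defined here, not called.

```shell
# shrink_targets SOURCE_SHARDS
# Prints every primary-shard count the source index could be shrunk to:
# each proper divisor of SOURCE_SHARDS, largest first.
shrink_targets() {
    source_shards=$1
    candidate=$((source_shards - 1))
    result=""
    while [ "$candidate" -ge 1 ]; do
        if [ $((source_shards % candidate)) -eq 0 ]; then
            result="${result}${result:+ }${candidate}"
        fi
        candidate=$((candidate - 1))
    done
    echo "$result"
}

# prepare_shrink INDEX NODE_NAME  (hypothetical helper, requires a cluster)
# Moves a copy of every shard to NODE_NAME and blocks writes on INDEX.
prepare_shrink() {
    curl -X PUT "${ES_URL:-localhost:9200}/$1/_settings" \
        -H 'Content-Type: application/json' -d "
{
  \"settings\": {
    \"index.routing.allocation.require._name\": \"$2\",
    \"index.blocks.write\": true
  }
}"
}

shrink_targets 8
```

For an index with 8 primary shards this prints the valid targets 4, 2 and 1, matching the example in the text.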

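1.7 split index

The split index API splits an existing index into a new index, in which each original primary shard is split into two or more primary shards.
How many times an index can be split (and into how many shards each original shard can be split) is determined by the index.number_of_routing_shards setting. The number of routing shards defines the hashing space used internally to distribute documents across shards.
    number_of_shards: the number of primary shards of the current index
    number_of_routing_shards: the upper bound on the number of shards that can be reached by splitting
How does splitting work?
    First, a target index is created with the same definition as the source index, but with a larger number of primary shards.
    Then, the segments of the source index are hard-linked into the target index (they are copied if the file system does not support hard links).
    Once the low-level files are in place, all documents are hashed again and the documents that belong to a different shard are deleted.
    Finally, the target index is recovered as though it were a closed index that had just been reopened.
Why doesn't Elasticsearch support incremental resharding?
    Because of hash consistency: a document's shard is derived from a hash of its routing value and the shard count, so the shard count cannot grow by arbitrary increments without relocating documents.
    Example:
    The index must be made read-only before it is split:

curl -X PUT "localhost:9200/my_source_index/_settings" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.blocks.write": true
  }
}
'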

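index.number_of_routing_shards can only be set when an index is created, so the source index has to be created with it in advance, before any documents are written and before the write block is applied. Example:

curl -X PUT "localhost:9200/my_source_index" -H 'Content-Type: application/json' -d'
{
    "settings": {
        "index.number_of_shards" : 1,
        "index.number_of_routing_shards" : 2
    }
}
'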
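The split itself is issued against the _split endpoint of the source index, and the target shard count has to be a multiple of the source shard count and a factor of number_of_routing_shards. The sketch below illustrates both points under stated assumptions: shard_for, split_targets and split_index are hypothetical helper names, ES_URL is an assumed environment variable, and shard_for uses a plain integer in place of the real hash of the routing value. The routing arithmetic reflects how 6.x is understood to assign shards and is not taken from this article. split_index needs a running cluster and is only defined, not called.

```shell
# shard_for HASH ROUTING_SHARDS PRIMARY_SHARDS
# shard = (hash % routing_shards) / (routing_shards / primary_shards)
shard_for() {
    routing_factor=$(( $2 / $3 ))
    echo $(( ($1 % $2) / routing_factor ))
}

# split_targets SOURCE_SHARDS ROUTING_SHARDS
# Prints the shard counts the index can be split to: multiples of
# SOURCE_SHARDS (larger than it) that divide ROUTING_SHARDS evenly.
split_targets() {
    source_shards=$1
    routing_shards=$2
    candidate=$((source_shards * 2))
    result=""
    while [ "$candidate" -le "$routing_shards" ]; do
        if [ $((routing_shards % candidate)) -eq 0 ]; then
            result="${result}${result:+ }${candidate}"
        fi
        candidate=$((candidate + source_shards))
    done
    echo "$result"
}

# split_index SOURCE TARGET TARGET_SHARDS  (hypothetical helper, requires a cluster)
split_index() {
    curl -X POST "${ES_URL:-localhost:9200}/$1/_split/$2" \
        -H 'Content-Type: application/json' -d "
{
  \"settings\": {
    \"index.number_of_shards\": $3
  }
}"
}

split_targets 1 2      # the index created above can only be split to 2 shards
shard_for 13 8 2       # shard of hash 13 before a 2 -> 4 split
shard_for 13 8 4       # shard of the same hash after the split
```

With 8 routing shards, hash 13 lands in shard 1 of 2 before the split and in shard 2 of 4 afterwards, which is one of the two shards (2 and 3) carved out of source shard 1. This is why each target shard only needs the files of one source shard. For the index in the example, the call would be split_index my_source_index my_target_index 2.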

1.8 rollover index

The rollover index API rolls an index alias over to a new index when the existing index is considered too large or too old.
Example:

curl -X PUT "localhost:9200/logs-000001" -H 'Content-Type: application/json' -d'
{
  "aliases": {
    "logs_write": {}
  }
}
'

# Add > 1000 documents to logs-000001, then run the request below; it creates a new index (logs-000002)

curl -X POST "localhost:9200/logs_write/_rollover" -H 'Content-Type: application/json' -d'
{
  "conditions": {
    "max_age":   "7d",
    "max_docs":  1000,
    "max_size":  "5gb"
  }
}
'
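The conditions are combined with OR: the alias is rolled over as soon as any one of them is met. The sketch below mimics that decision locally with the thresholds from the request above. should_rollover and rollover_dry_run are hypothetical helper names and ES_URL is an assumed environment variable. rollover_dry_run uses the dry_run query parameter to check the conditions without creating the new index; it needs a running cluster and is only defined, not called.

```shell
# should_rollover DOC_COUNT AGE_DAYS SIZE_GB
# Local approximation of the decision: any single condition is enough.
should_rollover() {
    if [ "$1" -ge 1000 ] || [ "$2" -ge 7 ] || [ "$3" -ge 5 ]; then
        echo "rollover"
    else
        echo "keep"
    fi
}

# rollover_dry_run ALIAS  (hypothetical helper, requires a cluster)
# Evaluates the conditions on the server without performing the rollover.
rollover_dry_run() {
    curl -X POST "${ES_URL:-localhost:9200}/$1/_rollover?dry_run" \
        -H 'Content-Type: application/json' -d '
{
  "conditions": {
    "max_age":   "7d",
    "max_docs":  1000,
    "max_size":  "5gb"
  }
}'
}

should_rollover 1200 1 0   # enough documents, although the index is young and small
should_rollover 10 0 0     # no condition met
```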

mapping management

1.9 put mapping

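The put mapping API adds fields to an existing index or changes the search-only settings of existing fields (the type of an existing field cannot be changed). Example:

curl -X PUT "localhost:9200/twitter/_mapping/_doc" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "email": {
      "type": "keyword"
    }
  }
}
'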

1.10 get mapping

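Retrieves the mapping defined for an index. A mapping is comparable to a table schema in a relational database; each index has one mapping. Example: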

curl -X GET "localhost:9200/twitter/_mapping/_doc"

1.11 get field mapping

Retrieves the mapping of a specific field in an index. Example (the mapping of the field title):

curl -X GET "localhost:9200/publications/_mapping/_doc/field/title"

1.12 types exists

Checks whether a type exists in an index (200 if it exists, 404 if not). Example:

curl -I "localhost:9200/twitter/_mapping/tweet"

alias management

1.13 index aliases

Adds an alias to an index. The _aliases endpoint can also remove one. Example:

curl -X PUT "localhost:9200/logs_201305/_alias/2013"
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
    "actions" : [
        { "add" : { "index" : "test1", "alias" : "alias1" } }
    ]
}
'
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
    "actions" : [
        { "remove" : { "index" : "test1", "alias" : "alias1" } }
    ]
}
'

index settings (settings that control how operations on an index behave)

1.14 update indices settings

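Changes index-level settings in real time. The REST endpoint is /_settings (to update all indices) or {index}/_settings (to update one or more indices), and the request body contains the updated settings. Example:

curl -X PUT "localhost:9200/twitter/_settings" -H 'Content-Type: application/json' -d'
{
    "index" : {
        "number_of_replicas" : 2
    }
}
'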
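A common use of this endpoint is to change a dynamic setting temporarily and later reset it: sending null for a setting restores its default. The sketch below is an illustration with index.refresh_interval as the example setting. settings_body and set_refresh_interval are hypothetical helper names and ES_URL is an assumed environment variable. set_refresh_interval needs a running cluster and is only defined, not called.

```shell
# settings_body JSON_VALUE
# Builds the request body. JSON_VALUE is inserted verbatim, so it must already
# be valid JSON: '"30s"', '"-1"' or null.
settings_body() {
    printf '{"index":{"refresh_interval":%s}}' "$1"
}

# set_refresh_interval INDEX JSON_VALUE  (hypothetical helper, requires a cluster)
set_refresh_interval() {
    curl -X PUT "${ES_URL:-localhost:9200}/$1/_settings" \
        -H 'Content-Type: application/json' -d "$(settings_body "$2")"
}

settings_body '"-1"'   # body that disables periodic refresh
echo
settings_body null     # body that resets the setting to its default
echo
```

For example, set_refresh_interval twitter '"-1"' before a bulk load and set_refresh_interval twitter null afterwards.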

1.15 get settings

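Retrieves the settings of an index, that is, the options that define its behaviour (comparable to the configurable parameters of a database, such as connection limits or size). Example: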

curl -X GET "localhost:9200/log_2013_*/_settings"

1.16 analyze

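Runs the analysis process on a text and returns the result (the tokens produced for the given string). Example:

# analyzer: the analyzer to use
# text: the text to analyze
curl -X GET "localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer" : "standard",
  "text" : "this is a test"
}
'

To get more detailed information about how the text is tokenized, set explain to true. Example:

# tokenizer: the tokenizer to use
# filter: token filters applied to the tokenizer output
# attributes: restricts which token attributes appear in the explain output
curl -X GET "localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"]
}
'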

1.17 index templates

Index templates are applied automatically when a new index is created. A template mainly consists of settings (index properties) and mappings (field mappings and related configuration).
Matching is controlled by index_patterns: with "index_patterns": ["te*", "bar*"], every index whose name starts with te or bar gets this template.
Example:

curl -X PUT "localhost:9200/_template/template_1" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["te*", "bar*"],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        }
      }
    }
  }
}
'

Delete an index template:

curl -X DELETE "localhost:9200/_template/template_1"

Get index templates:

# a specific template; wildcards (*) are supported
curl -X GET "localhost:9200/_template/template_1"
# all templates
curl -X GET "localhost:9200/_template"

monitoring

1.18 indices stats

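Index-level stats report statistics on the different operations happening on an index. The API returns statistics scoped to an index, although most of them can also be retrieved at node level.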
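A request for these statistics might look like the sketch below. stats_url and index_stats are hypothetical helper names, ES_URL is an assumed environment variable, and twitter is reused from the other examples. The optional metric list (for example docs,store) narrows the response to specific statistics groups. index_stats needs a running cluster and is only defined, not called.

```shell
# stats_url INDEX [METRICS]
# Builds the stats URL; METRICS is an optional comma-separated list.
stats_url() {
    if [ -n "$2" ]; then
        echo "${ES_URL:-localhost:9200}/$1/_stats/$2"
    else
        echo "${ES_URL:-localhost:9200}/$1/_stats"
    fi
}

# index_stats INDEX [METRICS]  (hypothetical helper, requires a cluster)
index_stats() {
    curl -s -X GET "$(stats_url "$1" "$2")?pretty"
}

stats_url twitter
stats_url twitter docs,store
```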

1.19 indices segments

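Returns information about the Lucene segments in each shard of an index: segment name (used for the file names), number of documents, size, and number of deleted documents.
Example: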

curl -X GET "localhost:9200/shakespeare/_segments?pretty"

1.20 indices recovery

Checks whether an index is being recovered and how far the recovery has progressed.
Example:

curl -X GET "localhost:9200/shakespeare/_recovery?pretty"

1.21 indices shard stores

Returns store information for the shards of indices. The status parameter selects shards by health status. Example:

curl -X GET "localhost:9200/_shard_stores?status=green"

status management

1.22 clear cache

Clears the caches associated with an index. Example:

curl -X POST "localhost:9200/twitter/_cache/clear"

1.23 refresh

Refreshes an index so that the documents created or modified since the last refresh become visible to search.
Example:

curl -X POST "localhost:9200/twitter/_refresh"

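1.24 flush

Flushes an index: the data is written to the index storage (disk) and the in-memory transaction log (translog) is cleared.
Example:

curl -X POST "localhost:9200/shakespeare/_flush?pretty"

Request parameters:
    wait_if_ongoing: if true, the flush waits until it can run when another flush is already in progress. The default is false, in which case an exception is thrown at shard level if another flush is running.
    force: forces a flush to happen immediately, even if it is not strictly needed.
Synced flush
    Elasticsearch tracks the indexing activity of each shard. A shard that has received no indexing requests for 5 minutes is automatically marked as inactive.
    A synced flush generates a unique marker (sync_id) for each shard.
    Because the sync_id is only added when no indexing operations are in progress, it can be used as a quick way to check whether the Lucene indices of two shard copies are identical.
    A synced flush can also be requested manually, without waiting for the 5 minutes to pass. Example:

curl -X POST "localhost:9200/twitter/_flush/synced"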

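Difference between flush and refresh

The full path of a write in Elasticsearch:

    external document -> index buffer -> filesystem cache (the operation is recorded in the translog at the same time) -> disk.

refresh
    Documents can be searched while indexing is still going on.

    Near-real-time search is served from memory: a refresh turns the contents of the index buffer into a searchable segment.

    An index operation does not write the data straight to disk. Instead, the segment built from the documents in the index buffer is written to the filesystem cache, which avoids the cost of disk I/O while already making the documents searchable.

    Until a flush writes the data to disk, the translog keeps a record of these index requests.
    A refresh is therefore the step from the index buffer to the filesystem cache. Its purpose is to make all existing data searchable (the data is handed to Lucene for indexing, but may not be committed yet, because a Lucene commit is comparatively expensive).
flush

    A flush is the step from the filesystem cache to disk. Its purpose is to make sure no data is lost: it triggers a Lucene commit, after which the translog entries covered by that commit are no longer needed.

    Each shard is flushed periodically (every 30 minutes in older Elasticsearch versions; this time-based trigger may not apply to 6.x).
    A flush is also triggered when the translog reaches a size limit.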

1.25 force merge

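Forces a merge of the segments in the shards of an index. (A segment is the smallest unit in which Lucene stores documents and cannot be modified. Each refresh that picks up new data creates a new segment, and segments are merged automatically later on.) Example: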

curl -X POST "localhost:9200/twitter/_forcemerge"
