Elasticsearch (Part 2): Basic Document CRUD and Bulk Operations

Posted 2021-05-07 czbxdd

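Create a document

A document can be created either with an automatically generated _id or with an _id you specify.

Calling POST index_name/_doc makes Elasticsearch generate the document _id automatically.

# Create a document with an auto-generated _id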
POST users/_doc
{
    "user" : "Mike",
    "post_date" : "2019-04-15T14:12:12",
    "message" : "trying out Kibana"
}

# Response
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "TyPHr20BkakgvNgYZu2L", # auto-generated document _id
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

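With PUT index_name/_create/_id or PUT index_name/_doc/_id?op_type=create, the request explicitly asks for a create. If a document with that _id already exists, the operation fails.

# 1. Create a document with an explicit _id; fails if the _id already exists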
PUT users/_create/1
{
    "user" : "Jack",
    "post_date" : "2019-05-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
# 2. The same create expressed with op_type=create; fails if the _id already exists
PUT users/_doc/1?op_type=create
{
    "user" : "Jack",
    "post_date" : "2019-05-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

# Error returned when the _id already exists:
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[1]: version conflict, document already exists (current version [1])",
        "index_uuid": "ohLNyzUmTv6cm-Ih9kH0bw",
        "shard": "0",
        "index": "users"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[1]: version conflict, document already exists (current version [1])",
    "index_uuid": "ohLNyzUmTv6cm-Ih9kH0bw",
    "shard": "0",
    "index": "users"
  },
  "status": 409
}
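
When the create endpoints are called from application code, the 409 above is often an expected outcome rather than a fatal error. The following is a minimal Python sketch, assuming the response body has already been parsed into a dict with the shape shown above. The helper name `is_version_conflict` is hypothetical and not part of any Elasticsearch client.

```python
def is_version_conflict(response: dict) -> bool:
    """Return True when the response is a 409 version conflict error."""
    error = response.get("error")
    if not isinstance(error, dict):
        return False
    return (
        response.get("status") == 409
        and error.get("type") == "version_conflict_engine_exception"
    )


# Trimmed copies of the two response shapes shown in this post.
conflict = {
    "error": {
        "type": "version_conflict_engine_exception",
        "reason": "[1]: version conflict, document already exists (current version [1])",
    },
    "status": 409,
}
created = {"_id": "1", "_version": 1, "result": "created"}

print(is_version_conflict(conflict))  # True
print(is_version_conflict(created))   # False
```

A caller could treat a True result as "document already there" and skip the write instead of raising.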

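Index a document

Index differs from Create: if the document does not exist, a new document is indexed. Otherwise the existing document is deleted, the new document is indexed in its place, and the version number is incremented by 1. Use PUT index_name/_doc/_id.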

PUT users/_doc/1
{
    "user" : "Mike"
}

# Response
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3, # version incremented
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 4,
  "_primary_term" : 2
}
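
The difference between Create and Index can be illustrated with a toy in-memory model. This is only a sketch of the semantics described above, not how Elasticsearch stores data. The class `ToyIndex` and its method names are made up for illustration, and the status codes simply mirror the ones seen in the responses in this post.

```python
class ToyIndex:
    """In-memory stand-in that mimics create vs. index semantics."""

    def __init__(self):
        self._docs = {}  # _id -> (version, source)

    def create(self, doc_id, source):
        # Create refuses to touch an _id that already exists.
        if doc_id in self._docs:
            return {"status": 409, "result": "version_conflict"}
        self._docs[doc_id] = (1, dict(source))
        return {"status": 201, "result": "created", "_version": 1}

    def index(self, doc_id, source):
        # Index replaces the whole source and bumps the version.
        old_version = self._docs.get(doc_id, (0, None))[0]
        version = old_version + 1
        self._docs[doc_id] = (version, dict(source))
        if old_version == 0:
            return {"status": 201, "result": "created", "_version": version}
        return {"status": 200, "result": "updated", "_version": version}

    def get_source(self, doc_id):
        return self._docs[doc_id][1]


users = ToyIndex()
users.create("1", {"user": "Jack", "message": "trying out Elasticsearch"})
print(users.create("1", {"user": "Jack"})["status"])  # 409
print(users.index("1", {"user": "Mike"})["_version"])  # 2
print(users.get_source("1"))  # {'user': 'Mike'}
```

Note that after the index call the old `message` field is gone: the new source replaces the old one entirely.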

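Update a document

Update does not replace the whole document. It merges the fields supplied under doc into the existing source and leaves the other fields untouched (internally Elasticsearch still reindexes the merged document). The document must already exist, and the changes must be wrapped in doc.

# Update API
# POST index_name/_update/_id
# {
#   "doc": { "field1": "value1", "field2": "value2" }
# }

# Update the document with _id=1
POST users/_update/1
{
    "doc":{
        "post_date" : "2019-05-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }
}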
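
The merge performed by a partial update can be approximated in a few lines of Python. This is a sketch under the assumption that nested objects are merged recursively while scalars and arrays are replaced. The function name `merge_partial` is hypothetical and the code does not talk to a cluster.

```python
def merge_partial(source: dict, doc: dict) -> dict:
    """Return a new dict with doc merged into source."""
    merged = dict(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged key by key.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays are replaced outright.
            merged[key] = value
    return merged


existing = {"user": "Mike"}
doc = {
    "post_date": "2019-05-15T14:12:12",
    "message": "trying out Elasticsearch",
}
print(merge_partial(existing, doc))
```

The printed document keeps `user` and gains the two new fields, which is the behaviour the update request above relies on.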

Get a document

Fetch a document by its ID with GET index_name/_doc/_id.

#Get the document by ID
GET users/_doc/1

# Response
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "Jack",
    "post_date" : "2019-05-15T14:12:12",
    "message" : "trying out Elasticsearch"
  }
}

Delete a document

Delete a document by its ID with DELETE index_name/_doc/_id.

# Delete the document
DELETE users/_doc/1

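Bulk operations - bulk

Bulk requests reduce the overhead of making many network round trips and so improve performance.

A single API call can operate on several different indices.
Four operation types are supported: Index, Create, Update, Delete.
The index can be given in the URI or inside the request body.
A failure of one operation does not affect the others.
The response contains the result of every individual operation.
Do not send too much data in one request. A common recommendation is 1,000-5,000 documents per batch, or fewer if the documents are large, with a request size of 5-15 MB. By default a request larger than 100 MB is rejected with an error.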
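
The batch-size advice above can be applied on the client side by splitting the input before sending it. Below is a minimal Python sketch: the function name `chunk_documents` and the default limits (1,000 documents, 5 MB) are assumptions chosen from the ranges mentioned above, and the size estimate counts only the serialized document lines, not the action lines.

```python
import json
from typing import Iterable, Iterator, List


def chunk_documents(
    docs: Iterable[dict],
    max_docs: int = 1000,
    max_bytes: int = 5 * 1024 * 1024,
) -> Iterator[List[dict]]:
    """Yield batches limited by document count and approximate byte size."""
    batch: List[dict] = []
    batch_bytes = 0
    for doc in docs:
        # +1 accounts for the newline that terminates each line.
        size = len(json.dumps(doc).encode("utf-8")) + 1
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


docs = [{"field1": f"value{i}"} for i in range(2500)]
sizes = [len(batch) for batch in chunk_documents(docs, max_docs=1000)]
print(sizes)  # [1000, 1000, 500]
```

A single document larger than `max_bytes` still goes out in a batch of its own, so the server-side limit remains the final guard.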

### Bulk request
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test2", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
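
The request body above is newline-delimited JSON: one action line, followed by a source line for every action except delete, and the body must end with a newline. A small Python sketch that produces such a body is shown here. The helper name `to_bulk_body` and the tuple layout are assumptions for illustration, and actually sending the body (with Content-Type application/x-ndjson) is left to whatever HTTP client is in use.

```python
import json


def to_bulk_body(operations):
    """Build an NDJSON body from (action, metadata, payload) tuples."""
    lines = []
    for action, meta, payload in operations:
        lines.append(json.dumps({action: meta}))
        if action != "delete":
            if payload is None:
                raise ValueError(f"{action} needs a payload line")
            lines.append(json.dumps(payload))
    # Every line, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_body([
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test2", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body, end="")
```

The output reproduces the seven lines of the console request above.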

# Response
{
  "took" : 227,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "delete" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "test2",
        "_type" : "_doc",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 2,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}
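
Because one failed operation does not stop the others, the caller has to inspect every entry in items. The sketch below assumes the response has been parsed into a dict and that failed operations carry an "error" object, as they do in Elasticsearch bulk responses. The function name `failed_items` and the sample response are illustrative only.

```python
def failed_items(bulk_response: dict) -> list:
    """Return (position, action, _id, status) for every failed operation."""
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key dict such as {"index": {...}}.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append(
                (position, action, result.get("_id"), result.get("status"))
            )
    return failures


sample = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "result": "created", "status": 201}},
        {"delete": {"_id": "2", "result": "not_found", "status": 404}},
        {
            "create": {
                "_id": "3",
                "status": 409,
                "error": {"type": "version_conflict_engine_exception"},
            }
        },
    ],
}
print(failed_items(sample))  # [(2, 'create', '3', 409)]
```

The delete of a missing document reports status 404 but carries no error object, which matches the response above where errors is false, so it is not counted as a failure here.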

Bulk reads - mget

mget retrieves documents from a list of document _ids.

### mget request
GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_id" : "1"
        },
        {
            "_index" : "test",
            "_id" : "2"
        }
    ]
}

# Index specified in the URI
GET /test/_mget
{
    "docs" : [
        {
            "_id" : "1"
        },
        {
            "_id" : "2"
        }
    ]
}

GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_id" : "1",
            "_source" : false
        },
        {
            "_index" : "test",
            "_id" : "2",
            "_source" : ["field3", "field4"]
        },
        {
            "_index" : "test",
            "_id" : "3",
            "_source" : {
                "include": ["user"],
                "exclude": ["user.location"]
            }
        }
    ]
}

# Response (to the first two requests, which fetch _id 1 and 2 with the full _source)
{
  "docs" : [
    {
      "_index" : "test",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 4,
      "_seq_no" : 5,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "field1" : "value1",
        "field2" : "value2"
      }
    },
    {
      "_index" : "test",
      "_type" : "_doc",
      "_id" : "2",
      "found" : false
    }
  ]
}
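
Building the docs array and reading the result back are both mechanical, so they are easy to wrap. The following Python sketch assumes already-parsed JSON dicts. The helper names `mget_body` and `sources_by_id` are hypothetical, and documents that were not found are mapped to None.

```python
def mget_body(ids, index=None):
    """Build an mget request body; omit index when it is given in the URI."""
    docs = []
    for doc_id in ids:
        entry = {"_id": doc_id}
        if index is not None:
            entry["_index"] = index
        docs.append(entry)
    return {"docs": docs}


def sources_by_id(response):
    """Map each _id to its _source, or to None when found is false."""
    return {
        doc["_id"]: doc.get("_source") if doc.get("found") else None
        for doc in response["docs"]
    }


# Trimmed copy of the response shown above.
response = {
    "docs": [
        {
            "_index": "test",
            "_id": "1",
            "found": True,
            "_source": {"field1": "value1", "field2": "value2"},
        },
        {"_index": "test", "_id": "2", "found": False},
    ]
}
print(mget_body(["1", "2"], index="test"))
print(sources_by_id(response))
```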

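Bulk search - msearch

msearch runs several searches in one request. Each search is a header line, which may name an index, followed by a query body line. An empty header {} falls back to the index given in the URI.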

POST kibana_sample_data_ecommerce/_msearch
{}
{"query" : {"match_all" : {}},"size":1}
{"index" : "kibana_sample_data_flights"}
{"query" : {"match_all" : {}},"size":2}
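
The msearch body uses the same newline-delimited layout as _bulk, except that every search contributes exactly two lines. A short Python sketch that assembles the request above is given here. The helper name `to_msearch_body` is an assumption, and no request is actually sent.

```python
import json


def to_msearch_body(searches):
    """Build an NDJSON body from (header, query_body) pairs."""
    lines = []
    for header, query_body in searches:
        lines.append(json.dumps(header))      # {} means "use the URI index"
        lines.append(json.dumps(query_body))
    return "\n".join(lines) + "\n"


msearch_body = to_msearch_body([
    ({}, {"query": {"match_all": {}}, "size": 1}),
    (
        {"index": "kibana_sample_data_flights"},
        {"query": {"match_all": {}}, "size": 2},
    ),
])
print(msearch_body, end="")
```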

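Common error responses

Problem	Cause
Cannot connect	Network failure or the cluster is down
Connection cannot be closed	Network failure or a node error
429	The cluster is too busy
4xx	The request body is malformed
500	Internal cluster error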
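
Since 429 means the cluster is temporarily too busy, a common client-side reaction is to wait and try again, while other statuses are returned to the caller unchanged. The sketch below is one possible approach and is not taken from the original post. The function `send_with_retry`, its parameters, and the exponential delays are all assumptions, and `send` stands for any callable that performs the request and returns an HTTP status code.

```python
import time


def send_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns something other than 429."""
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Back off exponentially: 0.5s, 1s, 2s, ...
            sleep(base_delay * 2 ** attempt)
    return status


# Simulated server: busy twice, then OK. Delays are recorded, not slept.
statuses = iter([429, 429, 200])
delays = []
final = send_with_retry(lambda: next(statuses), sleep=delays.append)
print(final, delays)  # 200 [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test.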
