Elasticsearch: Basic Document CRUD and Bulk Operations

Posted czbxdd


Elasticsearch (Part 2): Basic Document CRUD and Bulk Operations

Course these notes follow: 《Elasticsearch核心技术与实战》 (Elasticsearch Core Technology and Practice)


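Create a Document

A document can be created in two ways: with an automatically generated _id, or with an _id that you specify.

  • Calling POST index_name/_doc makes Elasticsearch generate the document _id automatically.
# Create a document; the _id is generated automatically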
POST users/_doc
{
    "user" : "Mike",
    "post_date" : "2019-04-15T14:12:12",
    "message" : "trying out Kibana"
}
# Response
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "TyPHr20BkakgvNgYZu2L", # automatically generated document _id
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
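  • With PUT index_name/_create/_id or PUT index_name/_doc/_id?op_type=create, the create operation is stated explicitly in the URI. If a document with that _id already exists, the request fails.
# 1. Create a document with a specified _id; fails if the _id already exists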
PUT users/_create/1
{
    "user" : "Jack",
    "post_date" : "2019-05-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
# 2. Create a document with a specified _id using op_type=create; fails if the _id already exists
PUT users/_doc/1?op_type=create
{
    "user" : "Jack",
    "post_date" : "2019-05-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
# Error returned when the _id already exists:
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[1]: version conflict, document already exists (current version [1])",
        "index_uuid": "ohLNyzUmTv6cm-Ih9kH0bw",
        "shard": "0",
        "index": "users"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[1]: version conflict, document already exists (current version [1])",
    "index_uuid": "ohLNyzUmTv6cm-Ih9kH0bw",
    "shard": "0",
    "index": "users"
  },
  "status": 409
}
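
The two outcomes shown above can be told apart from the response body alone. The Python sketch below is only an illustration: the function name classify_create_response is made up for this example, and it assumes the body has already been parsed into a dict shaped like the responses above.

```python
def classify_create_response(body):
    """Classify a parsed create response (hypothetical helper, not a client API).

    Returns "created", "already_exists", or "unexpected".
    """
    # Successful create: the body carries "result": "created".
    if body.get("result") == "created":
        return "created"

    # Conflict: the body carries "status": 409 and a version-conflict error type.
    error = body.get("error") or {}
    if (body.get("status") == 409
            and error.get("type") == "version_conflict_engine_exception"):
        return "already_exists"

    # Anything else is left for the caller to inspect.
    return "unexpected"
```

A caller could treat "already_exists" as a signal to skip the write or to fall back to an Index or Update request, depending on what the application needs.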


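Index a Document

Index differs from Create: if the document does not exist, a new document is indexed. Otherwise the existing document is deleted, the new document is indexed in its place, and the version number increases by 1. Use PUT index_name/_doc/_id.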

PUT users/_doc/1
{
    "user" : "Mike"
}
# Response
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3, # version incremented
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 4,
  "_primary_term" : 2
}
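
To make the difference between Create and Index concrete, here is a toy in-memory model in Python. It is not how Elasticsearch is implemented; the class ToyIndex and its methods are invented for this sketch and only mimic the result and _version fields seen in the responses above.

```python
class ToyIndex:
    """In-memory toy model of create vs. index semantics (illustration only)."""

    def __init__(self):
        self._docs = {}  # _id -> (version, source)

    def create(self, doc_id, source):
        # Like PUT index_name/_create/_id: never overwrites an existing _id.
        if doc_id in self._docs:
            return {"status": 409, "error": "version_conflict_engine_exception"}
        self._docs[doc_id] = (1, dict(source))
        return {"status": 201, "result": "created", "_version": 1}

    def index(self, doc_id, source):
        # Like PUT index_name/_doc/_id: replaces the whole document if present.
        if doc_id in self._docs:
            version = self._docs[doc_id][0] + 1
            self._docs[doc_id] = (version, dict(source))
            return {"status": 200, "result": "updated", "_version": version}
        self._docs[doc_id] = (1, dict(source))
        return {"status": 201, "result": "created", "_version": 1}

    def get(self, doc_id):
        # Returns the stored source, or None if the _id is unknown.
        entry = self._docs.get(doc_id)
        return None if entry is None else dict(entry[1])
```

In this model, indexing {"user": "Mike"} over an existing three-field document leaves only the user field, which is the replace-everything behavior described above.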


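Update a Document

From the client's point of view, Update does not replace the whole document: the supplied fields are merged into the existing one (internally, Elasticsearch still reindexes the merged result as a new version). The document must already exist, and the fields to change must be wrapped in doc.
# Update API: POST index_name/_update/_id { "doc":{ "field1":"value1", "field2":"value2" } }

# Update the document with _id=1
POST users/_update/1
{
    "doc":{
        "post_date" : "2019-05-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }
}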
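
The merge performed by a partial update can be pictured with a few lines of Python. This is a sketch of the documented behavior (objects are merged recursively, other values are replaced), not Elasticsearch's own code; the function name merge_doc is an assumption made for the example.

```python
def merge_doc(source, partial):
    """Return a new dict with `partial` merged into `source`.

    Nested dicts are merged recursively; any other value (including a list)
    replaces the existing value for that key.
    """
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_doc(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Applied to a stored document {"user": "Mike"} and the doc body from the request above, the result keeps user and gains post_date and message.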


Get a Document

Retrieve a document by its ID with GET index_name/_doc/_id.

#Get the document by ID
GET users/_doc/1
# Response
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "Jack",
    "post_date" : "2019-05-15T14:12:12",
    "message" : "trying out Elasticsearch"
  }
}


Delete a Document

Delete a document by its ID with DELETE index_name/_doc/_id.

# Delete the document
DELETE users/_doc/1


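Bulk Operations: _bulk

Bulk operations reduce the overhead of network round trips and improve performance.

  • A single API call can operate on different indices.
  • Four action types are supported: Index, Create, Update, and Delete.
  • The index can be specified in the URI or in the action lines of the request body.
  • A failure of one action does not affect the other actions.
  • The response includes the result of every action.
  • Do not send too much data in one request. A common recommendation is 1,000 to 5,000 documents per request, fewer if the documents are large, with a payload of roughly 5 to 15 MB. By default a request body cannot exceed 100 MB; a larger request is rejected with an error.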
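
The sizing advice in the last bullet can be applied by splitting a stream of operations into several request bodies. The Python sketch below shows one possible way to do that; the helper names to_lines and chunk_bulk and the default limits are assumptions chosen for illustration, and sending each body to POST _bulk is left to whatever HTTP client is in use.

```python
import json


def to_lines(action, meta, source=None):
    """Serialize one bulk operation: an action line plus an optional source line."""
    lines = [json.dumps({action: meta})]
    if source is not None:  # a delete action has no source line
        lines.append(json.dumps(source))
    return lines


def chunk_bulk(operations, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield newline-delimited JSON bodies that respect both limits.

    `operations` is an iterable of (action, meta, source) tuples.
    A single operation larger than max_bytes still gets its own body.
    """
    batch, batch_bytes, batch_docs = [], 0, 0
    for action, meta, source in operations:
        lines = to_lines(action, meta, source)
        # +1 per line accounts for the trailing newline character.
        size = sum(len(line.encode("utf-8")) + 1 for line in lines)
        if batch and (batch_docs >= max_docs or batch_bytes + size > max_bytes):
            yield "\n".join(batch) + "\n"
            batch, batch_bytes, batch_docs = [], 0, 0
        batch.extend(lines)
        batch_bytes += size
        batch_docs += 1
    if batch:
        yield "\n".join(batch) + "\n"
```

Every body ends with a newline, which the bulk format requires after the last line.
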
### Bulk request
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test2", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
# Response
{
  "took" : 227,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "delete" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "test2",
        "_type" : "_doc",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 2,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}
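
Because one failed action does not stop the rest, a client usually has to walk the items array to find out what went wrong. The Python sketch below is one way to do that. It assumes that a failed item carries an error object, which is why the delete above (status 404, result not_found, no error object) is not counted as a failure, consistent with "errors" : false in the response. The function name failed_items is made up for this example.

```python
def failed_items(bulk_response):
    """Return a summary of the items in a parsed bulk response that failed."""
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key dict such as {"index": {...}} or {"delete": {...}}.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append({
                "position": position,          # position in the original request
                "action": action,
                "_id": result.get("_id"),
                "status": result.get("status"),
                "type": result["error"].get("type"),
            })
    return failures
```

The position field lets the caller map each failure back to the operation it sent, so only those operations need to be corrected or retried.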


Bulk Read: _mget

mget retrieves documents by a list of document _ids.

### mget request
GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_id" : "1"
        },
        {
            "_index" : "test",
            "_id" : "2"
        }
    ]
}

# Specify the index in the URI
GET /test/_mget
{
    "docs" : [
        {
            "_id" : "1"
        },
        {
            "_id" : "2"
        }
    ]
}

GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_id" : "1",
            "_source" : false
        },
        {
            "_index" : "test",
            "_id" : "2",
            "_source" : ["field3", "field4"]
        },
        {
            "_index" : "test",
            "_id" : "3",
            "_source" : {
                "include": ["user"],
                "exclude": ["user.location"]
            }
        }
    ]
}
# Example response for a request that asks for _id 1 and _id 2
{
  "docs" : [
    {
      "_index" : "test",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 4,
      "_seq_no" : 5,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "field1" : "value1",
        "field2" : "value2"
      }
    },
    {
      "_index" : "test",
      "_type" : "_doc",
      "_id" : "2",
      "found" : false
    }
  ]
}
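
An mget request body is easy to build from a list of IDs, and the found flag in each returned entry separates hits from misses. The following Python sketch illustrates both steps; mget_body and split_found are hypothetical helper names, and the response is assumed to be already parsed into a dict like the one above.

```python
def mget_body(ids, index=None):
    """Build the "docs" body for _mget; omit index when it is given in the URI."""
    docs = []
    for doc_id in ids:
        entry = {"_id": doc_id}
        if index is not None:
            entry["_index"] = index
        docs.append(entry)
    return {"docs": docs}


def split_found(mget_response):
    """Return (found, missing): a dict of _id -> _source, and a list of missing _ids."""
    found, missing = {}, []
    for doc in mget_response.get("docs", []):
        if doc.get("found"):
            # _source may be absent when the request set "_source": false.
            found[doc["_id"]] = doc.get("_source")
        else:
            missing.append(doc["_id"])
    return found, missing
```

With the response shown above, split_found would report _id 1 as found with its two fields and _id 2 as missing.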


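Bulk Search: _msearch

msearch runs several searches in one request and returns the documents matching each query. The body alternates header lines and query lines: an empty header {} uses the index from the URI, while a header with "index" overrides it for that search.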

POST kibana_sample_data_ecommerce/_msearch
{}
{"query" : {"match_all" : {}},"size":1}
{"index" : "kibana_sample_data_flights"}
{"query" : {"match_all" : {}},"size":2}
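
The msearch body is newline-delimited JSON made of header and query pairs, so it can be generated from a plain list. The Python sketch below shows this under the assumption that each search is given as a (header, query) tuple; the function name msearch_body is invented for the example.

```python
import json


def msearch_body(searches):
    """Serialize (header, query) pairs into a newline-delimited _msearch body."""
    lines = []
    for header, query in searches:
        lines.append(json.dumps(header))  # {} means: use the index from the URI
        lines.append(json.dumps(query))
    # The body must end with a newline after the last line.
    return "\n".join(lines) + "\n"


# The two searches from the request above, expressed as Python data.
EXAMPLE_SEARCHES = [
    ({}, {"query": {"match_all": {}}, "size": 1}),
    ({"index": "kibana_sample_data_flights"}, {"query": {"match_all": {}}, "size": 2}),
]
```

Calling msearch_body(EXAMPLE_SEARCHES) produces four lines equivalent to the request body shown above.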


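Common Error Responses

Problem                       Cause
Cannot connect                Network failure or the cluster is down
Connection cannot be closed   Network failure or a node error
429                           The cluster is too busy
4xx                           The request is malformed (for example, a bad request body)
500                           Internal cluster error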
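
A client can turn this table into a simple decision rule. The Python sketch below is one possible policy, not an official recommendation: the function names, the returned labels, and the choice to retry only connection failures and 429 are all assumptions made for illustration.

```python
def classify_failure(status):
    """Map an HTTP status (or None for a connection failure) to a suggested action."""
    if status is None:
        return "retry_with_backoff"   # could not connect: network or cluster down
    if status == 429:
        return "retry_with_backoff"   # cluster too busy: slow down and try again
    if 400 <= status < 500:
        return "fix_request"          # client-side error: retrying will not help
    if status >= 500:
        return "investigate_cluster"  # internal error: check the cluster first
    return "ok"


def backoff_delays(base_seconds, attempts):
    """Exponential backoff: base, 2*base, 4*base, ... for the given attempt count."""
    return [base_seconds * (2 ** attempt) for attempt in range(attempts)]
```

Note that blindly retrying after a connection failure can duplicate documents when the request used an auto-generated _id, so a retry is safest for requests that specify the _id.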
