2. Elasticsearch batch document operations: the bulk API

Posted 2022-11-19 PacosonSWJTU


【README】

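1. This article introduces bulk, the Elasticsearch API for batch document operations;

2. bulk API: performs multiple index/delete operations in a single API call, which can greatly increase indexing speed;

3. For reference, see the official documentation: Bulk API | Elasticsearch Guide [7.2] | Elastic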

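【1】Introduction to the bulk API

1) Syntax

Each operation is written as two newline-terminated JSON lines (the delete action has no request body line):
{ action: { metadata } }\n
{ request body }\n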
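
The two-line format above can be produced programmatically. The following is a minimal sketch, not part of the original article: the helper name build_bulk_body and the (action, metadata, source) tuple layout are assumptions made for illustration. It only builds the request body and does not contact a server.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) triples into a bulk request body.

    source is None for actions that take no request body line, such as delete.
    build_bulk_body is a hypothetical helper, not an Elasticsearch API.
    """
    lines = []
    for action, metadata, source in operations:
        # First line: the action and its metadata.
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        # Second line (optional): the document or partial document.
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # Every line, including the last one, is terminated by "\n".
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "website", "_type": "blog", "_id": "25"},
     {"title": "zhangsan25_bulk", "body": "成都欢迎你25"}),
    ("delete", {"_index": "website", "_type": "blog", "_id": "3"}, None),
])
print(body, end="")
```

Keeping each JSON object on a single line matters here, because the server splits the body on newlines rather than parsing it as one JSON document.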

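2) bulk API actions:

create: saves the document if it does not exist; returns an error if it already exists;
index: saves the document if it does not exist; if it exists, the document is overwritten (upsert) without comparing old and new data;
update: updates a document; returns an error if the document does not exist (old and new data are compared; if they are identical, nothing is written and the result is noop);
delete: deletes a document; if the id does not exist, the item comes back with status 404 and result not_found;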

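3) When saving documents through the bulk API, bulk index is usually the recommended choice, because it behaves as an upsert: the document is updated if it exists and created otherwise;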
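
The differences between the four actions can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: apply_action and the dict-based store are hypothetical, the merge for update is shallow, and versions, shards and sequence numbers are ignored. It mimics the status codes and results described in this article, not the real server implementation.

```python
def apply_action(store, action, doc_id, source=None):
    """Apply one bulk-style action to a dict and return (status, outcome).

    Toy model only: apply_action is a hypothetical helper, not an
    Elasticsearch API.
    """
    exists = doc_id in store
    if action == "create":
        if exists:
            return 409, "version_conflict_engine_exception"
        store[doc_id] = dict(source)
        return 201, "created"
    if action == "index":
        # The stored document is replaced as a whole, with no comparison.
        store[doc_id] = dict(source)
        return (200, "updated") if exists else (201, "created")
    if action == "update":
        if not exists:
            return 404, "document_missing_exception"
        # The partial "doc" is merged into the existing document.
        merged = {**store[doc_id], **source}
        if merged == store[doc_id]:
            return 200, "noop"
        store[doc_id] = merged
        return 200, "updated"
    if action == "delete":
        if not exists:
            return 404, "not_found"
        del store[doc_id]
        return 200, "deleted"
    raise ValueError(f"unknown action: {action}")


store = {}
results = [
    apply_action(store, "create", "3", {"title": "zhangsan03_bulk"}),
    apply_action(store, "create", "3", {"title": "zhangsan03_bulk"}),
    apply_action(store, "index", "3", {"title": "zhangsan03_bulk"}),
    apply_action(store, "update", "3", {"title": "zhangsan03_bulk"}),
    apply_action(store, "update", "35", {"title": "missing"}),
    apply_action(store, "delete", "3"),
]
for status, outcome in results:
    print(status, outcome)
```

The second create fails with a conflict, while the index call on the same id succeeds, which is why index is the more forgiving choice for repeated imports.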

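【1.1】bulk create: save documents in bulk

1) create: saves the document if it does not exist; returns an error if it already exists;

POST localhost:9200/_bulk
{"create":{"_index":"website","_type":"blog","_id":"3"}}
{"title":"zhangsan03_bulk", "body":"成都欢迎你03"}
{"create":{"_index":"website","_type":"blog","_id":"4"}}
{"title":"zhangsan04_bulk", "body":"成都欢迎你04"}
// a blank line is required here (the body must end with a newline), otherwise the request fails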
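
The trailing-newline requirement is easy to miss when the request is built in code rather than in an HTTP client UI. Below is a hedged sketch using only the Python standard library; make_bulk_request is a hypothetical helper, and the base URL simply reuses localhost:9200 from the example. The request is constructed but not sent, so no running node is needed to try it.

```python
import urllib.request


def make_bulk_request(body, base_url="http://localhost:9200"):
    """Build (but do not send) a POST request for the _bulk endpoint.

    make_bulk_request is a hypothetical helper used for illustration.
    """
    if not body.endswith("\n"):
        body += "\n"  # the final line must be newline-terminated
    return urllib.request.Request(
        url=f"{base_url}/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


body = (
    '{"create": {"_index": "website", "_type": "blog", "_id": "3"}}\n'
    '{"title": "zhangsan03_bulk", "body": "成都欢迎你03"}'
)
request = make_bulk_request(body)

# To actually send it (requires a running Elasticsearch node):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```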

2) Run the same request again; it now fails because the documents already exist:

{
    "took": 1,
    "errors": true,
    "items": [
        {
            "create": {
                "_index": "website",
                "_type": "blog",
                "_id": "3",
                "status": 409,
                "error": {
                    "type": "version_conflict_engine_exception",
                    "reason": "[3]: version conflict, document already exists (current version [1])", // error: the document already exists
                    "index_uuid": "rAlhUmExQvCXb1pGZJ1tog",
                    "shard": "0",
                    "index": "website"
                }
            }
        },
        ......
    ]
}
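
Because a bulk request can partly succeed, the response has to be inspected item by item. The sketch below shows one way to do that; failed_items is a hypothetical helper, and the response dict is a trimmed copy of the output shown above (only the first item is included).

```python
def failed_items(response):
    """Return (action, _id, status, error type) for every failed item.

    failed_items is a hypothetical helper, not an Elasticsearch API.
    """
    failures = []
    # The top-level "errors" flag tells us whether any item failed at all.
    if not response.get("errors"):
        return failures
    for item in response["items"]:
        # Each item holds a single key: the name of the action.
        for action, result in item.items():
            if "error" in result:
                failures.append(
                    (action, result["_id"], result["status"], result["error"]["type"])
                )
    return failures


response = {
    "took": 1,
    "errors": True,
    "items": [
        {
            "create": {
                "_index": "website",
                "_type": "blog",
                "_id": "3",
                "status": 409,
                "error": {
                    "type": "version_conflict_engine_exception",
                    "reason": "[3]: version conflict, document already exists (current version [1])",
                },
            }
        }
    ],
}
failures = failed_items(response)
print(failures)
```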

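【1.2】bulk delete: delete documents in bulk

1) delete: deletes a document; if the id does not exist, the item comes back with status 404 and result not_found

POST localhost:9200/_bulk
{"delete":{"_index":"website","_type":"blog","_id":"3"}}
{"delete":{"_index":"website","_type":"blog","_id":"4"}}
// a blank line is required here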

【1.3】bulk index: save or update documents in bulk (old and new data are not compared)

1) bulk index: saves the document if it does not exist; if it exists, the document is overwritten (upsert);

POST localhost:9200/_bulk
{"index":{"_index":"website","_type":"blog", "_id":"25"}}
{"title":"zhangsan25_bulk", "body":"成都欢迎你25"}
{"index":{"_index":"website","_type":"blog", "_id":"26"}}
{"title":"zhangsan26_bulk", "body":"成都欢迎你26"}
// a blank line is required here

{
    "took": 235,
    "errors": false,
    "items": [
        {
            "index": {
                "_index": "website",
                "_type": "blog",
                "_id": "25",
                "_version": 1,
                "result": "created", // the document was created
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 18,
                "_primary_term": 1,
                "status": 201
            }
......

2) Run the request again; this time the documents are updated:

{
    "took": 157,
    "errors": false,
    "items": [
        {
            "index": {
                "_index": "website",
                "_type": "blog",
                "_id": "25",
                "_version": 2,
                "result": "updated", // the document was updated
......

【1.4】bulk update: update documents in bulk (old and new data are compared)

1) update: updates a document; returns an error if the document does not exist;

POST localhost:9200/_bulk
{"update":{"_index":"website","_type":"blog","_id":"25"}}
{"doc":{"title":"zhangsan25_bulk_update01"}}
{"update":{"_index":"website","_type":"blog","_id":"26"}}
{"doc":{"title":"zhangsan26_bulk_update02"}}
// the body must end with a newline here

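2) Send the update for documents 25 and 26 several times with an identical request body; the result is noop;

bulk update compares old and new data; if they are identical, nothing is written and noop is returned;

{
    "took": 2,
    "errors": false,
    "items": [
        {
            "update": {
                "_index": "website",
                "_type": "blog",
                "_id": "25",
                "_version": 3,
                "result": "noop", // no operation was performed
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "status": 200
            }
        },
        ......
    ]
}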

3) Updating a document that does not exist (documents with id=35 and id=36 do not exist) produces a document missing error:

POST localhost:9200/_bulk
{"update":{"_index":"website","_type":"blog","_id":"35"}}
{"doc":{"title":"zhangsan25_bulk_update01"}}
{"update":{"_index":"website","_type":"blog","_id":"36"}}
{"doc":{"title":"zhangsan26_bulk_update02"}}

{
    "took": 0,
    "errors": true,
    "items": [
        {
            "update": {
                "_index": "website",
                "_type": "blog",
                "_id": "35",
                "status": 404,
                "error": {
                    "type": "document_missing_exception", // error: the document does not exist
                    "reason": "[blog][35]: document missing",
                    "index_uuid": "rAlhUmExQvCXb1pGZJ1tog",
                    "shard": "0",
                    "index": "website"
                }
            }
        },
        ......
    ]
}

【2】Importing sample data with bulk

POST localhost:9200/bank/account/_bulk
(sample data as the request body)

Sample data taken from: https://github.com/linuxacademy/content-elasticsearch-deep-dive/blob/master/sample_data/accounts.json
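
When a data file is large, it is often split into several smaller bulk requests instead of being sent in one piece. The sketch below assumes the file is already in bulk format, with an action line followed by a source line for every document; bulk_batches, the batch size and the sample lines are all made up for illustration and are not taken from accounts.json.

```python
from itertools import islice


def bulk_batches(lines, docs_per_batch=500):
    """Yield bulk bodies holding at most docs_per_batch action/source pairs.

    Assumes every document occupies exactly two lines (action, then source).
    bulk_batches is a hypothetical helper, not an Elasticsearch API.
    """
    stripped = (line.strip() for line in lines)
    non_empty = (line for line in stripped if line)
    while True:
        # Two lines per document, so take 2 * docs_per_batch lines at a time.
        chunk = list(islice(non_empty, 2 * docs_per_batch))
        if not chunk:
            return
        yield "\n".join(chunk) + "\n"


# Placeholder lines standing in for a real bulk-format file.
sample_lines = [
    '{"index": {"_id": "1"}}\n', '{"name": "a"}\n',
    '{"index": {"_id": "2"}}\n', '{"name": "b"}\n',
    '{"index": {"_id": "3"}}\n', '{"name": "c"}\n',
]
batches = list(bulk_batches(sample_lines, docs_per_batch=2))
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch.count(chr(10))} lines")
```

With a real file, passing the open file object as lines works the same way, since the function only iterates over it once and never loads the whole file into memory.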
