2. Elasticsearch document batch operations: the bulk API
Posted PacosonSWJTU
【README】
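1. This article introduces bulk, the Elasticsearch API for batch operations on documents;
2. The bulk API makes it possible to perform multiple index/delete operations (more generally: create, index, update, and delete actions) in a single API call, which can greatly increase indexing speed;
3. For reference, see Bulk API | Elasticsearch Guide [7.2] | Elastic.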
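【1】Introduction to the bulk API
1) Syntax
Each operation consists of an action line followed, for actions that need one, by a request-body line. Every line, including the last, is terminated by a newline (\n):
{ action: { metadata } }\n
{ request body }\n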
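As a minimal sketch of the format above, the following Python helper serializes a list of operations into a newline-delimited body. The function name `build_bulk_body` and the tuple layout are assumptions made for illustration; they are not part of Elasticsearch or of any client library.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) triples into a bulk request body.

    `source` is None for actions that carry no request-body line (delete).
    """
    lines = []
    for action, metadata, source in operations:
        # Action line: {"<action>": {<metadata>}}
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            # Request-body line, on its own line
            lines.append(json.dumps(source, ensure_ascii=False))
    # Terminate every line, including the last one, with "\n"
    return "".join(line + "\n" for line in lines)


body = build_bulk_body([
    ("index", {"_index": "website", "_type": "blog", "_id": "25"},
     {"title": "zhangsan25_bulk"}),
    ("delete", {"_index": "website", "_type": "blog", "_id": "3"}, None),
])
print(body, end="")
```

Building the body from Python objects instead of hand-written strings keeps each JSON object on exactly one line and guarantees the trailing newline.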
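2) Actions supported by the bulk API:
- create: saves the document if it does not exist; returns an error (409, version conflict) if it already exists;
- *index: saves the document if it does not exist; if it exists, replaces it with the new content (upsert). Old and new data are not compared, so the version increases on every call;
- update: partially updates an existing document; returns an error if the document does not exist. Old and new data are compared, and if they are identical nothing is written and the result is noop;
- delete: deletes a document; if the given document id does not exist, the item comes back with status 404 (result not_found);
3) When saving documents through the bulk API, bulk index is usually the recommended choice, because it behaves as an upsert: it overwrites the document if it exists and creates it otherwise;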
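The differences between the four actions can be sketched with a toy in-memory model. This is only an illustration of the semantics listed above, not how Elasticsearch is implemented: the store is a plain dict, `apply_action` is a hypothetical name, the result labels for failures are shorthand, and the update merge is shallow.

```python
def apply_action(store, action, doc_id, source=None):
    """Apply one bulk-style action to `store` (dict of id -> document).

    Returns (status, result), loosely mimicking the per-item outcome.
    """
    exists = doc_id in store
    if action == "create":
        if exists:
            return 409, "version_conflict"
        store[doc_id] = dict(source)
        return 201, "created"
    if action == "index":
        # The whole document is replaced; old and new data are not compared
        store[doc_id] = dict(source)
        return (200, "updated") if exists else (201, "created")
    if action == "update":
        if not exists:
            return 404, "document_missing"
        merged = {**store[doc_id], **source}  # shallow partial merge
        if merged == store[doc_id]:
            return 200, "noop"  # nothing changed, nothing written
        store[doc_id] = merged
        return 200, "updated"
    if action == "delete":
        if not exists:
            return 404, "not_found"
        del store[doc_id]
        return 200, "deleted"
    raise ValueError(f"unknown action: {action}")


store = {}
for step in [
    ("create", "3", {"title": "a"}),
    ("create", "3", {"title": "a"}),
    ("index", "3", {"title": "b"}),
    ("update", "3", {"title": "b"}),
    ("delete", "3", None),
]:
    print(step[0], apply_action(store, *step))
```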
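【1.1】bulk create: saving documents in a batch
1) create saves the document if it does not exist and returns an error if it already exists:
POST localhost:9200/_bulk
{"create":{"_index":"website","_type":"blog","_id":"3"}}
{"title":"zhangsan03_bulk", "body":"成都欢迎你03"}
{"create":{"_index":"website","_type":"blog","_id":"4"}}
{"title":"zhangsan04_bulk", "body":"成都欢迎你04"}
// The last line must end with a newline (in a request editor, leave an empty line here); otherwise the request fails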
2) Run the same request a second time; it now fails because the documents already exist:
{
  "took": 1,
  "errors": true,
  "items": [
    {
      "create": {
        "_index": "website",
        "_type": "blog",
        "_id": "3",
        "status": 409,
        "error": {
          "type": "version_conflict_engine_exception",
          "reason": "[3]: version conflict, document already exists (current version [1])", // error: the document already exists
          "index_uuid": "rAlhUmExQvCXb1pGZJ1tog",
          "shard": "0",
          "index": "website"
        }
      }
    },
    ......
  ]
}
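Because a bulk call reports failures per item rather than failing as a whole, the caller has to inspect the response. The sketch below collects the failed items from a decoded response; `failed_items` is a hypothetical helper name, and the sample dict is an abbreviated copy of the response shown above.

```python
def failed_items(response):
    """Return (action, _id, status, error_type) for every failed item."""
    failures = []
    if not response.get("errors"):
        return failures  # top-level flag says no item failed
    for item in response.get("items", []):
        # Each item is a single-key object: {"<action>": {...outcome...}}
        for action, outcome in item.items():
            if "error" in outcome:
                failures.append((
                    action,
                    outcome.get("_id"),
                    outcome.get("status"),
                    outcome["error"].get("type"),
                ))
    return failures


sample_response = {
    "took": 1,
    "errors": True,
    "items": [
        {"create": {
            "_index": "website", "_type": "blog", "_id": "3",
            "status": 409,
            "error": {"type": "version_conflict_engine_exception"},
        }},
    ],
}
for failure in failed_items(sample_response):
    print(failure)
```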
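【1.2】bulk delete: deleting documents in a batch
1) delete removes a document; if the given document id does not exist, the item comes back with status 404 (result not_found). A delete action has no request-body line:
POST localhost:9200/_bulk
{"delete":{"_index":"website","_type":"blog","_id":"3"}}
{"delete":{"_index":"website","_type":"blog","_id":"4"}}
// The last line must end with a newline (leave an empty line here)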
【1.3】bulk index: saving or updating documents in a batch (old and new data are not compared)
1) bulk index saves the document if it does not exist and overwrites it if it does (upsert):
POST localhost:9200/_bulk
{"index":{"_index":"website","_type":"blog", "_id":"25"}}
{"title":"zhangsan25_bulk", "body":"成都欢迎你25"}
{"index":{"_index":"website","_type":"blog", "_id":"26"}}
{"title":"zhangsan26_bulk", "body":"成都欢迎你26"}
// The last line must end with a newline (leave an empty line here)
Response on the first run:
{
  "took": 235,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "website",
        "_type": "blog",
        "_id": "25",
        "_version": 1,
        "result": "created", // the document was created
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 18,
        "_primary_term": 1,
        "status": 201
      }
    },
    ......
2) Run it again, and the result is an update:
{
  "took": 157,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "website",
        "_type": "blog",
        "_id": "25",
        "_version": 2,
        "result": "updated", // the document was updated
        ......
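【1.4】bulk update: updating documents in a batch (old and new data are compared)
1) update modifies an existing document and returns an error if the document does not exist. The fields to change are wrapped in "doc":
POST localhost:9200/_bulk
{"update":{"_index":"website","_type":"blog","_id":"25"}}
{"doc":{"title":"zhangsan25_bulk_update01"}}
{"update":{"_index":"website","_type":"blog","_id":"26"}}
{"doc":{"title":"zhangsan26_bulk_update02"}}
// The last line must end with a newline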
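2) Updating documents 25 and 26 repeatedly with the same request body returns noop;
- bulk update compares old and new data; if they are identical, nothing is written and the result is noop;
{
  "took": 2,
  "errors": false,
  "items": [
    {
      "update": {
        "_index": "website",
        "_type": "blog",
        "_id": "25",
        "_version": 3,
        "result": "noop", // no operation was performed
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 200
      }
    },
    ......
  ]
}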
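To see at a glance how many items were created, updated, or skipped as noop, the per-item "result" values can be tallied. A small sketch, assuming the response has already been decoded from JSON; `tally_results` is a hypothetical helper, and failed items (which carry "error" instead of "result") are counted under the label "error".

```python
from collections import Counter


def tally_results(response):
    """Count bulk outcomes as a Counter keyed by (action, result)."""
    counts = Counter()
    for item in response.get("items", []):
        for action, outcome in item.items():
            # Failed items have an "error" object and no "result" field
            counts[(action, outcome.get("result", "error"))] += 1
    return counts


noop_response = {
    "took": 2,
    "errors": False,
    "items": [
        {"update": {"_id": "25", "_version": 3, "result": "noop", "status": 200}},
        {"update": {"_id": "26", "_version": 3, "result": "noop", "status": 200}},
    ],
}
print(tally_results(noop_response))
```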
3) Updating a document that does not exist (the documents with id=35 and id=36 do not exist) produces a document missing error:
POST localhost:9200/_bulk
{"update":{"_index":"website","_type":"blog","_id":"35"}}
{"doc":{"title":"zhangsan25_bulk_update01"}}
{"update":{"_index":"website","_type":"blog","_id":"36"}}
{"doc":{"title":"zhangsan26_bulk_update02"}}
// The last line must end with a newline
Response:
{
  "took": 0,
  "errors": true,
  "items": [
    {
      "update": {
        "_index": "website",
        "_type": "blog",
        "_id": "35",
        "status": 404,
        "error": {
          "type": "document_missing_exception", // error: the document does not exist
          "reason": "[blog][35]: document missing",
          "index_uuid": "rAlhUmExQvCXb1pGZJ1tog",
          "shard": "0",
          "index": "website"
        }
      }
    },
    ......
  ]
}
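If the missing-document error is not wanted, one option the update action offers is the `doc_as_upsert` flag, which makes the partial document serve as the new document when none exists. The sketch below only builds the two request lines; `update_with_upsert_lines` is a hypothetical helper name, and whether upsert behaviour is desirable depends on your use case.

```python
import json


def update_with_upsert_lines(metadata, partial_doc):
    """Build the action line and body line for an update with doc_as_upsert."""
    action_line = json.dumps({"update": metadata}, ensure_ascii=False)
    # doc_as_upsert: use `doc` as the full document if the id does not exist yet
    body_line = json.dumps(
        {"doc": partial_doc, "doc_as_upsert": True}, ensure_ascii=False
    )
    return action_line + "\n" + body_line + "\n"


lines = update_with_upsert_lines(
    {"_index": "website", "_type": "blog", "_id": "35"},
    {"title": "zhangsan25_bulk_update01"},
)
print(lines, end="")
```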
【2】Bulk import of sample data
POST localhost:9200/bank/account/_bulk
Request body: the sample data
The sample data is taken from: https://github.com/linuxacademy/content-elasticsearch-deep-dive/blob/master/sample_data/accounts.json
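The import can also be scripted. The following is a minimal sketch using only the Python standard library; it assumes the cluster is reachable at http://localhost:9200 and that the payload is already in bulk format. The helper names and the two-line payload are hypothetical placeholders, not the actual contents of accounts.json.

```python
import json
import urllib.request


def build_bulk_request(base_url, path, ndjson_text):
    """Prepare a POST request for a bulk endpoint without sending it."""
    if not ndjson_text.endswith("\n"):
        ndjson_text += "\n"  # the last line must be newline-terminated
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + path.lstrip("/"),
        data=ndjson_text.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def send_bulk(request):
    """Send the request and return the decoded JSON response (needs a live cluster)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Hypothetical placeholder payload; in practice read the downloaded file instead
payload = '{"index":{"_id":"1"}}\n{"account_number":1}'
request = build_bulk_request("http://localhost:9200", "/bank/account/_bulk", payload)
print(request.get_method(), request.full_url)
```

Calling `send_bulk(request)` against a running cluster returns the same kind of response object shown in the sections above, so its `errors` flag and `items` list can be checked in the same way.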