ES Document API: Multi-Document APIs

Posted 2020-10-22 猪朵朵

Multi-Document APIs

Multi Get API (_mget)

# Fetch multiple documents of a type. The request can be written in several equivalent forms:
#1
curl -XGET 'localhost:9200/_mget?pretty' -H 'Content-Type: application/json' -d'
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "type",
            "_id" : "1"
        },
        {
            "_index" : "test",
            "_type" : "type",
            "_id" : "2"
        }
    ]
}
'

#2
curl -XGET 'localhost:9200/test/_mget?pretty' -H 'Content-Type: application/json' -d'
{
    "docs" : [
        {
            "_type" : "type",
            "_id" : "1"
        },
        {
            "_type" : "type",
            "_id" : "2"
        }
    ]
}
'
#3
curl -XGET 'localhost:9200/test/type/_mget?pretty' -H 'Content-Type: application/json' -d'
{
    "docs" : [
        {
            "_id" : "1"
        },
        {
            "_id" : "2"
        }
    ]
}
'
#4
curl -XGET 'localhost:9200/test/type/_mget?pretty' -H 'Content-Type: application/json' -d'
{
    "ids" : ["1", "2"]
}
'
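
The four request forms above differ only in where the index and type are given: in each document entry, or in the URL path. A minimal sketch of that equivalence follows. The helper `normalize_mget` and its parameters are hypothetical names introduced for illustration; it is not part of Elasticsearch or any client library, and it only models how path values fill in missing fields.

```python
def normalize_mget(body, index=None, doc_type=None):
    """Expand an _mget request body into fully qualified document references.

    index and doc_type stand for the values taken from the URL path, if any
    (hypothetical parameters, used only for this illustration).
    """
    if "ids" in body:
        # Shorthand form: only IDs are listed, index and type come from the URL.
        docs = [{"_id": doc_id} for doc_id in body["ids"]]
    else:
        docs = body["docs"]
    normalized = []
    for doc in docs:
        normalized.append({
            # A value inside the document entry wins over the URL path value.
            "_index": doc.get("_index", index),
            "_type": doc.get("_type", doc_type),
            "_id": doc["_id"],
        })
    return normalized


full = {"docs": [
    {"_index": "test", "_type": "type", "_id": "1"},
    {"_index": "test", "_type": "type", "_id": "2"},
]}
by_index = {"docs": [{"_type": "type", "_id": "1"}, {"_type": "type", "_id": "2"}]}
by_type = {"docs": [{"_id": "1"}, {"_id": "2"}]}
ids_only = {"ids": ["1", "2"]}

forms = [
    normalize_mget(full),                                    # form 1: /_mget
    normalize_mget(by_index, index="test"),                  # form 2: /test/_mget
    normalize_mget(by_type, index="test", doc_type="type"),  # form 3: /test/type/_mget
    normalize_mget(ids_only, index="test", doc_type="type"), # form 4: "ids" shorthand
]
```

All four entries in `forms` describe the same two documents, which is why the shorter forms are convenient when every document lives in the same index and type.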

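# The _type field is optional. If no type is specified, the first document matching the ID across all types is returned.
# In the example below, the same document is therefore returned twice.
curl -XGET 'localhost:9200/test/_mget?pretty' -H 'Content-Type: application/json' -d'
{
    "ids" : ["1", "1"]
}
'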

# Different types can be specified explicitly
curl -XGET 'localhost:9200/test/_mget/?pretty' -H 'Content-Type: application/json' -d'
{
  "docs" : [
        {
            "_type":"typeA",
            "_id" : "1"
        },
        {
            "_type":"typeB",
            "_id" : "1"
        }
    ]
}
'

# Source filtering: _source can be disabled, limited to a list of fields, or given include/exclude patterns per document
curl -XGET 'localhost:9200/_mget?pretty' -H 'Content-Type: application/json' -d'
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "type",
            "_id" : "1",
            "_source" : false
        },
        {
            "_index" : "test",
            "_type" : "type",
            "_id" : "2",
            "_source" : ["field3", "field4"]
        },
        {
            "_index" : "test",
            "_type" : "type",
            "_id" : "3",
            "_source" : {
                "include": ["user"],
                "exclude": ["user.location"]
            }
        }
    ]
}
'

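# Specify the stored fields to return. Document 1 returns field1 and field2 (the default from the URL parameter);
# document 2 returns field3 and field4 (its own stored_fields setting overrides the default).
curl -XGET 'localhost:9200/test/type/_mget?stored_fields=field1,field2&pretty' -H 'Content-Type: application/json' -d'
{
    "docs" : [
        {
            "_id" : "1"
        },
        {
            "_id" : "2",
            "stored_fields" : ["field3", "field4"]
        }
    ]
}
'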

# Specify routing. Document 2 uses the default routing key key1 from the URL; document 1 overrides it with key2.
curl -XGET 'localhost:9200/_mget?routing=key1&pretty' -H 'Content-Type: application/json' -d'
{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "type",
            "_id" : "1",
            "routing" : "key2"
        },
        {
            "_index" : "test",
            "_type" : "type",
            "_id" : "2"
        }
    ]
}
'
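
The routing example above combines a default from the URL with a per-document override. The following sketch shows which routing value ends up applying to each document. `effective_routing` is a hypothetical helper written for this post, not an Elasticsearch API; it only mirrors the precedence rule that a value inside the document entry wins over the URL parameter.

```python
def effective_routing(body, url_routing=None):
    """Map each document ID to the routing value that applies to it.

    url_routing stands for the ?routing=... query parameter (assumed name).
    """
    return {
        # A "routing" key in the document entry overrides the URL default.
        doc["_id"]: doc.get("routing", url_routing)
        for doc in body["docs"]
    }


body = {
    "docs": [
        {"_index": "test", "_type": "type", "_id": "1", "routing": "key2"},
        {"_index": "test", "_type": "type", "_id": "2"},
    ]
}

resolved = effective_routing(body, url_routing="key1")
print(resolved)
```

The same precedence applies to the stored_fields example earlier: the URL parameter is a default, and a setting on an individual document replaces it for that document only.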

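Bulk API (_bulk): improves indexing throughput

Notes on formatting a bulk request: the final line of data must end with a newline character \n; each newline may be preceded by a carriage return \r; and the Content-Type header should be set to application/x-ndjson.

Supported parameters: version_type, routing, wait_for_active_shards, refresh.

Options supported by the update action: retry_on_conflict, doc, doc_as_upsert, script, lang, _source.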

# Structure of a bulk request body

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n


curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/x-ndjson' -d'
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
'

curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/x-ndjson' -d'
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "retry_on_conflict" : 3} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_type" : "type1", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}
'
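
Because the bulk body is line-oriented, it is easy to get wrong when assembled by hand. The sketch below builds the body of the first bulk example programmatically. `build_bulk_body` is a hypothetical helper, not a function from any Elasticsearch client, and the tuple layout it accepts is an assumption made for this example. It only produces the string; sending it is left out.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples as newline-delimited JSON.

    source is None for actions that carry no source line, such as delete.
    """
    lines = []
    for action, meta, source in operations:
        # Action and metadata line, for example {"index": {"_index": ...}}.
        lines.append(json.dumps({action: meta}))
        if source is not None:
            # Optional source line: the document, or the update payload.
            lines.append(json.dumps(source))
    # Every line, including the last one, is terminated by a newline.
    return "\n".join(lines) + "\n"


operations = [
    ("index", {"_index": "test", "_type": "type1", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_type": "type1", "_id": "2"}, None),
    ("create", {"_index": "test", "_type": "type1", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_id": "1", "_type": "type1", "_index": "test"}, {"doc": {"field2": "value2"}}),
]

payload = build_bulk_body(operations)
print(payload, end="")
```

`json.dumps` without an indent argument never emits a raw newline, so each JSON object stays on one line, which is what the format requires.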

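Term Vectors (_termvectors): returns information and statistics on the terms in the fields of a particular document

# Example
curl -XGET 'localhost:9200/twitter/tweet/1/_termvectors?pretty'

# The fields to retrieve can be specified

curl -XGET 'localhost:9200/twitter/tweet/1/_termvectors?fields=message&pretty'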

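Three kinds of values can be returned: term information, term statistics, and field statistics. By default, term information and field statistics are returned.

Term information: 1. term frequency in the field; 2. term positions; 3. start and end offsets; 4. term payloads. If this information was not stored in the index, it is computed on the fly where possible.

Term statistics: 1. total term frequency (how often the term occurs across all documents); 2. document frequency (the number of documents containing the term).

Field statistics: 1. doc_count, the number of documents containing the field; 2. sum_doc_freq, the sum of document frequencies of all terms in the field; 3. sum_ttf, the sum of total term frequencies of all terms in the field.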
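
To make the three field statistics concrete, here is a small worked example on a toy corpus. It is an illustration under stated assumptions: the documents are already split into tokens, and `field_statistics` is a hypothetical function written for this post. Elasticsearch derives the real values from its index, so actual numbers depend on the analyzer and on the index contents.

```python
from collections import Counter


def field_statistics(docs):
    """Compute toy field statistics for one field.

    docs is a list of token lists, one per document that contains the field.
    """
    doc_freq = Counter()  # per term: number of documents containing it
    ttf = Counter()       # per term: total number of occurrences
    for tokens in docs:
        ttf.update(tokens)            # count every occurrence
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        "doc_count": len(docs),
        "sum_doc_freq": sum(doc_freq.values()),
        "sum_ttf": sum(ttf.values()),
        "doc_freq": dict(doc_freq),
        "ttf": dict(ttf),
    }


docs = [
    ["quick", "brown", "fox", "quick"],
    ["brown", "dog"],
]

stats = field_statistics(docs)
print(stats["doc_count"], stats["sum_doc_freq"], stats["sum_ttf"])
```

Here "quick" occurs twice but in one document only, so it adds 2 to sum_ttf and 1 to sum_doc_freq. That is the difference between the two sums: sum_ttf counts occurrences, sum_doc_freq counts term and document pairs.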

Multi Term Vectors (_mtermvectors): retrieves multiple term vectors in a single request

curl -XPOST 'localhost:9200/_mtermvectors?pretty' -H 'Content-Type: application/json' -d'
{
   "docs": [
      {
         "_index": "twitter",
         "_type": "tweet",
         "_id": "2",
         "term_statistics": true
      },
      {
         "_index": "twitter",
         "_type": "tweet",
         "_id": "1",
         "fields": [
            "message"
         ]
      }
   ]
}
'

Search API

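Most search APIs are multi-index and multi-type, with the exception of the Explain API.

A search can specify multiple routing values, separated by commas.

Adaptive replica selection: instead of sending requests to the copies of a shard in round-robin fashion, the request is sent to the copy deemed best, based on the following criteria: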

Response time of past requests between the coordinating node and the node containing the copy of the data
Time past search requests took to execute on the node containing the data
The queue size of the search threadpool on the node containing the data

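Stats groups: a search can be associated with stats groups, and a separate set of statistics is maintained for each group.

Global search timeout: search.default_search_timeout is set through the cluster update settings API; a value of -1 means no timeout.

Search cancellation: a running search can be cancelled through the task cancellation mechanism.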

# Specify routing
curl -XPOST 'localhost:9200/twitter/tweet/_search?routing=kimchy&pretty' -H 'Content-Type: application/json' -d'
{
    "query": {
        "bool" : {
            "must" : {
                "query_string" : {
                    "query" : "some query string here"
                }
            },
            "filter" : {
                "term" : { "user" : "kimchy" }
            }
        }
    }
}
'

# Dynamic cluster setting: set the adaptive replica selection parameter to true (the default is false)
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
    "transient": {
        "cluster.routing.use_adaptive_replica_selection": true
    }
}
'

# Associate the search with stats groups
curl -XPOST 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match_all" : {}
    },
    "stats" : ["group1", "group2"]
}
'
