Hands-on with the join Type in Elasticsearch 6
Posted by 程序员欣宸
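Overview
- Elasticsearch in Action is a classic Elasticsearch tutorial. Its demo source lives at https://github.com/dakrone/elasticsearch-in-action , and the latest branch is 6.x. While using that source, I noticed that the static mapping for the _doc type of the get-together index adds a field of type join, as shown below: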
"mappings" : {
  "_doc" : {
    "_source" : {
      "enabled" : true
    },
    "properties" : {
      "relationship_type": {
        "type": "join",
        "relations" : {
          "group": "event"
        }
      },
      ...
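- The join type is how Elasticsearch 6 models parent/child relationships inside a single index; let's learn it through hands-on practice.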
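To make the shape of the mapping fragment above concrete, here is a minimal Python sketch that builds an equivalent request body. The helper name join_mapping is hypothetical (it is not part of the demo or of any client library), and only the join field is included; the real mapping.json defines many more properties.

```python
import json


def join_mapping(field, parent, child):
    """Build a 6.x-style mapping body with one parent/child relation.

    Hypothetical helper for illustration; it covers only the join field.
    """
    return {
        "mappings": {
            "_doc": {
                "properties": {
                    field: {
                        "type": "join",
                        # key is the parent relation name, value is the child name
                        "relations": {parent: child},
                    }
                }
            }
        }
    }


mapping = join_mapping("relationship_type", "group", "event")
print(json.dumps(mapping, indent=2))
```

The printed JSON could be sent as the body of an index-creation request; here it is only printed.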
Environment
- Operating system: Ubuntu 18.04.2 LTS
- Elasticsearch: 6.7.1
- Kibana: 6.7.1
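Downloading the Elasticsearch in Action demo source
- This article uses two files from that source: mapping.json, which creates the static mapping, and populate.sh, which creates the documents. Their addresses are:
- https://github.com/dakrone/elasticsearch-in-action/blob/6.x/mapping.json
- https://github.com/dakrone/elasticsearch-in-action/blob/6.x/populate.sh
- Usage: download both files into the same directory and run **./populate.sh 192.168.1.101:9200**, where "192.168.1.101:9200" is the HTTP address and port of the Elasticsearch 6 instance;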
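What the official documentation says
- The official documentation describes the join type as a special field that creates parent/child relations within documents of the same index; its relations section defines the possible relations, each made up of a parent name and a child name.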
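My understanding:
- The join type establishes parent/child relationships between documents within one index;
- A relationship is expressed with the names of the parent and the child;
- Next, let's see how the Elasticsearch in Action demo uses this field;
The Elasticsearch in Action demo
- Part of the demo's document-creation script looks like this: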
# Parent document: the join field is just the relation name "group"
curl -s -XPOST "$ADDRESS/get-together/_doc/1" -H'Content-Type: application/json' -d '{
  "relationship_type": "group",
  "name": "Denver Clojure",
  "organizer": ["Daniel", "Lee"],
  "description": "Group of Clojure enthusiasts from Denver who want to hack on code together and learn more about Clojure",
  "created_on": "2012-06-15",
  "tags": ["clojure", "denver", "functional programming", "jvm", "java"],
  "members": ["Lee", "Daniel", "Mike"],
  "location_group": "Denver, Colorado, USA"
}'
# Child document: routing=1 keeps it on the same shard as parent 1
curl -s -XPOST "$ADDRESS/get-together/_doc/100?routing=1" -H'Content-Type: application/json' -d '{
  "relationship_type": {
    "name": "event",
    "parent": "1"
  },
  "host": ["Lee", "Troy"],
  "title": "Liberator and Immutant",
  "description": "We will discuss two different frameworks in Clojure for doing different things. Liberator is a ring-compatible web framework based on Erlang Webmachine. Immutant is an all-in-one enterprise application based on JBoss.",
  "attendees": ["Lee", "Troy", "Daniel", "Tom"],
  "date": "2013-09-05T18:00",
  "location_event": {
    "name": "Stoneys Full Steam Tavern",
    "geolocation": "39.752337,-105.00083"
  },
  "reviews": 4
}'
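- As shown above, the document with id 1 has the string "group" as the value of its relationship_type field. For the document with id 100, the relationship_type value is an object rather than a string: parent set to 1 means the parent document's id is 1, and name set to "event" means this document is the child side of the "group: event" relation;
- Note: the URL of the second document carries a routing parameter so that parent and child stay on the same shard. This deserves particular attention when using the join type;
- Next, make sure the populate.sh script mentioned earlier has been run, so that the get-together index and its documents are ready in the Elasticsearch environment. The exercises that follow are run in Kibana's Dev Tools.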
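The routing requirement described above can be sketched in Python as follows. This is only an illustration: child_index_request is a hypothetical helper that builds the request path and JSON body for a child document without sending anything, and it assumes the join field is named relationship_type as in the demo mapping.

```python
import json
from urllib.parse import quote, urlencode

JOIN_FIELD = "relationship_type"  # join field name taken from the demo mapping


def child_index_request(index, doc_id, parent_id, relation, source):
    """Return (path, body) for indexing a child document.

    The routing parameter is set to the parent id so that the child
    is stored on the same shard as its parent.
    """
    body = dict(source)
    body[JOIN_FIELD] = {"name": relation, "parent": str(parent_id)}
    path = "/{}/_doc/{}?{}".format(
        quote(index),
        quote(str(doc_id)),
        urlencode({"routing": str(parent_id)}),
    )
    return path, json.dumps(body)


path, body = child_index_request(
    "get-together", 100, 1, "event", {"title": "Liberator and Immutant"}
)
print(path)  # /get-together/_doc/100?routing=1
print(body)
```

Deriving routing from the parent id in one place makes it harder to index a child with a routing value that does not match its parent.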
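Find all documents whose parent type is "group" (the results are child documents):
- Run the following request:
GET get-together/_search
{
  "query": {
    "has_parent": {
      "parent_type": "group",
      "query": {
        "match_all": {}
      }
    }
  }
}
- This returns all child documents whose parent type is "group":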
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 15,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "106",
        "_score" : 1.0,
        "_routing" : "3",
        "_source" : {
          "relationship_type" : {
            "name" : "event",
            "parent" : "3"
          },
          "host" : "Mik",
          "title" : "Social management and monitoring tools",
          "description" : "Shay Banon will be there to answer questions and we can talk about management tools.",
          "attendees" : [
            "Shay",
            "Mik",
            "John",
            "Chris"
          ],
          "date" : "2013-03-06T18:00",
          "location_event" : {
            "name" : "Quid Inc",
            "geolocation" : "37.798442,-122.399801"
          },
          "reviews" : 5
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "107",
        "_score" : 1.0,
        "_routing" : "3",
        "_source" : {
          "relationship_type" : {
            "name" : "event",
            "parent" : "3"
          },
          "host" : "Mik",
          "title" : "Logging and Elasticsearch",
          "description" : "Get a deep dive for what Elasticsearch is and how it can be used for logging with Logstash as well as Kibana!",
          "attendees" : [
            "Shay",
            "Rashid",
            "Erik",
            "Grant",
            "Mik"
          ],
          "date" : "2013-04-08T18:00",
          "location_event" : {
            "name" : "Salesforce headquarters",
            "geolocation" : "37.793592,-122.399033"
          },
          "reviews" : 3
        }
      },
      ...
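In the hits above, each child's _routing value equals the parent id stored in its join field. A small Python sketch for pulling that information out of a response is shown here; the response dict is a hand-trimmed copy of the two hits above (only the keys used), and child_parent_pairs is a hypothetical helper name.

```python
# Hand-trimmed copy of the has_parent response above: only the keys used here.
response = {
    "hits": {
        "total": 15,
        "hits": [
            {"_id": "106", "_routing": "3",
             "_source": {"relationship_type": {"name": "event", "parent": "3"}}},
            {"_id": "107", "_routing": "3",
             "_source": {"relationship_type": {"name": "event", "parent": "3"}}},
        ],
    }
}


def child_parent_pairs(resp, join_field="relationship_type"):
    """Yield (child_id, parent_id, routing) for each child hit."""
    for hit in resp["hits"]["hits"]:
        join = hit["_source"][join_field]
        yield hit["_id"], join["parent"], hit.get("_routing")


pairs = list(child_parent_pairs(response))
for child_id, parent_id, routing in pairs:
    # In the demo data the routing value is the parent id.
    print(child_id, parent_id, routing == parent_id)
```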
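Find all documents whose child type is "event" (the results are parent documents)
- Run the following request:
GET get-together/_search
{
  "query": {
    "has_child": {
      "type": "event",
      "query": {
        "match_all": {}
      }
    }
  }
}
- This returns all parent documents that have children of type "event":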
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "relationship_type" : "group",
          "name" : "Elasticsearch San Francisco",
          "organizer" : "Mik",
          "description" : "Elasticsearch group for ES users of all knowledge levels",
          "created_on" : "2012-08-07",
          "tags" : [
            "elasticsearch",
            "big data",
            "lucene",
            "open source"
          ],
          "members" : [
            "Lee",
            "Igor"
          ],
          "location_group" : "San Francisco, California, USA"
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "relationship_type" : "group",
          "name" : "Denver Clojure",
          "organizer" : [
            "Daniel",
            "Lee"
          ],
          "description" : "Group of Clojure enthusiasts from Denver who want to hack on code together and learn more about Clojure",
          "created_on" : "2012-06-15",
          "tags" : [
            "clojure",
            "denver",
            "functional programming",
            "jvm",
            "java"
          ],
          "members" : [
            "Lee",
            "Daniel",
            "Mike"
          ],
          "location_group" : "Denver, Colorado, USA"
        }
      },
      ...
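The has_parent and has_child requests used so far differ only in a few keys, which a short Python sketch can make explicit. The two builder functions below are hypothetical helpers, not part of any Elasticsearch client; they only assemble the request bodies shown above, defaulting the inner query to match_all.

```python
import json


def has_parent(parent_type, inner=None):
    """Body for: find child documents whose parent is of parent_type."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,  # note: key is "parent_type"
                "query": inner or {"match_all": {}},
            }
        }
    }


def has_child(child_type, inner=None):
    """Body for: find parent documents that have children of child_type."""
    return {
        "query": {
            "has_child": {
                "type": child_type,  # note: key is "type", not "child_type"
                "query": inner or {"match_all": {}},
            }
        }
    }


children_of_groups = has_parent("group")
groups_with_events = has_child("event")
print(json.dumps(children_of_groups, indent=2))
print(json.dumps(groups_with_events, indent=2))
```

The easy thing to get wrong is the key name: has_parent takes parent_type while has_child takes type.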
Find the child documents whose parent id is 1
- Run the following request:
GET get-together/_search
{
  "query": {
    "parent_id": {
      "type": "event",
      "id": "1"
    }
  }
}
- This returns all child documents whose parent id is 1:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.3291359,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "100",
        "_score" : 1.3291359,
        "_routing" : "1",
        "_source" : {
          "relationship_type" : {
            "name" : "event",
            "parent" : "1"
          },
          "host" : [
            "Lee",
            "Troy"
          ],
          "title" : "Liberator and Immutant",
          "description" : "We will discuss two different frameworks in Clojure for doing different things. Liberator is a ring-compatible web framework based on Erlang Webmachine. Immutant is an all-in-one enterprise application based on JBoss.",
          "attendees" : [
            "Lee",
            "Troy",
            "Daniel",
            "Tom"
          ],
          "date" : "2013-09-05T18:00",
          "location_event" : {
            "name" : "Stoneys Full Steam Tavern",
            "geolocation" : "39.752337,-105.00083"
          },
          "reviews" : 4
        }
      },
      ...
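Outside Kibana, the same parent_id search can be issued over HTTP. The following Python sketch only builds the request object and does not send it; the address is the example one used earlier in this post, parent_id_search is a hypothetical helper, and POST is used because _search accepts a body with POST as well as GET.

```python
import json
from urllib.request import Request

ES_URL = "http://192.168.1.101:9200"  # example address used earlier in this post


def parent_id_search(index, child_type, parent_id):
    """Build (but do not send) a _search request using the parent_id query."""
    body = {"query": {"parent_id": {"type": child_type, "id": str(parent_id)}}}
    return Request(
        "{}/{}/_search".format(ES_URL, index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = parent_id_search("get-together", "event", 1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually run the search, pass req to urllib.request.urlopen
# against a reachable cluster.
```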
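Trimming the response with script_fields
- The queries above return the whole _source; if you do not need everything, script_fields can trim the response;
- Find all child documents whose parent id is 1, returning only three fields (the parent document id, the child document id, and the child document's title field):
GET get-together/_search
{
  "query": {
    "parent_id": {
      "type": "event",
      "id": "1"
    }
  },
  "script_fields": {
    "group_id": {
      "script": {
        "source": "doc['relationship_type#group']"
      }
    },
    "event_id": {
      "script": {
        "source": "doc['_id']"
      }
    },
    "title": {
      "script": "params['_source']['title']"
    }
  }
}
- The result is as follows: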
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.3291359,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "100",
        "_score" : 1.3291359,
        "_routing" : "1",
        "fields" : {
          "event_id" : [
            "100"
          ],
          "title" : [
            "Liberator and Immutant"
          ],
          "group_id" : [
            "1"
          ]
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "101",
        "_score" : 1.3291359,
        "_routing" : "1",
        "fields" : {
          "event_id" : [
            "101"
          ],
          "title" : [
            "Sunday, Surly Sunday"
          ],
          "group_id" : [
            "1"
          ]
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "102",
        "_score" : 1.3291359,
        "_routing" : "1",
        "fields" : {
          "event_id" : [
            "102"
          ],
          "title" : [
            "10 Clojure coding techniques you should know, and project openbike"
          ],
          "group_id" : [
            "1"
          ]
        }
      }
    ]
  }
}
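Each script field in the response above comes back as a one-element array. A client would usually unwrap these into flat rows; a minimal Python sketch of that step follows. The response dict is a hand-trimmed copy of the hits above and flatten_fields is a hypothetical helper.

```python
# Hand-trimmed copy of the script_fields response above.
response = {
    "hits": {
        "hits": [
            {"_id": "100", "fields": {"event_id": ["100"],
                                      "title": ["Liberator and Immutant"],
                                      "group_id": ["1"]}},
            {"_id": "101", "fields": {"event_id": ["101"],
                                      "title": ["Sunday, Surly Sunday"],
                                      "group_id": ["1"]}},
            {"_id": "102", "fields": {"event_id": ["102"],
                                      "title": ["10 Clojure coding techniques you should know, and project openbike"],
                                      "group_id": ["1"]}},
        ]
    }
}


def flatten_fields(resp):
    """Turn each hit's {"name": [value]} mapping into {"name": value}."""
    rows = []
    for hit in resp["hits"]["hits"]:
        row = {}
        for name, values in hit["fields"].items():
            # Take the first value; an empty list becomes None.
            row[name] = values[0] if values else None
        rows.append(row)
    return rows


rows = flatten_fields(response)
for row in rows:
    print(row["group_id"], row["event_id"], row["title"])
```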
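Aggregation
- The following request runs a bucket (terms) aggregation over all child documents whose parent type is "group":
GET get-together/_search
{
  "query": {
    "has_parent": {
      "parent_type": "group",
      "query": {
        "match_all": {}
      }
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "relationship_type#group"
      }
    }
  }
}
- The result is shown below; the child documents are bucketed by parent document id: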
"aggregations" : {
  "parents" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "1",
        "doc_count" : 3
      },
      {
        "key" : "2",
        "doc_count" : 3
      },
      {
        "key" : "3",
        "doc_count" : 3
      },
      {
        "key" : "4",
        "doc_count" : 3
      },
      {
        "key" : "5",
        "doc_count" : 3
      }
    ]
  }
}
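The buckets above map each parent id to its number of children. As a quick sanity check, the Python sketch below turns a hand-copied version of those buckets into a dictionary and sums the counts; the variable names are illustrative only.

```python
# Hand-copied from the aggregation result above.
aggregations = {
    "parents": {
        "buckets": [
            {"key": "1", "doc_count": 3},
            {"key": "2", "doc_count": 3},
            {"key": "3", "doc_count": 3},
            {"key": "4", "doc_count": 3},
            {"key": "5", "doc_count": 3},
        ]
    }
}

# Parent id -> number of child documents.
children_per_parent = {
    bucket["key"]: bucket["doc_count"]
    for bucket in aggregations["parents"]["buckets"]
}
total_children = sum(children_per_parent.values())

print(children_per_parent)
print(total_children)
```

The sum, 15, agrees with the hits total reported by the earlier has_parent query, and the five keys agree with the five parents returned by the has_child query.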
- That covers the main hands-on material for the join type; I hope it helps you understand it.