Elasticsearch 6.X: An In-Depth Look at the New Join Type
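0. How should one-to-many and many-to-many data be stored and queried in ES 6.X?
The motivating problem:
In a headline-news app, a news article and its comments form a one-to-many relationship.
How should this data be stored in ES 6.X, and how can it be searched and aggregated efficiently?
This article works through the answer.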
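1. Background of the new Join type in ES 6.X
- In MySQL, multi-table association is done with LEFT JOIN, JOIN, and similar clauses.
- In ES 5.X, parent-child documents provided multi-table association, similar to a database join. This relied on ES 5.X allowing multiple types under one index.
- In ES 6.X, each index supports only a single type, so how to get join-like behavior became a common question.
To address this, ES 6.X introduced the join field type, which mainly targets problems similar to multi-table association in MySQL.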
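2. Introduction to the ES 6.X Join type
The join type still works within a single index and uses a parent-child relation to achieve operations similar to multi-table association in MySQL.
3. The ES 6.X Join type in practice
3.1 Mapping definition for the Join type
The mapping for a join field is shown below.
Key points:
- 1) "my_join_field" is the name of the join field.
- 2) "question": "answer" declares that question is the parent of answer.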
PUT my_join_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": "answer"
          }
        }
      }
    }
  }
}
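3.2 Indexing parent documents
The shorthand form is used below because it is easier to read.
The two requests index two parent documents.
Both use the parent relation name "question".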
PUT my_join_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": "question"
}
PUT my_join_index/_doc/2?refresh
{
  "text": "This is another question",
  "my_join_field": "question"
}
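3.3 Indexing child documents
- The routing value is mandatory, because parent and child documents must be indexed on the same shard.
- "answer" is the join name for this child document.
- "parent" specifies the ID of this child's parent document: 1.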
PUT my_join_index/_doc/3?routing=1&refresh
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}
PUT my_join_index/_doc/4?routing=1&refresh
{
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}
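To see why the routing value matters, the sketch below models shard selection as hash(routing) % number_of_primary_shards. It is only an illustration: Elasticsearch uses its own hash function, so `zlib.crc32` is a stand-in, and the helper name `pick_shard` and the shard count of 5 (the 6.X default) are assumptions. A document indexed without a routing parameter is routed by its own _id.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # assumption: the 6.X default number of primary shards


def pick_shard(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Toy model: shard = hash(routing) % num_shards (crc32 is a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


# The parent (id 1) has no explicit routing, so it is routed by its own id.
parent_shard = pick_shard("1")

# Children 3 and 4 are indexed with ?routing=1, so they follow the parent.
child_shards = {doc_id: pick_shard("1") for doc_id in ("3", "4")}

# Without ?routing=1, child 3 would be routed by its own id instead,
# which is not guaranteed to be the parent's shard.
unrouted_shard = pick_shard("3")

print(parent_shard, child_shards, unrouted_shard)
```

In this model the children land on the parent's shard only because they reuse the parent's routing value, which is what ?routing=1 does in the requests above.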
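4. Constraints of the ES 6.X Join type
- Only one join field mapping is allowed per index.
- Parent and child documents must be indexed on the same shard. This means the same routing value must be provided when getting, deleting, or updating a child document.
- A document can have multiple children but only one parent.
- New relations can be added to an existing join field.
- A child can also be added to an existing element, but only if that element is already a parent.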
5. Searching and aggregating with the ES 6.X Join type
5.1 Retrieving all documents
GET my_join_index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": ["_id"]
}
The response is:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": null,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "1",
        "_score": null,
        "_source": {
          "text": "This is a question",
          "my_join_field": "question"
        },
        "sort": [
          "1"
        ]
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "2",
        "_score": null,
        "_source": {
          "text": "This is another question",
          "my_join_field": "question"
        },
        "sort": [
          "2"
        ]
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "3",
        "_score": null,
        "_routing": "1",
        "_source": {
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        },
        "sort": [
          "3"
        ]
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "4",
        "_score": null,
        "_routing": "1",
        "_source": {
          "text": "This is another answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        },
        "sort": [
          "4"
        ]
      }
    ]
  }
}
5.2 Finding child documents by their parent
GET my_join_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "question",
      "query": {
        "match": {
          "text": "This is"
        }
      }
    }
  }
}
The response is:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "4",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is another answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      }
    ]
  }
}
5.3 Finding parent documents by their children
GET my_join_index/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "match": {
          "text": "This is question"
        }
      }
    }
  }
}
The response is:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "This is a question",
          "my_join_field": "question"
        }
      }
    ]
  }
}
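The has_parent query in 5.2 and the has_child query in 5.3 can be modelled in memory to make their direction explicit. The sketch below is an illustration only, not how Elasticsearch evaluates these queries: the function names are hypothetical, and `matches` is a simplified stand-in for the match query (any query term appearing in the text counts as a match).

```python
# The four sample documents: id -> (text, relation name, parent id or None)
DOCS = {
    "1": ("This is a question", "question", None),
    "2": ("This is another question", "question", None),
    "3": ("This is an answer", "answer", "1"),
    "4": ("This is another answer", "answer", "1"),
}


def matches(text, query):
    """Simplified stand-in for a match query: any query term appears in the text."""
    terms = set(text.lower().split())
    return any(term in terms for term in query.lower().split())


def has_parent(parent_type, query):
    """Ids of children whose parent has the given type and matches the query."""
    parents = {doc_id for doc_id, (text, name, _) in DOCS.items()
               if name == parent_type and matches(text, query)}
    return sorted(doc_id for doc_id, (_, _, parent) in DOCS.items()
                  if parent in parents)


def has_child(child_type, query):
    """Ids of parents that have at least one matching child of the given type."""
    return sorted({parent for text, name, parent in DOCS.values()
                   if name == child_type and parent is not None
                   and matches(text, query)})


print(has_parent("question", "This is"))       # ['3', '4']
print(has_child("answer", "This is question"))  # ['1']
```

Note that has_parent filters on the parents but returns children, while has_child filters on the children but returns parents. Document 2 is never returned by has_child because it has no answers.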
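5.4 Aggregations with the ES 6.X Join type
The request below does the following:
- 1) parent_id is a dedicated query that retrieves the child documents of type answer belonging to the parent document with id=1.
- 2) The terms aggregation groups the hits on the parent field my_join_field#question.
- 3) script_fields reads that same field in a script and returns it with each hit.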
GET my_join_index/_search
{
  "query": {
    "parent_id": {
      "type": "answer",
      "id": "1"
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "my_join_field#question",
        "size": 10
      }
    }
  },
  "script_fields": {
    "parent": {
      "script": {
        "source": "doc['my_join_field#question']"
      }
    }
  }
}
The response is:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.13353139,
    "hits": [
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.13353139,
        "_routing": "1",
        "fields": {
          "parent": [
            "1"
          ]
        }
      },
      {
        "_index": "my_join_index",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.13353139,
        "_routing": "1",
        "fields": {
          "parent": [
            "1"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "parents": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 2
        }
      ]
    }
  }
}
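The hits and the single bucket in this response can be reproduced with a small in-memory model. This is a hedged sketch of the logic only: `parent_id_query` is a hypothetical helper, and the terms aggregation is approximated with `collections.Counter` over each hit's parent id.

```python
from collections import Counter

# Child documents from the example: id -> (relation name, parent id)
CHILDREN = {
    "3": ("answer", "1"),
    "4": ("answer", "1"),
}


def parent_id_query(child_type, parent_id):
    """Ids of children of the given type that belong to the given parent."""
    return sorted(doc_id for doc_id, (name, parent) in CHILDREN.items()
                  if name == child_type and parent == parent_id)


hits = parent_id_query("answer", "1")

# Approximates the terms aggregation on my_join_field#question:
# one bucket per parent id, counting the hits that point to it.
buckets = Counter(CHILDREN[doc_id][1] for doc_id in hits)

print(hits)           # ['3', '4']
print(dict(buckets))  # {'1': 2}
```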
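6. One-to-many relations with the ES 6.X Join type
6.1 One-to-many definition
The mapping below defines one parent relation, question, with two child relations, answer and comment.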
PUT join_ext_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"]
          }
        }
      }
    }
  }
}
6.2 Multi-level (one-to-many-to-many) definition
The mapping below defines the three-generation relation shown in this diagram.
   question
    /    \
   /      \
comment  answer
           |
           |
          vote
PUT join_multi_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"],
            "answer": "vote"
          }
        }
      }
    }
  }
}
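One way to read a multi-level relations block is to invert it into a child-to-parent lookup and walk upward. The sketch below does that for the relations above; the helper names `parent_of` and `lineage` are hypothetical, and the single-parent check mirrors the constraint that a relation can have only one parent.

```python
# The relations object from the mapping above.
RELATIONS = {
    "question": ["answer", "comment"],
    "answer": "vote",
}


def parent_of(relations):
    """Invert the mapping: child relation name -> parent relation name."""
    parents = {}
    for parent, children in relations.items():
        # A single child may be given as a plain string instead of a list.
        if isinstance(children, str):
            children = [children]
        for child in children:
            if child in parents:
                raise ValueError(f"{child} has more than one parent")
            parents[child] = parent
    return parents


def lineage(name, relations):
    """Chain of relation names from `name` up to the root relation."""
    parents = parent_of(relations)
    chain = [name]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


print(lineage("vote", RELATIONS))     # ['vote', 'answer', 'question']
print(lineage("comment", RELATIONS))  # ['comment', 'question']
```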
A grandchild document is indexed as follows:
PUT join_multi_index/_doc/3?routing=1&refresh
{
  "text": "This is a vote",
  "my_join_field": {
    "name": "vote",
    "parent": "2"
  }
}
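Note:
- The grandchild document must be on the same shard as its parent and grandparent.
- The grandchild's parent ID must point to its direct parent, an answer document.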
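The two notes together explain why the vote uses routing=1 while its parent field is "2". A minimal sketch, assuming the root question is routed by its own id and that documents 1 (question) and 2 (answer, child of 1) exist in join_multi_index; both documents and the helper `routing_for` are hypothetical, since only the vote appears in the text.

```python
# Hypothetical documents in join_multi_index: id -> parent id (None for the root).
# Documents 1 and 2 are assumed; only the vote (id 3) is shown in the article.
PARENT_OF = {"1": None, "2": "1", "3": "2"}


def routing_for(doc_id, parent_of):
    """Walk up the parent chain and return the id of the root ancestor."""
    current = doc_id
    while parent_of[current] is not None:
        current = parent_of[current]
    return current


# The vote's direct parent is the answer (2), but its routing value is the
# root question's id (1), so all three generations share one shard.
print(routing_for("3", PARENT_OF))
```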
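7. Summary
The official ES documentation is already very detailed; see:
http://t.cn/RnBBLgp
Even so, typing the examples out and translating them by hand really does refresh and deepen one's understanding.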