Elasticsearch: Using Java to search indexed documents
Posted by Elastic 中国社区官方博客 (Elastic China Community official blog)
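This article is one in a series.
In today's article, I explain in detail how to search an index. Before working through the exercises below, let's use Kibana to create an index called twitter: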
PUT twitter
{
"mappings": {
"properties": {
"DOB": {
"type": "date"
},
"address": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"location": {
"type": "geo_point"
},
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"province": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"uid": {
"type": "long"
},
"user": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
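Above, we created an index named twitter. If the command above is unclear to you, please refer to my earlier article “开始使用 Elasticsearch (2)” (“Getting started with Elasticsearch (2)”). Next, we import documents with the following command: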
POST twitter/_bulk
{"index":{"_id":1}}
{"user":"双榆树-张三","DOB":"1992-08-03","message":"今儿天气不错啊,出去转转去","uid":1,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{"index":{"_id":2}}
{"user":"东城区-老刘","DOB":"1990-07-14","message":"出发,下一站云南!","uid":2,"age":32,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{"index":{"_id":3}}
{"user":"东城区-李四","DOB":"1997-09-23","message":"happy birthday!","uid":3,"age":25,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{"index":{"_id":4}}
{"user":"朝阳区-老贾","DOB":"1980-06-30","message":"123,gogogo","uid":4,"age":42,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{"index":{"_id":5}}
{"user":"朝阳区-老王","DOB":"1996-06-18","message":"Happy BirthDay My Friend!","uid":5,"age":26,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{"index":{"_id":6}}
{"user":"虹桥-老吴","DOB":"2000-04-05","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":7,"age":22,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}
Note that DOB above stands for date of birth. We can check the number of documents with the following command:
GET twitter/_count
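The command above reports 6 documents.
Creating a Java application to search documents
To make the code easier to follow, I have put the final code on GitHub: https://github.com/liu-xiao-guo/ElasticsearchJava-search. You can download it with the following command: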
git clone https://github.com/liu-xiao-guo/ElasticsearchJava-search
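Creating the Java project
You can refer to my earlier articles:
Use your favorite IDE to create a basic Java project; I won't repeat those steps here. For how to connect to Elasticsearch, see the two articles above. Below, I explain in detail how to run searches from code.
Searching documents
Search 1: search all documents
We use Java to search all documents: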
// Search 1: Search for all documents
System.out.println("****************** Search 1");
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices(INDEX_NAME);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
searchRequest.source(searchSourceBuilder);
Map<String, Object> map=null;
try {
SearchResponse searchResponse = null;
searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
if (searchResponse.getHits().getTotalHits().value > 0) {
SearchHit[] searchHit = searchResponse.getHits().getHits();
for (SearchHit hit : searchHit) {
map = hit.getSourceAsMap();
System.out.println("map:" + Arrays.toString(map.entrySet().toArray()));
}
}
} catch (IOException e) {
e.printStackTrace();
}
Above, we use QueryBuilders.matchAllQuery() to match all documents. This is equivalent to the following command in Kibana:
GET twitter/_search
Run the code above. The output is:
****************** Search 1
map:[uid=1, country=中国, address=中国北京市海淀区, province=北京, city=北京, DOB=1992-08-03, location={lon=116.325747, lat=39.970718}, message=今儿天气不错啊,出去转转去, user=双榆树-张三, age=30]
map:[uid=2, country=中国, address=中国北京市东城区台基厂三条3号, province=北京, city=北京, DOB=1990-07-14, location={lon=116.412754, lat=39.904313}, message=出发,下一站云南!, user=东城区-老刘, age=32]
map:[uid=3, country=中国, address=中国北京市东城区, province=北京, city=北京, DOB=1997-09-23, location={lon=116.408986, lat=39.893801}, message=happy birthday!, user=东城区-李四, age=25]
map:[uid=4, country=中国, address=中国北京市朝阳区建国门, province=北京, city=北京, DOB=1980-06-30, location={lon=116.367910, lat=39.718256}, message=123,gogogo, user=朝阳区-老贾, age=42]
map:[uid=5, country=中国, address=中国北京市朝阳区国贸, province=北京, city=北京, DOB=1996-06-18, location={lon=116.467910, lat=39.918256}, message=Happy BirthDay My Friend!, user=朝阳区-老王, age=26]
map:[uid=7, country=中国, address=中国上海市闵行区, province=上海, city=上海, DOB=2000-04-05, location={lon=121.383328, lat=31.175927}, message=好友来了都今天我生日,好友来了,什么 birthday happy 就成!, user=虹桥-老吴, age=22]
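The output shows that the search returned all six documents. A search returns at most 10 hits by default, which is enough for this index.
Search 2: search for documents within a range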
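All four searches in this article repeat the same loop to print each hit. Below is a minimal sketch of that printing step factored into a helper. It works on plain Map<String, Object> values, which is what hit.getSourceAsMap() returns, so it runs without an Elasticsearch client. The names HitPrinter, formatSource and formatHits are hypothetical and do not come from the original project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HitPrinter {
    // Formats one _source map the same way the examples do:
    // "map:" followed by Arrays.toString over the map's entries.
    static String formatSource(Map<String, Object> source) {
        return "map:" + Arrays.toString(source.entrySet().toArray());
    }

    // Formats every hit. In the real program the list would be built from
    // hit.getSourceAsMap() for each SearchHit in searchResponse.getHits().getHits().
    static List<String> formatHits(List<Map<String, Object>> sources) {
        List<String> lines = new ArrayList<>();
        for (Map<String, Object> source : sources) {
            lines.add(formatSource(source));
        }
        return lines;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the printed order is predictable here
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("uid", 1);
        doc.put("user", "双榆树-张三");
        doc.put("age", 30);
        formatHits(List.of(doc)).forEach(System.out::println);
        // prints: map:[uid=1, user=双榆树-张三, age=30]
    }
}
```

With a helper like this, each search block only needs to build its request and then pass the collected source maps to formatHits.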
// Search 2:
System.out.println("****************** Search 2");
SearchSourceBuilder builder = new SearchSourceBuilder()
.postFilter(QueryBuilders.rangeQuery("age").from(25).to(30));
SearchRequest searchRequest2 = new SearchRequest();
searchRequest2.indices(INDEX_NAME);
searchRequest2.searchType(SearchType.DFS_QUERY_THEN_FETCH);
searchRequest2.source(builder);
try {
SearchResponse searchResponse = null;
searchResponse = client.search(searchRequest2, RequestOptions.DEFAULT);
if (searchResponse.getHits().getTotalHits().value > 0) {
SearchHit[] searchHit = searchResponse.getHits().getHits();
for (SearchHit hit : searchHit) {
map = hit.getSourceAsMap();
System.out.println("map:" + Arrays.toString(map.entrySet().toArray()));
}
}
} catch (IOException e) {
e.printStackTrace();
}
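Above, we search for all documents whose age is between 25 and 30, inclusive. No query is set on the SearchSourceBuilder, so the search defaults to match_all, and the age range is applied as a post_filter. This is similar to the following search in Kibana: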
GET twitter/_search
{
"query": {
"match_all": {}
},
"post_filter": {
"range": {
"age": {
"gte": 25,
"lte": 30
}
}
}
}
Run the application. The output of Search 2 is:
****************** Search 2
map:[uid=1, country=中国, address=中国北京市海淀区, province=北京, city=北京, DOB=1992-08-03, location={lon=116.325747, lat=39.970718}, message=今儿天气不错啊,出去转转去, user=双榆树-张三, age=30]
map:[uid=3, country=中国, address=中国北京市东城区, province=北京, city=北京, DOB=1997-09-23, location={lon=116.408986, lat=39.893801}, message=happy birthday!, user=东城区-李四, age=25]
map:[uid=5, country=中国, address=中国北京市朝阳区国贸, province=北京, city=北京, DOB=1996-06-18, location={lon=116.467910, lat=39.918256}, message=Happy BirthDay My Friend!, user=朝阳区-老王, age=26]
The result shows that 3 documents have an age between 25 and 30.
Search 3: full-text search in a field
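By default, RangeQueryBuilder.from(25).to(30) includes both bounds. That is why it corresponds to gte and lte in the Kibana request. The builder also has gte(...) and lte(...) methods, which state this explicitly. The sketch below reproduces the filter in memory over the ages of the six sample documents. It only illustrates the inclusive bounds and is not how Elasticsearch evaluates a range query. RangeFilterSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeFilterSketch {
    // In-memory illustration of {"range": {"age": {"gte": gte, "lte": lte}}}
    // applied to a map of document id -> age.
    static List<String> idsWithAgeBetween(Map<String, Integer> agesById, int gte, int lte) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Integer> e : agesById.entrySet()) {
            int age = e.getValue();
            if (age >= gte && age <= lte) { // both bounds are inclusive
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Ages of the six documents indexed with the _bulk request above, keyed by _id
    static Map<String, Integer> sampleAges() {
        Map<String, Integer> ages = new LinkedHashMap<>();
        ages.put("1", 30);
        ages.put("2", 32);
        ages.put("3", 25);
        ages.put("4", 42);
        ages.put("5", 26);
        ages.put("6", 22);
        return ages;
    }

    public static void main(String[] args) {
        System.out.println(idsWithAgeBetween(sampleAges(), 25, 30)); // [1, 3, 5]
    }
}
```

Documents 1 (age 30) and 3 (age 25) lie exactly on the bounds. Their presence in the result is consistent with the three hits returned by Search 2.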
// Search 3:
System.out.println("****************** Search 3");
SearchSourceBuilder builder3 = new SearchSourceBuilder();
builder3.from(0);
builder3.size(2);
builder3.timeout(new TimeValue(60, TimeUnit.SECONDS));
builder3.query(QueryBuilders.matchQuery("user", "朝阳"));
SearchRequest searchRequest3 = new SearchRequest();
searchRequest3.indices(INDEX_NAME);
searchRequest3.searchType(SearchType.DFS_QUERY_THEN_FETCH);
searchRequest3.source(builder3);
try {
SearchResponse searchResponse = null;
searchResponse = client.search(searchRequest3, RequestOptions.DEFAULT);
if (searchResponse.getHits().getTotalHits().value > 0) {
SearchHit[] searchHit = searchResponse.getHits().getHits();
for (SearchHit hit : searchHit) {
map = hit.getSourceAsMap();
System.out.println("map:" + Arrays.toString(map.entrySet().toArray()));
}
}
} catch (IOException e) {
e.printStackTrace();
}
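We search all documents for those whose user field contains “朝阳” and return the first page of results (from 0, size 2). This search is equivalent to the following command in Kibana, except that the request below omits the 60-second timeout and the DFS_QUERY_THEN_FETCH search type set in the Java code: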
GET twitter/_search
{
"from": 0,
"size": 2,
"query": {
"match": {
"user": "朝阳"
}
}
}
Run the code above. The output is:
****************** Search 3
map:[uid=4, country=中国, address=中国北京市朝阳区建国门, province=北京, city=北京, DOB=1980-06-30, location={lon=116.367910, lat=39.718256}, message=123,gogogo, user=朝阳区-老贾, age=42]
map:[uid=5, country=中国, address=中国北京市朝阳区国贸, province=北京, city=北京, DOB=1996-06-18, location={lon=116.467910, lat=39.918256}, message=Happy BirthDay My Friend!, user=朝阳区-老王, age=26]
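In the results above, the user field contains “朝阳”. Two documents are returned, which equals the page size (size = 2).
Search 4: compound queries
We often use a compound query to retrieve the documents we need. For more on compound queries, see my earlier article “开始使用 Elasticsearch (2)”. A compound query typically takes the following form: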
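Why does a match on user find “朝阳区-老贾” and “朝阳区-老王”? The user field is a text field with no analyzer set, so it uses the standard analyzer. As I understand it, that analyzer emits one token per Chinese character and lower-cases Latin words. A match query also combines the query tokens with OR by default. The sketch below approximates this behavior for the sample data only. tokenize and matchesOr are hypothetical helpers and deliberately simplify the real tokenizer. You can inspect the real tokens with GET twitter/_analyze and a body such as {"field": "user", "text": "朝阳区-老贾"}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CjkMatchSketch {
    // Rough approximation of the standard analyzer for this data set:
    // each Han character becomes its own token, runs of other letters/digits
    // form one token, everything is lower-cased, and punctuation such as '-'
    // is dropped. The real tokenizer follows Unicode UAX #29 and covers far more.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        text.codePoints().forEach(cp -> {
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (han || !Character.isLetterOrDigit(cp)) {
                // a Han character or a separator ends the current Latin/digit word
                if (word.length() > 0) {
                    tokens.add(word.toString().toLowerCase(Locale.ROOT));
                    word.setLength(0);
                }
                if (han) {
                    tokens.add(new String(Character.toChars(cp)));
                }
            } else {
                word.appendCodePoint(cp);
            }
        });
        if (word.length() > 0) {
            tokens.add(word.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Mimics a match query with the default OR operator: the field matches
    // when it shares at least one token with the query text.
    static boolean matchesOr(String fieldValue, String queryText) {
        List<String> fieldTokens = tokenize(fieldValue);
        for (String t : tokenize(queryText)) {
            if (fieldTokens.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("朝阳区-老贾"));          // [朝, 阳, 区, 老, 贾]
        System.out.println(matchesOr("朝阳区-老王", "朝阳"));  // true
        System.out.println(matchesOr("东城区-李四", "朝阳"));  // false
    }
}
```

Among the six user values, only the two 朝阳区 users contain 朝 or 阳. In this data set, therefore, the OR semantics return the same two documents as a phrase-like reading of the query would.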
POST _search
{
"query": {
"bool" : {
"must" : {
"term" : { "user" : "kimchy" }
},
"filter": {
"term" : { "tag" : "tech" }
},
"must_not" : {
"range" : {
"age" : { "gte" : 10, "lte" : 20 }
}
},
"should" : [
{ "term" : { "tag" : "wow" } },
{ "term" : { "tag" : "elasticsearch" } }
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
}
This is a bool query made up of must, filter, must_not, and should clauses.
// Search 4:
System.out.println("****************** Search 4");
MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("user", "朝阳");
MatchQueryBuilder matchQueryBuilder1 = new MatchQueryBuilder("address", "北京");
RangeQueryBuilder rangeQueryBuilder = new RangeQueryBuilder("age").from(25).to(30);
BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder()
.must(matchQueryBuilder)
.must(matchQueryBuilder1)
.should(rangeQueryBuilder);
SearchSourceBuilder builder4 = new SearchSourceBuilder().query(boolQueryBuilder);
builder4.from(0);
builder4.size(2);
builder4.timeout(new TimeValue(60, TimeUnit.SECONDS));
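// Sort by DOB in ascending order, matching the "sort" clause of the Kibana request below
// (SortOrder is org.elasticsearch.search.sort.SortOrder)
builder4.sort("DOB", SortOrder.ASC);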
SearchRequest searchRequest4 = new SearchRequest();
searchRequest4.indices(INDEX_NAME);
searchRequest4.searchType(SearchType.DFS_QUERY_THEN_FETCH);
searchRequest4.source(builder4);
try {
SearchResponse searchResponse = null;
searchResponse = client.search(searchRequest4, RequestOptions.DEFAULT);
if (searchResponse.getHits().getTotalHits().value > 0) {
SearchHit[] searchHit = searchResponse.getHits().getHits();
for (SearchHit hit : searchHit) {
map = hit.getSourceAsMap();
System.out.println("map:" + Arrays.toString(map.entrySet().toArray()));
}
}
} catch (IOException e) {
e.printStackTrace();
}
Above, we use a bool query made up of must and should clauses. It is equivalent to the following command in Kibana:
GET twitter/_search
{
"from": 0,
"size": 2,
"query": {
"bool": {
"must": [
{
"match": {
"user": "朝阳"
}
},
{
"match": {
"address": "北京"
}
}
],
"should": [
{
"range": {
"age": {
"gte": 25,
"lte": 30
}
}
}
]
}
},
"sort": [
{
"DOB": {
"order": "asc"
}
}
]
}
Running the command above in Kibana returns:
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"user" : "朝阳区-老贾",
"DOB" : "1980-06-30",
"message" : "123,gogogo",
"uid" : 4,
"age" : 42,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区建国门",
"location" : {
"lat" : "39.718256",
"lon" : "116.367910"
}
},
"sort" : [
331171200000
]
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "5",
"_score" : null,
"_source" : {
"user" : "朝阳区-老王",
"DOB" : "1996-06-18",
"message" : "Happy BirthDay My Friend!",
"uid" : 5,
"age" : 26,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区国贸",
"location" : {
"lat" : "39.918256",
"lon" : "116.467910"
}
},
"sort" : [
835056000000
]
}
]
}
}
We can see that the results are sorted by DOB in ascending order.
Run our code:
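The sort values in the response above (331171200000 and 835056000000) are the DOB dates expressed as milliseconds since the Unix epoch. Elasticsearch stores date values in UTC, and a date with no time part is taken as midnight. The sketch below recomputes these values with java.time. DobSortValues and toEpochMillisUtc are hypothetical names.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

public class DobSortValues {
    // Converts a yyyy-MM-dd date to epoch milliseconds at midnight UTC,
    // the form in which the DOB "sort" values appear in the response.
    static long toEpochMillisUtc(String isoDate) {
        return LocalDate.parse(isoDate)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillisUtc("1980-06-30")); // 331171200000
        System.out.println(toEpochMillisUtc("1996-06-18")); // 835056000000
    }
}
```

Sorting by these numbers in ascending order puts document 4 (1980) before document 5 (1996), as in the response.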
****************** Search 4
map:[uid=4, country=中国, address=中国北京市朝阳区建国门, province=北京, city=北京, DOB=1980-06-30, location={lon=116.367910, lat=39.718256}, message=123,gogogo, user=朝阳区-老贾, age=42]
map:[uid=5, country=中国, address=中国北京市朝阳区国贸, province=北京, city=北京, DOB=1996-06-18, location={lon=116.467910, lat=39.918256}, message=Happy BirthDay My Friend!, user=朝阳区-老王, age=26]
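The results returned by our code are likewise sorted by DOB in ascending order.
Some readers may ask why document 4, whose age is 42, was returned. This comes from how should behaves here. Because the bool query already has must clauses, the should clause is optional (minimum_should_match defaults to 0). It only raises the relevance score of documents that satisfy it and does not exclude documents that don't. However, we sort by DOB, so Elasticsearch does not compute scores (_score is null in the response above), and this boost has no effect on the order of the results.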
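The filtering side of this explanation can be sketched in memory. The code below models only must and should clauses (no filter or must_not) and ignores scoring. It uses substring checks as stand-ins for the two match clauses, which gives the same result on this data. It follows the documented default for minimum_should_match: 1 when a bool query has should clauses but no must or filter clauses, otherwise 0. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class BoolQuerySketch {
    // Minimal stand-in for the fields used by Search 4
    static final class Doc {
        final String id;
        final String user;
        final String address;
        final int age;

        Doc(String id, String user, String address, int age) {
            this.id = id;
            this.user = user;
            this.address = address;
            this.age = age;
        }
    }

    // Every must clause has to match, and at least minimumShouldMatch should clauses.
    // Since this sketch has no filter clauses, "no must or filter" reduces to must.isEmpty().
    static boolean matches(Doc d, List<Predicate<Doc>> must, List<Predicate<Doc>> should) {
        for (Predicate<Doc> p : must) {
            if (!p.test(d)) {
                return false;
            }
        }
        int minimumShouldMatch = (must.isEmpty() && !should.isEmpty()) ? 1 : 0;
        int shouldHits = 0;
        for (Predicate<Doc> p : should) {
            if (p.test(d)) {
                shouldHits++;
            }
        }
        return shouldHits >= minimumShouldMatch;
    }

    // The six documents from the _bulk request (only the fields used here)
    static List<Doc> sampleDocs() {
        return List.of(
                new Doc("1", "双榆树-张三", "中国北京市海淀区", 30),
                new Doc("2", "东城区-老刘", "中国北京市东城区台基厂三条3号", 32),
                new Doc("3", "东城区-李四", "中国北京市东城区", 25),
                new Doc("4", "朝阳区-老贾", "中国北京市朝阳区建国门", 42),
                new Doc("5", "朝阳区-老王", "中国北京市朝阳区国贸", 26),
                new Doc("6", "虹桥-老吴", "中国上海市闵行区", 22));
    }

    static List<String> search4Ids() {
        List<Predicate<Doc>> must = List.of(
                d -> d.user.contains("朝阳"),      // stand-in for match(user, "朝阳")
                d -> d.address.contains("北京"));  // stand-in for match(address, "北京")
        List<Predicate<Doc>> should = List.of(
                d -> d.age >= 25 && d.age <= 30);  // range(age) gte 25, lte 30
        List<String> ids = new ArrayList<>();
        for (Doc d : sampleDocs()) {
            if (matches(d, must, should)) {
                ids.add(d.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(search4Ids()); // [4, 5]
    }
}
```

Document 4 passes both must clauses and fails the optional should clause, so it is still returned. Without the DOB sort, document 5, which also satisfies the should clause, would likely rank higher.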