Implementing Autocomplete
Posted by 十一vs十一
Service (application)                 Docker container address   Port
Query service                         172.188.0.33               8888
Analysis service                      172.188.0.34               5555
Gateway service                       172.188.0.22               6666
logstash                              172.188.0.77               9600
kibana                                172.188.0.66               5601
mysql                                 172.188.0.55               33066
nacos1                                nacos:8848                 8848-->8848
nacos2                                nacos:8848                 8848-->8849
nacos3                                nacos:8848                 8848-->8850
nacos-server-mysql                    mysql:3306                 3306
prometheus                            172.188.0.11               9090
grafana                               172.188.0.12               3000
eNode1                                172.188.0.88               9200 / 9700
eNode2                                172.188.0.89               9201 / 9800
eNode3                                172.188.0.90               9202 / 9900
Log instrumentation and persistence   172.188.0.77               TCP 4567
Distributed tracing                   172.188.0.99               9411
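1 Language Processing and Autocomplete
Target behavior
The finished feature behaves like the JD.com search box: suggestions are returned as the user types a term. Typing the pinyin of a term, or only the initial letters of its pinyin, returns the same suggestions.
Results (screenshots omitted): typing Chinese characters, typing pinyin, typing initial letters
2.1 Custom corpus
2.1.1 Corpus mapping OpenAPI
Index mapping OpenAPI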
1. Define the index (mapping) interface
package com.itheima.service;

import com.itheima.commons.pojo.CommonEntity;
import org.elasticsearch.rest.RestStatus;
import java.util.List;
import java.util.Map;

/**
 * @Class: ElasticsearchIndexService
 * @Package com.itheima.service
 * @Description: index operations interface
 * @Company: http://www.itheima.com/
 */
public interface ElasticsearchIndexService {
    // Create an index together with its mapping
    boolean addIndexAndMapping(CommonEntity commonEntity) throws Exception;
}
2. Define the index (mapping) implementation
/*
 * @Description: create an index with its settings, its mapping and the custom pinyin analyzer
 * settings may be empty (the custom pinyin analyzer is declared inside settings)
 * mapping may be empty
 * @Method: addIndexAndMapping
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: boolean
 *
 */
public boolean addIndexAndMapping(CommonEntity commonEntity) throws Exception {
    // Build the create-index request
    CreateIndexRequest request = new CreateIndexRequest(commonEntity.getIndexName());
    // Parameters sent by the front end
    Map<String, Object> map = commonEntity.getMap();
    // Walk the top-level "settings" and "mapping" entries
    for (Map.Entry<String, Object> entry : map.entrySet()) {
        if ("settings".equals(entry.getKey())) {
            if (entry.getValue() instanceof Map && ((Map) entry.getValue()).size() > 0) {
                request.settings((Map<String, Object>) entry.getValue());
            }
        }
        if ("mapping".equals(entry.getKey())) {
            if (entry.getValue() instanceof Map && ((Map) entry.getValue()).size() > 0) {
                request.mapping((Map<String, Object>) entry.getValue());
            }
        }
    }
    // Client for index operations
    IndicesClient indices = client.indices();
    // Create the index and read the response
    CreateIndexResponse response = indices.create(request, RequestOptions.DEFAULT);
    // true when the cluster acknowledged the request
    return response.isAcknowledged();
}
3. Add the controller
/*
 * @Description: create an index and its mapping
 * @Method: addIndex
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: com.itheima.commons.result.ResponseData
 *
 */
@PostMapping(value = "/add")
public ResponseData addIndexAndMapping(@RequestBody CommonEntity commonEntity) {
    // Build the response object
    ResponseData rData = new ResponseData();
    if (StringUtils.isEmpty(commonEntity.getIndexName())) {
        rData.setResultEnum(ResultEnum.param_isnull);
        return rData;
    }
    // Whether the index was created
    boolean isSuccess = false;
    try {
        // Create the index through the high-level API
        isSuccess = elasticsearchIndexService.addIndexAndMapping(commonEntity);
        // Build the response; the overload is resolved from the argument types
        rData.setResultEnum(isSuccess, ResultEnum.success, 1);
        // Log the outcome
        logger.info(TipsEnum.create_index_success.getMessage());
    } catch (Exception e) {
        // Print to the console
        e.printStackTrace();
        // Log the failure
        logger.error(TipsEnum.create_index_fail.getMessage(), e);
        // Build the error response
        rData.setResultEnum(ResultEnum.error);
    }
    return rData;
}
4. Create the mapping
Parameters
The custom analyzer ik_pinyin_analyzer combines the ik tokenizer with the pinyin filter.
tips
The pinyin plugin must be installed before the mapping is created.
http://172.17.0.225:8888/v1/indices/add
or
http://127.0.0.1:8888/v1/indices/add
{
    "indexName": "product_completion_index",
    "map": {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 2,
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": "pinyin_filter"
                    }
                },
                "filter": {
                    "pinyin_filter": {
                        "type": "pinyin",
                        "keep_first_letter": true,
                        "keep_separate_first_letter": false,
                        "keep_full_pinyin": true,
                        "keep_original": true,
                        "limit_first_letter_length": 16,
                        "lowercase": true,
                        "remove_duplicated_term": true
                    }
                }
            }
        },
        "mapping": {
            "properties": {
                "name": { "type": "keyword" },
                "searchkey": { "type": "completion", "analyzer": "ik_pinyin_analyzer" }
            }
        }
    }
}
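Everything under settings is index configuration; the parameters are passed through dynamically and follow DSL syntax.
Everything under mapping describes the mapped fields; the parameters are passed through dynamically and follow DSL syntax.
Property: Description
keep_first_letter: when enabled, e.g. 刘德华 > ldh. Default: true
keep_separate_first_letter: when enabled, the first letters are kept as separate terms, e.g. 刘德华 > l,d,h. Default: false. Note: query results may become too fuzzy because these terms are very frequent.
limit_first_letter_length: maximum length of the first_letter result. Default: 16
keep_full_pinyin: when enabled, e.g. 刘德华 > [liu,de,hua]. Default: true
keep_joined_full_pinyin: when enabled, e.g. 刘德华 > [liudehua]. Default: false
keep_none_chinese: keep non-Chinese letters and digits in the result. Default: true
keep_none_chinese_together: keep non-Chinese letters together. Default: true, e.g. DJ音乐家 -> DJ,yin,yue,jia; when set to false, DJ音乐家 -> D,J,yin,yue,jia. Note: keep_none_chinese must be enabled first.
keep_none_chinese_in_first_letter: keep non-Chinese letters in the first-letter term, e.g. 刘德华AT2016 -> ldhat2016. Default: true
keep_none_chinese_in_joined_full_pinyin: keep non-Chinese letters in the joined full pinyin, e.g. 刘德华2016 -> liudehua2016. Default: false
none_chinese_pinyin_tokenize: split non-Chinese letters into separate pinyin terms when they form pinyin. Default: true, e.g. liudehuaalibaba13zhuanghan -> liu,de,hua,a,li,ba,ba,13,zhuang,han. Note: keep_none_chinese and keep_none_chinese_together must be enabled first.
keep_original: when enabled, the original input is kept as well. Default: false
lowercase: lowercase non-Chinese letters. Default: true
trim_whitespace: Default: true
remove_duplicated_term: when enabled, duplicated terms are removed to save index space, e.g. de的 > de. Default: false. Note: position-related queries may be affected.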
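The options above are easier to compare when their output is generated side by side. The following is only a sketch under a strong assumption: the syllables are passed in already split, whereas the real plugin performs the Chinese-to-pinyin conversion itself. The class name PinyinTermsSketch and the method terms are hypothetical and are not part of the plugin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not the pinyin plugin's implementation.
// The caller supplies the syllables; the sketch only shows which terms each option contributes.
class PinyinTermsSketch {

    static List<String> terms(String original, List<String> syllables,
                              boolean keepFirstLetter, boolean keepFullPinyin,
                              boolean keepJoinedFullPinyin, boolean keepOriginal,
                              int limitFirstLetterLength) {
        // A LinkedHashSet drops repeated terms, similar in spirit to remove_duplicated_term
        Set<String> out = new LinkedHashSet<>();
        if (keepOriginal) {
            out.add(original);                       // 刘德华
        }
        if (keepFullPinyin) {
            out.addAll(syllables);                   // liu, de, hua
        }
        if (keepJoinedFullPinyin) {
            out.add(String.join("", syllables));     // liudehua
        }
        if (keepFirstLetter) {
            StringBuilder initials = new StringBuilder();
            for (String syllable : syllables) {
                // Stop adding initials once the configured maximum length is reached
                if (!syllable.isEmpty() && initials.length() < limitFirstLetterLength) {
                    initials.append(syllable.charAt(0));
                }
            }
            out.add(initials.toString());            // ldh
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        // Same switches as the pinyin_filter in the request above
        System.out.println(terms("刘德华", Arrays.asList("liu", "de", "hua"),
                true, true, false, true, 16));
    }
}
```

With the switches used by pinyin_filter in this article (first letters, full pinyin and original text kept), a completion field indexed this way can be matched by Chinese characters, by full pinyin, or by initials.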
Response
{
    "code": "200",
    "desc": "操作成功!",
    "data": true
}
2.1.2 Corpus document OpenAPI
1. Define the bulk-add-documents interface
package com.itheima.service;

import com.itheima.commons.pojo.CommonEntity;
import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import java.util.List;
import java.util.Map;

/**
 * @Class: ElasticsearchDocumentService
 * @Package com.itheima.service
 * @Description: document operations interface
 * @Company: http://www.itheima.com/
 */
public interface ElasticsearchDocumentService {
    // Bulk-add documents
    public RestStatus bulkAddDoc(CommonEntity commonEntity) throws Exception;
}
2. Define the bulk-add-documents implementation
/*
 * @Description: bulk-add documents; the index and the mapping can be created automatically
 * @Method: bulkAddDoc
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: org.elasticsearch.rest.RestStatus
 *
 */
@Override
public RestStatus bulkAddDoc(CommonEntity commonEntity) throws Exception {
    // Build the bulk request for the target index
    BulkRequest bulkRequest = new BulkRequest(commonEntity.getIndexName());
    // Add one index request per document sent by the front end
    for (int i = 0; i < commonEntity.getList().size(); i++) {
        bulkRequest.add(new IndexRequest().source(XContentType.JSON,
                SearchTools.mapToObjectGroup(commonEntity.getList().get(i))));
    }
    // Execute the bulk request
    BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
    return bulkResponse.status();
}
Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.4/java-rest-high-document-bulk.html
As the official example shows (figure omitted), the document source has to be passed in the form the arrow points at: field names and values as alternating arguments.
That is why SearchTools.mapToObjectGroup above converts each map into an array.
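The article does not show SearchTools.mapToObjectGroup itself. A minimal sketch of what such a helper might look like is given below, assuming it only needs to flatten a map into the alternating key/value array that IndexRequest.source(XContentType, Object...) accepts. The class name MapToObjectGroupSketch is hypothetical, and the author's real helper may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SearchTools.mapToObjectGroup, which the article does not list.
class MapToObjectGroupSketch {

    // Flattens {k1=v1, k2=v2} into [k1, v1, k2, v2], keeping the map's iteration order
    static Object[] mapToObjectGroup(Map<String, Object> map) {
        List<Object> flat = new ArrayList<>(map.size() * 2);
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            flat.add(entry.getKey());
            flat.add(entry.getValue());
        }
        return flat.toArray();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("searchkey", "小米手机");
        doc.put("name", "小米(MI)");
        System.out.println(Arrays.toString(mapToObjectGroup(doc)));
    }
}
```

The resulting array always has an even length, which is what the varargs form of source expects.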
3. Define the bulk-add-documents controller
/*
 * @Description: bulk-add documents; the index and the mapping can be created automatically
 * @Method: bulkAddDoc
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: com.itheima.commons.result.ResponseData
 *
 */
@PostMapping(value = "/batch")
public ResponseData bulkAddDoc(@RequestBody CommonEntity commonEntity) {
    // Build the response object
    ResponseData rData = new ResponseData();
    if (StringUtils.isEmpty(commonEntity.getIndexName()) ||
            CollectionUtils.isEmpty(commonEntity.getList())) {
        rData.setResultEnum(ResultEnum.param_isnull);
        return rData;
    }
    // Result of the bulk operation
    RestStatus result = null;
    try {
        // Run the bulk add through the high-level API
        result = elasticsearchDocumentService.bulkAddDoc(commonEntity);
        // Build the response; the overload is resolved from the argument types
        rData.setResultEnum(result, ResultEnum.success, null);
        // Log the outcome
        logger.info(TipsEnum.batch_create_doc_success.getMessage());
    } catch (Exception e) {
        // Print to the console
        e.printStackTrace();
        // Log the failure
        logger.error(TipsEnum.batch_create_doc_fail.getMessage());
        // Build the error response
        rData.setResultEnum(ResultEnum.error);
    }
    return rData;
}
4. Call the bulk-add endpoint
Parameters
The request defines 23 suggest entries (小米手机 appears twice, to check that duplicates are removed).
tips
Once aggregations have been covered, the corpus is maintained through log instrumentation and data persistence.
http://172.17.0.225:8888/v1/docs/batch
or
http://127.0.0.1:8888/v1/docs/batch
{
    "indexName": "product_completion_index",
    "list": [
        { "searchkey": "小米手机", "name": "小米(MI)" },
        { "searchkey": "小米10", "name": "小米(MI)" },
        { "searchkey": "小米电视", "name": "小米(MI)" },
        { "searchkey": "小米路由器", "name": "小米(MI)" },
        { "searchkey": "小米9", "name": "小米(MI)" },
        { "searchkey": "小米手机", "name": "小米(MI)" },
        { "searchkey": "小米耳环", "name": "小米(MI)" },
        { "searchkey": "小米8", "name": "小米(MI)" },
        { "searchkey": "小米10Pro", "name": "小米(MI)" },
        { "searchkey": "小米笔记本", "name": "小米(MI)" },
        { "searchkey": "小米摄像头", "name": "小米(MI)" },
        { "searchkey": "小米电饭煲", "name": "小米(MI)" },
        { "searchkey": "小米充电宝", "name": "小米(MI)" },
        { "searchkey": "adidas男鞋", "name": "adidas男鞋" },
        { "searchkey": "adidas女鞋", "name": "adidas女鞋" },
        { "searchkey": "adidas外套", "name": "adidas外套" },
        { "searchkey": "adidas裤子", "name": "adidas裤子" },
        { "searchkey": "adidas官方旗舰店", "name": "adidas官方旗舰店" },
        { "searchkey": "阿迪达斯袜子", "name": "阿迪达斯袜子" },
        { "searchkey": "阿迪达斯外套", "name": "阿迪达斯外套" },
        { "searchkey": "阿迪达斯运动鞋", "name": "阿迪达斯运动鞋" },
        { "searchkey": "耐克外套", "name": "耐克外套" },
        { "searchkey": "耐克运动鞋", "name": "耐克运动鞋" }
    ]
}
Response
{
    "code": "200",
    "desc": "操作成功!",
    "data": "OK"
}
Inspect the indexed documents
GET product_completion_index/_search
2.2 Product search and autocomplete
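Term suggester: tokenizes the input text and offers term suggestions for each token.
Phrase suggester: builds on the term suggester and also takes the relationship between multiple terms into account.
Completion suggester: aimed squarely at the "auto completion" use case.
Context suggester: a completion suggester that can be filtered or boosted by context.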
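To make the contract of the completion suggester concrete (prefix in, at most size distinct entries out), here is a small in-memory sketch. It is an assumption-laden toy: Elasticsearch stores completion fields in a dedicated in-memory structure and applies the field's analyzer, neither of which is modelled here. The class name PrefixCompletionSketch and the method complete are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model of prefix completion with duplicate removal; not how Elasticsearch implements it.
class PrefixCompletionSketch {

    static List<String> complete(List<String> corpus, String prefix, int size) {
        // A TreeSet sorts the entries and removes duplicates (compare skip_duplicates)
        TreeSet<String> sorted = new TreeSet<>(corpus);
        List<String> result = new ArrayList<>();
        // All entries that start with the prefix are contiguous, beginning at the prefix itself
        for (String entry : sorted.tailSet(prefix, true)) {
            if (!entry.startsWith(prefix) || result.size() >= size) {
                break;
            }
            result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("小米手机", "小米10", "小米手机", "adidas男鞋", "小米9");
        System.out.println(complete(corpus, "小米", 13));
    }
}
```

The duplicated 小米手机 entry appears only once in the output, which mirrors what skip_duplicates does for the real suggester.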
2.2.1 Chinese-character completion OpenAPI
GET product_completion_index/_search
{
    "from": 0,
    "size": 100,
    "suggest": {
        "czbk-suggest": {
            "prefix": "小米",
            "completion": {
                "field": "searchkey",
                "size": 20,
                "skip_duplicates": true
            }
        }
    }
}
1. Define the autocomplete interface
package com.itheima.service;

import com.itheima.commons.pojo.CommonEntity;
import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import java.util.List;
import java.util.Map;

/**
 * @Class: ElasticsearchDocumentService
 * @Package com.itheima.service
 * @Description: document operations interface
 * @Company: http://www.itheima.com/
 */
public interface ElasticsearchDocumentService {
    // Autocomplete (completion suggestion)
    public List<String> cSuggest(CommonEntity commonEntity) throws Exception;
}
2. Define the autocomplete implementation
/*
 * @Description: autocomplete; suggests words or phrases the user is likely typing
 * @Method: suggester
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: java.util.List<java.lang.String>
 *
 */
public List<String> cSuggest(CommonEntity commonEntity) throws Exception {
    // Result list
    List<String> suggestList = new ArrayList<>();
    // Build the completion suggestion on the field to search
    CompletionSuggestionBuilder completionSuggestionBuilder =
            SuggestBuilders.completionSuggestion(commonEntity.getSuggestFileld());
    // Keyword typed by the user
    completionSuggestionBuilder.prefix(commonEntity.getSuggestValue());
    // Remove duplicates
    completionSuggestionBuilder.skipDuplicates(true);
    // Number of suggestions to return
    completionSuggestionBuilder.size(commonEntity.getSuggestCount());
    // "czbk-suggest" names the suggestion in the response and can be hard-coded; results are sorted by score
    SearchRequest searchRequest = new SearchRequest().indices(commonEntity.getIndexName())
            .source(new SearchSourceBuilder()
                    .sort(new ScoreSortBuilder().order(SortOrder.DESC))
                    .suggest(new SuggestBuilder().addSuggestion("czbk-suggest", completionSuggestionBuilder)));
    // Run the search
    SearchResponse suggestResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    // Read the completion suggestion from the response
    CompletionSuggestion completionSuggestion =
            suggestResponse.getSuggest().getSuggestion("czbk-suggest");
    // Options of the first entry
    List<CompletionSuggestion.Entry.Option> optionsList =
            completionSuggestion.getEntries().get(0).getOptions();
    // Copy the suggested texts into the result list
    if (!CollectionUtils.isEmpty(optionsList)) {
        optionsList.forEach(item -> suggestList.add(item.getText().string()));
    }
    return suggestList;
}
3. Define the autocomplete controller
/*
 * @Description: autocomplete
 * @Method: suggester
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: com.itheima.commons.result.ResponseData
 *
 */
@GetMapping(value = "/csuggest")
public ResponseData cSuggest(@RequestBody CommonEntity commonEntity) {
    // Build the response object
    ResponseData rData = new ResponseData();
    if (StringUtils.isEmpty(commonEntity.getIndexName()) ||
            StringUtils.isEmpty(commonEntity.getSuggestFileld()) ||
            StringUtils.isEmpty(commonEntity.getSuggestValue())) {
        rData.setResultEnum(ResultEnum.param_isnull);
        return rData;
    }
    // Suggestions returned by the query
    List<String> result = null;
    try {
        // Run the completion query through the high-level API
        result = elasticsearchDocumentService.cSuggest(commonEntity);
        // Build the response; the overload is resolved from the argument types
        rData.setResultEnum(result, ResultEnum.success, result.size());
        // Log the outcome
        logger.info(TipsEnum.csuggest_get_doc_success.getMessage());
    } catch (Exception e) {
        // Print to the console
        e.printStackTrace();
        // Log the failure
        logger.error(TipsEnum.csuggest_get_doc_fail.getMessage());
        // Build the error response
        rData.setResultEnum(ResultEnum.error);
    }
    return rData;
}
4. Verify the autocomplete endpoint
http://172.17.0.225:6666/v1/docs/csuggest
Parameters
indexName: index name
suggestFileld: field searched for completions
suggestValue: keyword typed by the user
suggestCount: number of suggestions to return (JD.com returns 13)
{
    "indexName": "product_completion_index",
    "suggestFileld": "searchkey",
    "suggestValue": "小米",
    "suggestCount": 13
}
Response
{
    "code": "200",
    "desc": "操作成功!",
    "data": [
        "小米10",
        "小米10Pro",
        "小米8",
        "小米9",
        "小米充电宝",
        "小米手机",
        "小米摄像头",
        "小米电视",
        "小米电饭煲",
        "小米笔记本",
        "小米耳环",
        "小米路由器"
    ],
    "count": 12
}
Duplicates are removed automatically: 小米手机 was indexed twice but is returned once.
2.2.2 Pinyin completion OpenAPI
This section looks up 【小米】 by its pinyin.
1. Download the pinyin plugin
wget https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.4.0/elasticsearch-analysis-pinyin-7.4.0.zip
or
https://github.com/medcl/elasticsearch-analysis-pinyin/releases/tag/v7.4.0
When an index is created, a custom analyzer can be declared in settings and then referenced from the mapping:
{
    "indexName": "product_completion_index",
    "map": {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 2,
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": "pinyin_filter"
                    }
                },
                "filter": {
                    "pinyin_filter": {
                        "type": "pinyin",
                        "keep_first_letter": true,
                        "keep_separate_first_letter": false,
                        "keep_full_pinyin": true,
                        "keep_original": true,
                        "limit_first_letter_length": 16,
                        "lowercase": true,
                        "remove_duplicated_term": true
                    }
                }
            }
        },
        "mapping": {
            "properties": {
                "name": { "type": "text" },
                "searchkey": { "type": "completion", "analyzer": "ik_pinyin_analyzer" }
            }
        }
    }
}
Call the bulk-add-documents API described earlier to load the data.
Start pinyin completion
http://localhost:8888/v1/docs/csuggest
Full pinyin
{
    "indexName": "product_completion_index",
    "suggestFileld": "searchkey",
    "suggestValue": "xiaomi",
    "suggestCount": 13
}
Full pinyin (separated by a space)
{
    "indexName": "product_completion_index",
    "suggestFileld": "searchkey",
    "suggestValue": "xiao mi",
    "suggestCount": 13
}
Initial letters
{
    "indexName": "product_completion_index",
    "suggestFileld": "searchkey",
    "suggestValue": "xm",
    "suggestCount": 13
}
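2.3 Product search and language processing
2.3.1 What is language processing (spelling correction)?
Scenario
For example, the misspelled input 【adidaas官方旗舰店】 is corrected to 【adidas官方旗舰店】.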
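The idea behind such a correction can be sketched with plain edit distance: among the known entries, pick the one that needs the fewest single-character edits to reach the input. This is only an illustration under that assumption. The phrase suggester in Elasticsearch scores candidates with considerably more machinery, and the names SpellingCorrectionSketch, editDistance and closest are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustration of correction by edit distance; not the phrase suggester's actual algorithm.
class SpellingCorrectionSketch {

    // Classic Levenshtein distance computed with two rolling rows
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the candidate with the smallest distance to the input (first one wins on ties)
    static String closest(String input, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int distance = editDistance(input, candidate);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("adidas男鞋", "adidas官方旗舰店", "耐克外套");
        System.out.println(closest("adidaas官方旗舰店", names));
    }
}
```

Here the misspelled input is one deletion away from the correct store name, so that entry wins.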
2.3.2 Language processing OpenAPI
GET product_completion_index/_search
{
    "suggest": {
        "czbk-suggestion": {
            "text": "adidaas官方旗舰店",
            "phrase": {
                "field": "name",
                "size": 13
            }
        }
    }
}
1. Define the spelling-correction interface
// Spelling correction
public String pSuggest(CommonEntity commonEntity) throws Exception;
2. Define the spelling-correction implementation
/*
 * @Description: spelling correction
 * @Method: psuggest
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: java.lang.String
 *
 */
@Override
public String pSuggest(CommonEntity commonEntity) throws Exception {
    // Result
    String pSuggestString = new String();
    // Build the phrase suggester on the field to match
    PhraseSuggestionBuilder pSuggestionBuilder =
            new PhraseSuggestionBuilder(commonEntity.getSuggestFileld());
    // Keyword to correct
    pSuggestionBuilder.text(commonEntity.getSuggestValue());
    // Number of suggestions to return
    pSuggestionBuilder.size(1);
    // "czbk-suggest" names the suggestion in the response and can be hard-coded; results are sorted by score
    SearchRequest searchRequest = new SearchRequest().indices(commonEntity.getIndexName())
            .source(new SearchSourceBuilder()
                    .sort(new ScoreSortBuilder().order(SortOrder.DESC))
                    .suggest(new SuggestBuilder().addSuggestion("czbk-suggest", pSuggestionBuilder)));
    // Run the search
    SearchResponse suggestResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    // Read the phrase suggestion from the response
    PhraseSuggestion phraseSuggestion =
            suggestResponse.getSuggest().getSuggestion("czbk-suggest");
    // Options of the first entry
    List<PhraseSuggestion.Entry.Option> optionsList =
            phraseSuggestion.getEntries().get(0).getOptions();
    // Take the best option, if any
    if (!CollectionUtils.isEmpty(optionsList)) {
        pSuggestString = optionsList.get(0).getText().string();
    }
    return pSuggestString;
}
3. Define the spelling-correction controller
/*
 * @Description: spelling correction
 * @Method: suggester2
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: com.itheima.commons.result.ResponseData
 *
 */
@GetMapping(value = "/psuggest")
public ResponseData pSuggest(@RequestBody CommonEntity commonEntity) {
    // Build the response object
    ResponseData rData = new ResponseData();
    if (StringUtils.isEmpty(commonEntity.getIndexName()) ||
            StringUtils.isEmpty(commonEntity.getSuggestFileld()) ||
            StringUtils.isEmpty(commonEntity.getSuggestValue())) {
        rData.setResultEnum(ResultEnum.param_isnull);
        return rData;
    }
    // Corrected text returned by the query
    String result = null;
    try {
        // Run the spelling correction through the high-level API
        result = elasticsearchDocumentService.pSuggest(commonEntity);
        // Build the response; the overload is resolved from the argument types
        rData.setResultEnum(result, ResultEnum.success, null);
        // Log the outcome
        logger.info(TipsEnum.psuggest_get_doc_success.getMessage());
    } catch (Exception e) {
        // Print to the console
        e.printStackTrace();
        // Log the failure
        logger.error(TipsEnum.psuggest_get_doc_fail.getMessage());
        // Build the error response
        rData.setResultEnum(ResultEnum.error);
    }
    return rData;
}
4. Verify the language-processing endpoint
http://172.17.0.225:6666/v1/docs/psuggest
Parameters
indexName: index name
suggestFileld: field searched for suggestions
suggestValue: keyword typed by the user
{
    "indexName": "product_completion_index",
    "suggestFileld": "name",
    "suggestValue": "adidaas官方旗舰店"
}
Response
{
    "code": "200",
    "desc": "操作成功!",
    "data": "adidas官方旗舰店"
}
2.4 Summary
1. Keep a dedicated search corpus, separate from the business index, so that the corpus can be maintained and upgraded on its own.
2. Query the corpus with the tokenized input and any other search conditions, and return a handful of records (JD.com returns 13, Taobao/Tmall 10, Baidu 4).
3. To improve accuracy, prefix search is normally used.
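2 Product recommendation on e-commerce platforms
2.1 What is search recommendation?
For example, the user enters the keywords 【阿迪达斯 耐克 外套 运动鞋 袜子】 and the site answers:
"Woof~ no products found for “阿迪达斯 耐克 外套 运动鞋 袜子”. Showing products related to “阿迪达斯耐克运动鞋” instead, or try:"
2.2 Product recommendation OpenAPI
For the details to watch out for, see the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.4/search-suggesters.html#term-suggester
GET product_completion_index/_search
{
    "suggest": {
        "czbk-suggestion": {
            "text": "阿迪达斯 耐克 外套 运动鞋 袜子",
            "term": {
                "field": "name",
                "min_word_length": 2,
                "string_distance": "ngram",
                "analyzer": "ik_smart"
            }
        }
    }
}
1. Define the search-recommendation interface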
package com.itheima.service;

import com.itheima.commons.pojo.CommonEntity;
import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import java.util.List;
import java.util.Map;

/**
 * @Class: ElasticsearchDocumentService
 * @Package com.itheima.service
 * @Description: document operations interface
 * @Company: http://www.itheima.com/
 */
public interface ElasticsearchDocumentService {
    // Search recommendation (the system recommends results when too many keywords are entered)
    public String tSuggest(CommonEntity commonEntity) throws Exception;
}
2. Define the search-recommendation implementation
/*
 * @Description: search recommendation (the system recommends results when too many keywords are entered)
 * @Method: tSuggest
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: java.lang.String
 *
 */
public String tSuggest(CommonEntity commonEntity) throws Exception {
    // Result
    String tSuggestString = new String();
    // Build the term suggestion on the field to search
    TermSuggestionBuilder termSuggestiontextBuilder =
            SuggestBuilders.termSuggestion(commonEntity.getSuggestFileld());
    // Keywords typed by the user
    termSuggestiontextBuilder.text(commonEntity.getSuggestValue());
    // Number of suggestions to return
    // termSuggestiontextBuilder.size(commonEntity.getSuggestCount());
    // Analyzer used to tokenize the suggestion text
    termSuggestiontextBuilder.analyzer("ik_smart");
    // Minimum length a term must have to be corrected; the default is 4 (the old name "min_word_len" is deprecated)
    termSuggestiontextBuilder.minWordLength(2);
    // String distance implementation used to compare how similar suggested terms are
    termSuggestiontextBuilder.stringDistance(TermSuggestionBuilder.StringDistanceImpl.NGRAM);
    // "czbk-suggest" names the suggestion in the response and can be hard-coded; results are sorted by score
    SearchRequest searchRequest = new SearchRequest().indices(commonEntity.getIndexName())
            .source(new SearchSourceBuilder()
                    .sort(new ScoreSortBuilder().order(SortOrder.DESC))
                    .suggest(new SuggestBuilder().addSuggestion("czbk-suggest", termSuggestiontextBuilder)));
    // Run the search
    SearchResponse suggestResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    // Read the term suggestion from the response
    TermSuggestion termSuggestion =
            suggestResponse.getSuggest().getSuggestion("czbk-suggest");
    // Options of the first entry
    List<TermSuggestion.Entry.Option> optionsList =
            termSuggestion.getEntries().get(0).getOptions();
    // Take the best option, if any
    if (!CollectionUtils.isEmpty(optionsList)) {
        tSuggestString = optionsList.get(0).getText().toString();
    }
    return tSuggestString;
}
3. Define the search-recommendation controller
/*
 * @Description: search recommendation (the system recommends results when too many keywords are entered)
 * @Method: tSuggest
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: com.itheima.commons.result.ResponseData
 *
 */
@GetMapping(value = "/tsuggest")
public ResponseData tSuggest(@RequestBody CommonEntity commonEntity) {
    // Build the response object
    ResponseData rData = new ResponseData();
    if (StringUtils.isEmpty(commonEntity.getIndexName()) ||
            StringUtils.isEmpty(commonEntity.getSuggestFileld()) ||
            StringUtils.isEmpty(commonEntity.getSuggestValue())) {
        rData.setResultEnum(ResultEnum.param_isnull);
        return rData;
    }
    // Recommendation returned by the query
    String result = null;
    try {
        // Run the term suggestion through the high-level API
        result = elasticsearchDocumentService.tSuggest(commonEntity);
        // Build the response; the overload is resolved from the argument types
        rData.setResultEnum(result, ResultEnum.success, null);
        // Log the outcome
        logger.info(TipsEnum.tsuggest_get_doc_success.getMessage());
    } catch (Exception e) {
        // Print to the console
        e.printStackTrace();
        // Log the failure
        logger.error(TipsEnum.tsuggest_get_doc_fail.getMessage());
        // Build the error response
        rData.setResultEnum(ResultEnum.error);
    }
    return rData;
}
4. Verify the search-recommendation endpoint
http://127.0.0.1:8888/v1/docs/tsuggest
Parameters
indexName: index name
suggestFileld: field searched for suggestions
suggestValue: keywords typed by the user
{
    "indexName": "product_completion_index",
    "suggestFileld": "name",
    "suggestValue": "阿迪达斯 耐克 外套 运动鞋 袜子"
}
Response
{
    "code": "200",
    "desc": "操作成功!",
    "data": "阿迪达斯外套"
}
3 Metric aggregations and drill-down analysis
3.1 Metric aggregations and their categories
What is a metric aggregation?
Aggregation is an important database feature: it computes aggregate values over the data set returned by a query,
such as the maximum, minimum, sum or average of a field (or of a computed expression).
As a search engine that doubles as a database, ES provides powerful aggregation capabilities as well.
Aggregations that compute metrics such as the maximum, minimum, sum or average of a data set are called metric aggregations in ES.
Metric aggregations fall into two groups: single-value and multi-value analysis.
1. Single-value analysis outputs one result:
min, max, avg, sum, cardinality (cardinality counts the distinct values of a field, comparable to distinct in MySQL)
2. Multi-value analysis outputs several results:
stats, extended_stats, percentile, percentile_rank
3.2 Metric aggregation and drill-down design
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.4/search-aggregations-metrics.html
Syntax:
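"aggregations" : {
    "<aggregation_name>" : {                                 <!-- name of the aggregation -->
        "<aggregation_type>" : {                             <!-- type of the aggregation -->
            <aggregation_body>                               <!-- body: which fields to aggregate -->
        }
        [,"meta" : { [<meta_data_body>] } ]?                 <!-- metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]?       <!-- sub-aggregations nested inside this one -->
    }
    [,"<aggregation_name_2>" : { ... } ]*                    <!-- name of a further aggregation -->
}
OpenAPI design goals and principles:
1. Abstract the DSL calls and syntax as far as possible and pass parameters dynamically.
2. Let the Open API support hundreds of call combinations through a result converter:
query, constant_score, match/match_all/filter/sort/size/from/highlight/_source/includes
3. Share the common processing logic to strengthen the API's business handling.
4. Preserve the usage of the native API and its parameters.
3.2.1 Setting up the base framework
tips: config class
3.2.2 Single-value analysis API design
1. Avg (average)
The average of the prices extracted from the aggregated documents.
avg over all documents (DSL)
POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "czbk": {
            "avg": { "field": "price" }
        }
    }
}
The request above computes the average over all documents.
"size": 0 returns only the aggregation result and no documents; to return 50 documents as well, set size=50.
aggs: declares an aggregation.
czbk: a user-defined name; the aggregated data is returned under this name.
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "czbk": {
                "avg": { "field": "price" }
            }
        }
    }
}
avg over filtered documents
POST product_list_info/_search
{
    "size": 0,
    "query": {
        "term": { "onelevel": "手机通讯" }
    },
    "aggs": {
        "czbk": {
            "avg": { "field": "price" }
        }
    }
}
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "term": { "onelevel": "手机通讯" }
        },
        "aggs": {
            "czbk": {
                "avg": { "field": "price" }
            }
        }
    }
}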
Computing the average with a script:
ES uses Painless as its scripting language, a safe and efficient language that runs on the JVM.
# All documents
POST product_list_info/_search?size=0
{
    "aggs": {
        "czbk": {
            "avg": {
                "script": { "source": "doc.evalcount.value" }
            }
        }
    }
}
Result: "value" : 599929.2282791147
The script source is also given in these forms:
"source": "doc['evalcount']"
"source": "doc.evalcount"
# Filtered documents
POST product_list_info/_search?size=0
{
    "query": {
        "term": { "onelevel": "手机通讯" }
    },
    "aggs": {
        "czbk": {
            "avg": {
                "script": { "source": "doc.evalcount" }
            }
        }
    }
}
Result: "value" : 600055.6935087288
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "czbk": {
                "avg": {
                    "script": { "source": "doc.evalcount" }
                }
            }
        }
    }
}
Summary of avg:
1. avg over all documents
2. avg with a condition (a subset of the documents)
3. script-based avg (all documents)
4. script-based avg (a subset of the documents)
Code
if (m.getValue() instanceof ParsedAvg) {
    map.put("value", ((ParsedAvg) m.getValue()).getValue());
}
Verification
http://172.17.0.225:6666/v1/analysis/metric/agg
OR
http://localhost:5555/v1/analysis/metric/agg
2. Max (maximum)
Computes the maximum of the numeric values extracted from the aggregated documents.
All documents
POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "czbk": {
            "max": { "field": "price" }
        }
    }
}
Result: "value" : 9.9999999E7
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "czbk": {
                "max": { "field": "price" }
            }
        }
    }
}
Filtered documents
POST product_list_info/_search
{
    "size": 0,
    "query": {
        "term": { "onelevel": "手机通讯" }
    },
    "aggs": {
        "czbk": {
            "max": { "field": "price" }
        }
    }
}
Result: "value" : 2474000.0
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "term": { "onelevel": "手机通讯" }
        },
        "aggs": {
            "czbk": {
                "max": { "field": "price" }
            }
        }
    }
}
Result: "value" : 2474000.0
Code
// Maximum
if (m.getValue() instanceof ParsedMax) {
    map.put("value", ((ParsedMax) m.getValue()).getValue());
}
Verification
http://172.17.0.225:6666/v1/analysis/metric/agg
OR
http://localhost:5555/v1/analysis/metric/agg
3. Min (minimum)
Computes the minimum of the numeric values extracted from the aggregated documents.
All documents
POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "czbk": {
            "min": { "field": "price" }
        }
    }
}
Result: "value": 0.0
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "czbk": {
                "min": { "field": "price" }
            }
        }
    }
}
Filtered documents
POST product_list_info/_search
{
    "size": 1,
    "query": {
        "term": { "onelevel": "手机通讯" }
    },
    "aggs": {
        "czbk": {
            "min": { "field": "price" }
        }
    }
}
Result: "value": 0.0
With size=1 the response also contains a document, which makes it possible to look at an item whose price is 0.
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 1,
        "query": {
            "term": { "onelevel": "手机通讯" }
        },
        "aggs": {
            "czbk": {
                "min": { "field": "price" }
            }
        }
    }
}
Code
// Minimum
if (m.getValue() instanceof ParsedMin) {
    map.put("value", ((ParsedMin) m.getValue()).getValue());
}
Verification
http://172.17.0.225:6666/v1/analysis/metric/agg
OR
http://localhost:5555/v1/analysis/metric/agg
4. Sum (total)
Sum of price over the documents matched by the filter
POST product_list_info/_search
{
    "size": 0,
    "query": {
        "constant_score": {
            "filter": {
                "match": { "threelevel": "手机" }
            }
        }
    },
    "aggs": {
        "czbk": {
            "sum": { "field": "price" }
        }
    }
}
Result: "value" : 3.433611809E7
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "constant_score": {
                "filter": {
                    "match": { "threelevel": "手机" }
                }
            }
        },
        "aggs": {
            "czbk": {
                "sum": { "field": "price" }
            }
        }
    }
}
Code
// Sum
if (m.getValue() instanceof ParsedSum) {
    map.put("value", ((ParsedSum) m.getValue()).getValue());
}
Verification
http://172.17.0.225:6666/v1/analysis/metric/agg
OR
http://localhost:5555/v1/analysis/metric/agg
5. Cardinality (distinct values)
Cardinality Aggregation. It is a single-value metric: based on a value of each document (a specific field, or a value computed by a script), it counts the number of distinct values (a de-duplicated count), comparable to distinct in SQL.
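What "distinct count" means can be shown with a few lines of plain Java. This is a sketch of the exact computation on a small in-memory list. Elasticsearch itself computes cardinality approximately (its documentation describes a HyperLogLog++ based algorithm) so that it scales to large data sets. The class name CardinalitySketch and the sample store names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Exact distinct count on an in-memory list; Elasticsearch uses an approximate algorithm instead.
class CardinalitySketch {

    static long cardinality(List<String> values) {
        // A HashSet keeps one copy of each value, so its size is the number of distinct values
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        // Hypothetical storename values
        List<String> storeNames = Arrays.asList("店铺A", "店铺B", "店铺A", "店铺C", "店铺B");
        System.out.println(cardinality(storeNames));
    }
}
```

Five values with three different store names give a cardinality of 3, which is the quantity the storename examples in this section report for the whole index.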
All documents
POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "czbk": {
            "cardinality": { "field": "storename" }
        }
    }
}
Result: "value" : 103169
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "czbk": {
                "cardinality": { "field": "storename" }
            }
        }
    }
}
Filtered documents
POST product_list_info/_search
{
    "size": 0,
    "query": {
        "constant_score": {
            "filter": {
                "match": { "threelevel": "手机" }
            }
        }
    },
    "aggs": {
        "czbk": {
            "cardinality": { "field": "storename" }
        }
    }
}
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "constant_score": {
                "filter": {
                    "match": { "threelevel": "手机" }
                }
            }
        },
        "aggs": {
            "czbk": {
                "cardinality": { "field": "storename" }
            }
        }
    }
}
Code
if (m.getValue() instanceof ParsedCardinality) {
    map.put("value", ((ParsedCardinality) m.getValue()).getValue());
}
Verification
http://172.17.0.225:6666/v1/analysis/metric/agg
OR
http://localhost:5555/v1/analysis/metric/agg
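3.2.3 Multi-value analysis API design
1. Stats Aggregation
Stats Aggregation is a multi-value metric: based on a value of each document (a specific numeric field, or a value computed by a script), it computes five statistics: min, max, sum, count and avg.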
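The five values a stats aggregation returns correspond to what java.util.DoubleSummaryStatistics computes for an in-memory array. The sketch below uses made-up prices and a hypothetical class name StatsSketch. It only illustrates the meaning of the five numbers and says nothing about how Elasticsearch computes them across shards.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// In-memory analogue of the five stats values (count, min, max, avg, sum).
class StatsSketch {

    static DoubleSummaryStatistics stats(double[] prices) {
        // summaryStatistics() collects count, min, max, average and sum in one pass
        return Arrays.stream(prices).summaryStatistics();
    }

    public static void main(String[] args) {
        // Made-up prices, not taken from product_list_info
        DoubleSummaryStatistics s = stats(new double[] {0.0, 100.0, 200.0});
        System.out.println("count=" + s.getCount() + " min=" + s.getMin() + " max=" + s.getMax()
                + " avg=" + s.getAverage() + " sum=" + s.getSum());
    }
}
```

A single pass yields all five numbers, which is why stats is convenient when several of them are needed for the same field.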
All documents
POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "czbk": {
            "stats": { "field": "price" }
        }
    }
}
Response
"aggregations" : {
    "czbk" : {
        "count" : 5072447,
        "min" : 0.0,
        "max" : 9.9999999E7,
        "avg" : 920.1537270512633,
        "sum" : 4.66743101232E9
    }
}
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "czbk": {
                "stats": { "field": "price" }
            }
        }
    }
}
Filtered documents
POST product_list_info/_search
{
    "size": 0,
    "query": {
        "constant_score": {
            "filter": {
                "match": { "threelevel": "手机" }
            }
        }
    },
    "aggs": {
        "czbk": {
            "stats": { "field": "price" }
        }
    }
}
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "constant_score": {
                "filter": {
                    "match": { "threelevel": "手机" }
                }
            }
        },
        "aggs": {
            "czbk": {
                "stats": { "field": "price" }
            }
        }
    }
}
Code
if (m.getValue() instanceof ParsedStats) {
    ParsedStats stats = (ParsedStats) m.getValue();
    map.put("count", stats.getCount());
    map.put("min", stats.getMin());
    map.put("max", stats.getMax());
    map.put("avg", stats.getAvg());
    map.put("sum", stats.getSum());
}
Verification
http://172.17.0.225:6666/v1/analysis/metric/agg
OR
http://localhost:5555/v1/analysis/metric/agg
2. Extended Stats
Extended Stats Aggregation is a multi-value metric that returns four results more than stats: the sum of squares, the variance, the standard deviation, and the interval of the average plus/minus two standard deviations.
All documents
POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "czbk": {
            "extended_stats": { "field": "price" }
        }
    }
}
Response
"aggregations" : {
    "czbk" : {
        "count" : 5072447,
        "min" : 0.0,
        "max" : 9.9999999E7,
        "avg" : 920.1537270512633,
        "sum" : 4.66743101232E9,
        "sum_of_squares" : 2.0182209054045464E16,
        "variance" : 3.9779448262354884E9,
        "std_deviation" : 63070.950731977144,
        "std_deviation_bounds" : {
            "upper" : 127062.05519100555,
            "lower" : -125221.74773690302
        }
    }
}
sum_of_squares: sum of squares
variance: variance
std_deviation: standard deviation
std_deviation_bounds: interval around the average (avg plus/minus two standard deviations)
OpenAPI query parameters
{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "czbk": {
                "extended_stats": { "field": "price" }
            }
        }
    }
}
Filtered documents
POST product_list_info/_search
{
    "size": 1,
    "query": {
        "constant_score": {
            "filter": {
                "match": { "threelevel": "手机" }
            }
        }
    },
    "aggs": {
        "czbk": {
            "extended_stats": { "field": "price" }
        }
    }
}
Response
"aggregations" : {
    "czbk" : {
        "count" : 12402,
        "min" : 0.0,
        "max" : 2474000.0,
        "avg" : 2768.595233833253,
        "sum" : 3.433611809E7,
        "sum_of_squares" : 6.445447222627729E12,
        "variance" : 5.120451870452684E8,
        "std_deviation" : 22628.41547800615,
        "std_deviation_bounds" : {
            "upper" : 48025.42618984555,
            "lower" : -42488.23572217905
        }
    }
}
OpenAPI query parameters