elasticsearch - search - parent/child

olap_patient

olap_visit

Note the indexed field _parent#olap_patient:[32]

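In short, ES builds a DocValues field, and for both parent and child documents it stores the parent document's id in that field. Once a query has produced the parent result set, the doc_values of those parents can be read and used as a filter on the child result set.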

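Why build this on doc_values? Because doc_values maintain two structures internally: ordinals and values. ordinals look up the ordinal for a given doc_id, and values look up the field value for a given ordinal. In other words, doc_values let the set of values belonging to a set of documents be represented as a bitset over ordinals, which is convenient for filtering and aggregation.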
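The lookup described above can be sketched in plain Java. This is only a toy model of the idea, not Lucene's or Elasticsearch's actual implementation: the arrays `ORDINALS`, `VALUES` and `IS_CHILD`, the class name, and the sample ids are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class JoinFieldSketch {
    // values: ordinal -> parent id (hypothetical sample data)
    static final String[] VALUES = {"p1", "p2", "p3"};
    // ordinals: doc id -> ordinal of the parent id stored in the join field.
    // Docs 0-2 are parents (they store their own id), docs 3-7 are children.
    static final int[] ORDINALS = {0, 1, 2, 0, 0, 1, 2, 2};
    static final boolean[] IS_CHILD = {false, false, false, true, true, true, true, true};

    /** Resolves the parent id stored for a document: doc id -> ordinal -> value. */
    static String parentIdOf(int doc) {
        return VALUES[ORDINALS[doc]];
    }

    /** Collects the join-field ordinals of the matched parent docs into a bitset. */
    static BitSet parentOrdinals(int[] matchedParentDocs) {
        BitSet bits = new BitSet(VALUES.length);
        for (int doc : matchedParentDocs) {
            bits.set(ORDINALS[doc]);
        }
        return bits;
    }

    /** Keeps the child docs whose join-field ordinal is set in the bitset. */
    static List<Integer> childrenOf(BitSet parentBits) {
        List<Integer> result = new ArrayList<>();
        for (int doc = 0; doc < ORDINALS.length; doc++) {
            if (IS_CHILD[doc] && parentBits.get(ORDINALS[doc])) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Suppose the parent query matched docs 0 and 2 (parents p1 and p3).
        BitSet bits = parentOrdinals(new int[] {0, 2});
        System.out.println(childrenOf(bits)); // child docs whose parent is p1 or p3
        System.out.println(parentIdOf(5));    // parent id stored for child doc 5
    }
}
```

The point of the bitset is that the child filter becomes one array read and one bit test per document, with no string comparison of parent ids.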

When adding a doc_type, you cannot set its _parent to an existing doc_type that is not already a parent, because ES creates the joinField DocValues index for both parent and child documents. The error returned is: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"can't add a _parent field that points to an already existing type, that isn't already a parent"}],"type":"illegal_argument_exception","reason":"can't add a _parent field that points to an already existing type, that isn't already a parent"},"status":400}

springboot elasticsearch integration: full-text search with highlighting in practice

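This example searches an index of English articles for the articles that contain a given word and highlights the sentences containing it. It mainly covers how to integrate elasticsearch into springboot and the commonly used APIs.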

Add the dependency

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.3.0.RELEASE</version>
    <relativePath/>
</parent>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
</dependencies>

Configure the connection

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;

/**
 * @author liu
 */
@Configuration
public class ElasticSearchClient extends AbstractElasticsearchConfiguration {
    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        // Connect to a single local node; adjust host and port for your own cluster
        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo("127.0.0.1:9200").build();
        return RestClients.create(clientConfiguration).rest();
    }
}

Add data (add documents to an index)

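An index is roughly the counterpart of a database, and a document is one concrete record. The counterpart of a table is the type, but recent versions of elasticsearch have already removed types.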

The code for adding the index and a document is as follows:

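import com.baomidou.mybatisplus.annotation.IdType;
import com.baomidou.mybatisplus.annotation.TableId;
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

import java.io.Serializable;

@Data
@Document(indexName = "article") // specifies the index
public class BdArticle implements Serializable {

    private static final long serialVersionUID = 1L;

    // @Id marks the document id in elasticsearch; @TableId is the MyBatis-Plus primary key
    @TableId(value = "id", type = IdType.AUTO)
    @Id
    private Long id;
    // @Field sets the field type in the elasticsearch mapping
    @Field(type = FieldType.Text)
    private String title;
    @Field(type = FieldType.Text)
    private String photo;
    @Field(type = FieldType.Text)
    private String context;
    // Numeric properties are mapped as long here rather than text
    @Field(type = FieldType.Long)
    private Long wordCount;
    @Field(type = FieldType.Long)
    private Long createTime;
    @Field(type = FieldType.Long)
    private Long updateTime;
}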

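There is also an annotation for excluding a field. Entity mapping offers many more options than shown here, and readers are welcome to explore them.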

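@Service
public class ArticleServiceImpl implements ArticleService {

    @Autowired
    private ElasticsearchOperations elasticsearchOperations;

    // Note: elasticsearch writes are not transactional; @Transactional only
    // affects a transactional data source used in the same method, if any.
    @Override
    @Transactional(rollbackFor = Exception.class)
    public void addBdArticle(BdArticle bdArticle) {
        // Indexes the entity into the index named by @Document
        elasticsearchOperations.save(bdArticle);
    }
}
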
Besides mapping entity classes, individual fields can also be added through the API.

Query data

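import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.Strings;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

@SpringBootTest(classes = ReciteWordsApplication.class)
@RunWith(SpringJUnit4ClassRunner.class)
public class ArticleELKTest {
    @Autowired
    private BdArticleService bdArticleService;
    @Autowired
    private ElasticsearchOperations elasticsearchOperations;
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    public void searchWordInArticles() throws IOException {
        String wordText = "is";
        // The index to query
        SearchRequest searchRequest = new SearchRequest("article");

        // Build the query
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // Fields to return
        String[] fields = {"id", "title", "context"};
        sourceBuilder.fetchSource(fields, Strings.EMPTY_ARRAY);
        // match query on the article body; the highlighter marks the terms matched here
        sourceBuilder.query(QueryBuilders.matchQuery("context", wordText));
        // Highlight settings
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        // Tags wrapped around each highlighted term
        String preTag = "<span style='color:red;font-weight:bold'>";
        String postTag = "</span>";
        HighlightBuilder.Field highlightContext = new HighlightBuilder.Field("context")
                .numOfFragments(1) // fragments returned per document (one document can contain the word many times)
                .preTags(preTag)
                .postTags(postTag);
        highlightBuilder.field(highlightContext);
        // Number of documents to return
        sourceBuilder.size(3);
        sourceBuilder.highlighter(highlightBuilder);
        searchRequest.source(sourceBuilder);
        // Run the query
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // Process the response
        SearchHit[] hits = searchResponse.getHits().getHits();
        if (hits == null || hits.length == 0) {
            return;
        }
        List<BdArticle> list = new ArrayList<>();
        for (SearchHit hit : hits) {
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            BdArticle bdArticle = new BdArticle();
            bdArticle.setId(Long.parseLong(String.valueOf(sourceAsMap.get("id"))));
            bdArticle.setTitle(String.valueOf(sourceAsMap.get("title")));
            // Use the highlighted fragment when present, otherwise fall back to the raw text
            HighlightField highlighted = highlightFields.get("context");
            bdArticle.setContext(highlighted != null && highlighted.getFragments().length > 0
                    ? highlighted.getFragments()[0].string()
                    : String.valueOf(sourceAsMap.get("context")));
            list.add(bdArticle);
        }
        System.out.println(list);
    }
}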

The corresponding DSL query and its result:

{
    "query": {
        "match": {
            "context": "is"
        }
    },
    "_source": ["id"],
    "highlight": {
        "fields": {
            "context": {
                "number_of_fragments": 1
            }
        }
    }
}
Result (focus on hits):

{
    "took": 20,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 6,
            "relation": "eq"
        },
        "max_score": 0.14691809,
        "hits": [
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "8",
                "_score": 0.14691809,
                "_source": {
                    "id": 8
                },
                "highlight": {
                    "context": [
                        "Classified advertising <em>is</em> that advertising which <em>is</em> grouped in certain sections of the paper and <em>is</em> thus"
                    ]
                }
            },
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.14463511,
                "_source": {
                    "id": 3
                },
                "highlight": {
                    "context": [
                        "There <em>is</em> considerable sentiment about the “corruption” of women’s language—which of course <em>is</em> viewed"
                    ]
                }
            },
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.1358716,
                "_source": {
                    "id": 2
                },
                "highlight": {
                    "context": [
                        "Obviously it <em>is</em> not of ours.”"
                    ]
                }
            },
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.12783799,
                "_source": {
                    "id": 1
                },
                "highlight": {
                    "context": [
                        "Today it <em>is</em> a giant advertising company, worth $100 billion."
                    ]
                }
            },
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "4",
                "_score": 0.11044833,
                "_source": {
                    "id": 4
                },
                "highlight": {
                    "context": [
                        "\\\\n The challenge <em>is</em> particularly evident in the work-place."
                    ]
                }
            },
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "5",
                "_score": 0.10031616,
                "_source": {
                    "id": 5
                },
                "highlight": {
                    "context": [
                        "Like most people, I’ve long understood that I will be judged by my occupation, that my profession <em>is</em>"
                    ]
                }
            }
        ]
    }
}
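The fragments in this response use <em>, which is the highlighter's default tag pair because the DSL request sets no pre_tags or post_tags, whereas the Java test passes its own span tags. As a rough illustration of what that markup amounts to, here is a plain-Java sketch that wraps whole-word matches in a configurable tag pair. It is not how Elasticsearch highlights internally (the real highlighter works on analyzed terms and selects fragments); the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightSketch {
    /** Wraps every whole-word, case-insensitive match of word in preTag/postTag. */
    static String highlight(String fragment, String word, String preTag, String postTag) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(fragment);
        // quoteReplacement keeps '$' or '\' inside the tags from being read as group references;
        // $0 re-inserts the matched text with its original casing.
        String replacement = Matcher.quoteReplacement(preTag) + "$0" + Matcher.quoteReplacement(postTag);
        return matcher.replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text = "Today it is a giant advertising company.";
        System.out.println(highlight(text, "is", "<em>", "</em>"));
        System.out.println(highlight(text, "is",
                "<span style='color:red;font-weight:bold'>", "</span>"));
    }
}
```

Note that the word boundary check leaves the "is" inside "advertising" untouched, which mirrors the whole-term matches visible in the fragments above.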
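References

5. Full-Text Search ElasticSearch Classic Introduction - ElasticSearch Java in Practice

It is worth watching the video or consulting the API documentation. Spring Boot's elasticsearch integration comes with quite a few classes and annotations, including entity mapping; elasticsearch itself has a great many concepts, and cluster operations are more complex still.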
