elasticsearch-search-parent/child

Posted 2023-04-04


Preface: This article was compiled by the editors of cha138.com (小常识网). It mainly covers elasticsearch search with parent/child, and we hope it is useful to you.

Reference answer A

[Figure: mapping of the parent type olap_patient]

[Figure: mapping of the olap_visit type]

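Note the indexed field _parent#olap_patient:[32]. This is the join field for the parent type olap_patient, and here it holds the parent id 32.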

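In short, ES builds a DocValues field and stores the parent document's id in it for both parent and child documents. After the parent query produces its result set, the parents' doc_values can be read out and used as a filter on the child query's result set.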
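To make the two steps concrete, here is a minimal in-memory sketch in plain Java. It is a toy model of the idea, not Elasticsearch's implementation. The JoinDoc class, the olap_visit child documents and the tag values are hypothetical; joinValue stands in for the _parent#olap_patient field and holds the parent id on parents (their own id) and on children alike.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical toy document. joinValue plays the role of the _parent#olap_patient
// doc-values field: a parent stores its own id, a child stores its parent's id.
class JoinDoc {
    final String id;
    final String type;
    final String joinValue;
    final String tag; // some searchable attribute, e.g. a city or a department

    JoinDoc(String id, String type, String joinValue, String tag) {
        this.id = id;
        this.type = type;
        this.joinValue = joinValue;
        this.tag = tag;
    }
}

class ParentChildJoinSketch {
    // Step 1: run the parent query and collect the join values of the matching parents.
    static Set<String> matchingParentIds(List<JoinDoc> docs, String parentType,
                                         Predicate<JoinDoc> parentQuery) {
        Set<String> parentIds = new HashSet<>();
        for (JoinDoc doc : docs) {
            if (doc.type.equals(parentType) && parentQuery.test(doc)) {
                parentIds.add(doc.joinValue);
            }
        }
        return parentIds;
    }

    // Step 2: use those values as a filter on the child documents.
    static List<JoinDoc> childrenOf(List<JoinDoc> docs, String childType, Set<String> parentIds) {
        List<JoinDoc> children = new ArrayList<>();
        for (JoinDoc doc : docs) {
            if (doc.type.equals(childType) && parentIds.contains(doc.joinValue)) {
                children.add(doc);
            }
        }
        return children;
    }

    public static void main(String[] args) {
        List<JoinDoc> docs = Arrays.asList(
                new JoinDoc("32", "olap_patient", "32", "beijing"),
                new JoinDoc("33", "olap_patient", "33", "shanghai"),
                new JoinDoc("v1", "olap_visit", "32", "cardiology"),
                new JoinDoc("v2", "olap_visit", "33", "dermatology"),
                new JoinDoc("v3", "olap_visit", "32", "neurology"));
        Set<String> parents = matchingParentIds(docs, "olap_patient", d -> d.tag.equals("beijing"));
        for (JoinDoc child : childrenOf(docs, "olap_visit", parents)) {
            System.out.println(child.id); // v1, then v3
        }
    }
}
```

The filter in step 2 never looks at the parent documents again; it only needs the set of parent ids, which is why storing the parent id on both sides is enough.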

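Why build this on doc_values? Because doc_values maintain two structures internally: ordinals and values. The ordinals look up the ordinal for a doc_id, and the values look up the field value for an ordinal. As a result, the set of values belonging to a set of documents can be represented as a bitset of ordinals, which makes filtering and aggregation efficient.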
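The ordinals/values idea can be illustrated with a toy column in plain Java. This is a simplified model under our own assumptions: the ToySortedDocValues class and its methods are hypothetical, and a real index keeps one such structure per segment (for parent/child, Elasticsearch maps them to global ordinals, as far as we understand). It shows how the values of a set of matching documents become a bitset of ordinals, and how that bitset then filters other documents.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical toy model of one sorted doc-values column (e.g. the join field):
// docToOrd maps doc_id -> ordinal, ordToValue maps ordinal -> value.
class ToySortedDocValues {
    private final int[] docToOrd;
    private final String[] ordToValue;

    ToySortedDocValues(String[] valuePerDoc) {
        // The value dictionary is sorted and de-duplicated, so ordinal order = value order
        ordToValue = Arrays.stream(valuePerDoc).distinct().sorted().toArray(String[]::new);
        docToOrd = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            docToOrd[doc] = Arrays.binarySearch(ordToValue, valuePerDoc[doc]);
        }
    }

    int ordOf(int doc) { return docToOrd[doc]; }        // "ordinals": doc_id -> ordinal
    String valueOf(int ord) { return ordToValue[ord]; } // "values": ordinal -> value

    // Turn a set of matching documents into the bitset of ordinals they use.
    BitSet ordinalsOf(BitSet docs) {
        BitSet ords = new BitSet(ordToValue.length);
        for (int doc = docs.nextSetBit(0); doc >= 0; doc = docs.nextSetBit(doc + 1)) {
            ords.set(docToOrd[doc]);
        }
        return ords;
    }

    // Keep every document whose ordinal is set in the bitset.
    BitSet filterByOrdinals(BitSet ords) {
        BitSet matches = new BitSet(docToOrd.length);
        for (int doc = 0; doc < docToOrd.length; doc++) {
            if (ords.get(docToOrd[doc])) {
                matches.set(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // docs 0 and 1 are parents (ids 32, 33); docs 2-4 are children pointing at a parent
        ToySortedDocValues join = new ToySortedDocValues(new String[] {"32", "33", "32", "33", "32"});
        BitSet parentHits = new BitSet();
        parentHits.set(0); // suppose the parent query matched only doc 0 (id 32)
        System.out.println(join.filterByOrdinals(join.ordinalsOf(parentHits))); // {0, 2, 4}
    }
}
```

The result still contains the parent doc 0, because parents store the same value; a real child query would additionally restrict the hits to the child type.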

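When adding a new doc_type, you cannot set its _parent to a doc_type that already exists and is not already a parent. ES creates the joinField DocValues for both parent and child documents, so documents already indexed under that existing type would be missing it. The error returned is:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"can't add a _parent field that points to an already existing type, that isn't already a parent"}],"type":"illegal_argument_exception","reason":"can't add a _parent field that points to an already existing type, that isn't already a parent"},"status":400}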
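The rule can be modeled with a small, hypothetical registry in plain Java. As we understand the _parent behavior, a new type's _parent may point either to a type that does not exist yet or to a type that is already a parent; only the case above is rejected. The class name and the olap_lab, olap_doctor and olap_note types are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy registry mimicking the check behind the error above.
class ToyMappingRegistry {
    private final Set<String> types = new HashSet<>();
    private final Set<String> parentTypes = new HashSet<>();

    // parentType == null means the new type has no _parent field
    void addType(String type, String parentType) {
        if (parentType != null) {
            // Existing documents of parentType were indexed without the join doc values,
            // so it can only be used as a parent if it already is one (or does not exist yet)
            if (types.contains(parentType) && !parentTypes.contains(parentType)) {
                throw new IllegalArgumentException("can't add a _parent field that points to an"
                        + " already existing type, that isn't already a parent");
            }
            parentTypes.add(parentType);
        }
        types.add(type);
    }

    public static void main(String[] args) {
        ToyMappingRegistry registry = new ToyMappingRegistry();
        registry.addType("olap_visit", "olap_patient"); // parent type does not exist yet: accepted
        registry.addType("olap_patient", null);
        registry.addType("olap_lab", "olap_patient");   // olap_patient is already a parent: accepted
        registry.addType("olap_doctor", null);
        try {
            registry.addType("olap_note", "olap_doctor"); // existing non-parent type: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```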

Spring Boot + Elasticsearch: full-text search with highlighting in practice

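The example in this article searches an index of English articles for the articles that contain a given word and highlights the sentence containing that word. The focus is on how to integrate Elasticsearch into Spring Boot and on the commonly used APIs.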

Add the dependency

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.3.0.RELEASE</version>
    <relativePath/>
</parent>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
</dependencies>

Configure the connection

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;

/**
 * @author liu
 */
@Configuration
public class ElasticSearchClient extends AbstractElasticsearchConfiguration {
    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        // Connect to a local node on the default HTTP port 9200
        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo("127.0.0.1:9200").build();
        return RestClients.create(clientConfiguration).rest();
    }
}

Adding data (indexing documents)

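An index is roughly the equivalent of a database, and a document the equivalent of a single row. The concept corresponding to a table is the type, but types were deprecated in Elasticsearch 7.x (each index has a single type, _doc) and removed in 8.x.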

The code for creating the index and adding documents is as follows:

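import java.io.Serializable;

import com.baomidou.mybatisplus.annotation.IdType;
import com.baomidou.mybatisplus.annotation.TableId;
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Data
@Document(indexName = "article") // the target index; older versions also let you specify a type
public class BdArticle implements Serializable {

    private static final long serialVersionUID = 1L;
    // @Id marks the document id in Elasticsearch; @TableId is the MyBatis-Plus primary key
    @TableId(value = "id", type = IdType.AUTO)
    @Id
    private Long id;
    // @Field sets the field type in Elasticsearch; further mapping options can be set here too
    @Field(type = FieldType.Text)
    private String title;
    @Field(type = FieldType.Text)
    private String photo;
    @Field(type = FieldType.Text)
    private String context;
    // Numeric properties are mapped as long, not text
    @Field(type = FieldType.Long)
    private Long wordCount;
    @Field(type = FieldType.Long)
    private Long createTime;
    @Field(type = FieldType.Long)
    private Long updateTime;
}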

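There are also annotations for excluding fields from the mapping; entity mapping offers many more options, so feel free to explore and add to this.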

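import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ArticleServiceImpl implements ArticleService {

    @Autowired
    private ElasticsearchOperations elasticsearchOperations;

    @Override
    // Note: Elasticsearch writes do not take part in Spring transactions, so this does not roll them back
    @Transactional(rollbackFor = Exception.class)
    public void addBdArticle(BdArticle bdArticle) {
        // Saves the entity as a document in the index named by @Document
        elasticsearchOperations.save(bdArticle);
    }
}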

Besides mapping entity classes, you can also add individual fields through the API.

Querying data

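import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.Strings;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@SpringBootTest(classes = ReciteWordsApplication.class)
@RunWith(SpringJUnit4ClassRunner.class)
public class ArticleELKTest {
    @Autowired
    private BdArticleService bdArticleService;
    @Autowired
    private ElasticsearchOperations elasticsearchOperations;
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    public void searchWordInArticles() throws IOException {
        String wordText = "is";
        // Index to search
        SearchRequest searchRequest = new SearchRequest("article");

        // Build the query
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // Fields to return from _source (includes, excludes)
        String[] fields = {"id", "title", "context"};
        sourceBuilder.fetchSource(fields, Strings.EMPTY_ARRAY);
        // match query on context; without a query there would be nothing to highlight
        sourceBuilder.query(QueryBuilders.matchQuery("context", wordText));
        // Highlight settings
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        // Tags wrapped around each highlighted term
        String preTag = "<span style='color:red;font-weight:bold'>";
        String postTag = "</span>";
        HighlightBuilder.Field highlightContext = new HighlightBuilder.Field("context")
                .numOfFragments(1) // fragments returned per document (one document may contain the keyword many times)
                .preTags(preTag)
                .postTags(postTag);
        highlightBuilder.field(highlightContext);
        // Number of documents to return
        sourceBuilder.size(3);
        sourceBuilder.highlighter(highlightBuilder);
        searchRequest.source(sourceBuilder);
        // Run the search
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // Process the hits
        SearchHit[] hits = searchResponse.getHits().getHits();
        if (hits == null || hits.length == 0) {
            return;
        }

        List<BdArticle> list = new ArrayList<>();
        for (SearchHit hit : hits) {
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            BdArticle bdArticle = new BdArticle();
            bdArticle.setId(Long.parseLong(String.valueOf(sourceAsMap.get("id"))));
            bdArticle.setTitle(String.valueOf(sourceAsMap.get("title")));
            // Prefer the highlighted fragment; fall back to the raw field if there is none
            HighlightField contextHighlight = highlightFields.get("context");
            bdArticle.setContext(contextHighlight != null
                    ? contextHighlight.getFragments()[0].string()
                    : String.valueOf(sourceAsMap.get("context")));
            list.add(bdArticle);
        }
        System.out.println(list);
    }
}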
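Elasticsearch selects and marks up the fragment on the server, but the effect of preTags/postTags together with numOfFragments(1) can be approximated in plain Java. This is a simplified stand-in, not Elasticsearch's highlighter: it treats the first sentence containing the word as the single fragment and wraps every whole-word, case-insensitive match in the tags, whereas the real highlighter scores fragments and bounds their length (see fragment_size). ToyHighlighter and highlightFirstSentence are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified highlighter: at most one fragment (the first sentence that
// contains the word), with every whole-word match wrapped in preTag/postTag.
class ToyHighlighter {
    static Optional<String> highlightFirstSentence(String text, String word,
                                                   String preTag, String postTag) {
        Pattern wordPattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        // "$0" re-inserts each match unchanged, so the original casing is kept
        String replacement = Matcher.quoteReplacement(preTag) + "$0" + Matcher.quoteReplacement(postTag);
        // Naive sentence split: break after '.', '!' or '?' followed by whitespace
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            Matcher matcher = wordPattern.matcher(sentence);
            if (matcher.find()) {
                return Optional.of(matcher.replaceAll(replacement));
            }
        }
        return Optional.empty(); // no fragment, like a hit without a highlight entry
    }

    public static void main(String[] args) {
        String text = "Today it is a giant advertising company. This island is big.";
        System.out.println(highlightFirstSentence(text, "is",
                "<span style='color:red;font-weight:bold'>", "</span>").orElse(text));
    }
}
```

The whole-word boundary keeps words such as "advertising" or "island" from being marked, which mirrors how a match query on an analyzed text field matches tokens rather than substrings.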

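The corresponding DSL query and its result. The DSL is a simplified version of the Java query: it returns only id from _source, sets no size, and uses the default <em> highlight tags instead of the custom span tags.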

{
    "query": {
        "match": {
            "context": "is"
        }
    },
    "_source": ["id"],
    "highlight": {
        "fields": {
            "context": {
                "number_of_fragments": 1
            }
        }
    }
}
Result (focus on the hits part):

{
    "took": 20,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 6,
            "relation": "eq"
        },
        "max_score": 0.14691809,
        "hits": [
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "8",
                "_score": 0.14691809,
                "_source": {
                    "id": 8
                },
                "highlight": {
                    "context": [
                        "Classified advertising <em>is</em> that advertising which <em>is</em> grouped in certain sections of the paper and <em>is</em> thus"
                    ]
                }
            },
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.14463511,
                "_source": {
                    "id": 3
                },
                "highlight": {
                    "context": [
                        "There <em>is</em> considerable sentiment about the “corruption” of women’s language—which of course <em>is</em> viewed"
                    ]
                }
            },
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.1358716,
                "_source": {
                    "id": 2
                },
                "highlight": {
                    "context": [
                        "Obviously it <em>is</em> not of ours.”"
                    ]
                }
            },
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.12783799,
                "_source": {
                    "id": 1
                },
                "highlight": {
                    "context": [
                        "Today it <em>is</em> a giant advertising company, worth $100 billion."
                    ]
                }
            },
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "4",
                "_score": 0.11044833,
                "_source": {
                    "id": 4
                },
                "highlight": {
                    "context": [
                        "\\n The challenge <em>is</em> particularly evident in the work-place."
                    ]
                }
            },
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "5",
                "_score": 0.10031616,
                "_source": {
                    "id": 5
                },
                "highlight": {
                    "context": [
                        "Like most people, I’ve long understood that I will be judged by my occupation, that my profession <em>is</em>"
                    ]
                }
            }
        ]
    }
}
References

Part 5. Full-text search: ElasticSearch classic introduction - ElasticSearch Java in practice

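You can watch the video or look up the API documentation. Spring Boot's Elasticsearch integration comes with quite a few classes and annotations, entity mapping among them; Elasticsearch itself has many concepts, and cluster operations are more complex still.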
