Update a string parameter in Elasticsearch _mapping

Posted: 2021-02-23 19:41:13

Question:

I have a _mapping like this in Elasticsearch 6.8:

{
  "grch38_test__wes__grch38__variants__20210222" : {
    "mappings" : {
      "variant" : {
        "_meta" : {
          "gencodeVersion" : "25",
          "hail_version" : "0.2.20",
          "genomeVersion" : "38",
          "sampleType" : "WES",
          "sourceFilePath" : "s3://my_folder/my_vcf.vcf"
        },
    ...

My goal is to issue a query in Kibana that modifies variant._meta.sourceFilePath. Following the post:

Elastic search mapping for nested json objects

I was able to come up with this query:

PUT /grch38_test__wes__grch38__variants__20210222/_mapping/variant
{
  "properties": {
    "variant": {
      "type": "nested",
      "properties": {
        "_meta": {
          "type": "nested",
          "properties": {
            "type": "text",
            "sourceFilePath": "s3://my_folder/my_vcf.vcf"
          }
        }
      }
    }
  }
}

But it gives me an error:

elasticsearch mapping Expected map for property [fields] on field [name] but got a class java.lang.String

The full error message:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Expected map for property [fields] on field [type] but got a class java.lang.String"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Expected map for property [fields] on field [type] but got a class java.lang.String"
  },
  "status": 400
}

I also tried:

PUT /grch38_test__wes__grch38__variants__20210222/_mapping/variant
{
  "properties": {
    "variant": {
      "type": "nested",
      "properties": {
        "_meta": {
          "type": "nested",
          "properties": {
            "sourceFilePath": {
              "type": "text",
              "value": "s3://my_folder/my_vcf.vcf"
            }
          }
        }
      }
    }
  }
}

But it tells me that value is not supported:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Mapping definition for [sourceFilePath] has unsupported parameters:  [value : s3://seqr-dp-data--prod/vcf/dev/grch38_test_contracted.vcf]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Mapping definition for [sourceFilePath] has unsupported parameters:  [value : s3://seqr-dp-data--prod/vcf/dev/grch38_test_contracted.vcf]"
  },
  "status": 400
}

What am I doing wrong? How can I modify the field?

Answer 1:

_meta is a reserved field for storing application-specific metadata. It is not searchable and can only be retrieved through the GET mapping API.
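As a rough illustration of reading _meta back, the sketch below pulls it out of a GET mapping response shaped like the one in the question (Elasticsearch 6.8, with the `variant` type). The helper name `extract_meta` and the hard-coded sample dict are illustrative only and not part of any client library; in practice the dict would be the parsed JSON returned by the GET mapping call.

```python
INDEX = "grch38_test__wes__grch38__variants__20210222"


def extract_meta(mapping_response, index, doc_type):
    """Return the _meta dict for one index/type, or {} if none is set."""
    type_mapping = mapping_response[index]["mappings"][doc_type]
    return type_mapping.get("_meta", {})


# Stand-in for the parsed GET mapping response (values from the question).
sample_response = {
    INDEX: {
        "mappings": {
            "variant": {
                "_meta": {
                    "gencodeVersion": "25",
                    "hail_version": "0.2.20",
                    "genomeVersion": "38",
                    "sampleType": "WES",
                    "sourceFilePath": "s3://my_folder/my_vcf.vcf",
                }
            }
        }
    }
}

meta = extract_meta(sample_response, INDEX, "variant")
print(meta["sourceFilePath"])
```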

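This means that if your _meta content is meant to serve the purpose the _meta field was designed for, you cannot apply any mapping to it. It is a "final" hash map of concrete values, and it needs to be defined at the top level of the update-mapping payload: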

PUT /grch38_test__wes__grch38__variants__20210222/_mapping/variant
{
  "_meta": {
    "variant": {             <-- shared index-level metadata
      "gencodeVersion": "25",
      "hail_version": "0.2.20",
      "genomeVersion": "38",
      "sampleType": "WES",
      "sourceFilePath": "s3://my_folder/my_vcf.vcf"
    }
  },
  "properties": {
    "some_text_field": {     <-- actual document properties
      "type": "text"
    }
  }
}
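For changing a single value such as sourceFilePath, one possible approach is to resend the whole _meta object with that one key changed. The sketch below only builds the request body; it assumes the put-mapping call replaces the stored _meta as a whole rather than merging keys (worth verifying on your own cluster). The helper name `build_meta_update` and the new path `s3://my_folder/my_new_vcf.vcf` are hypothetical. The keys sit directly under _meta here, matching the GET mapping output shown in the question.

```python
import copy
import json


def build_meta_update(current_meta, **changes):
    """Return a put-mapping body carrying the full _meta with changes applied.

    The input dict is copied, so the caller's current_meta is left untouched.
    """
    new_meta = copy.deepcopy(current_meta)
    new_meta.update(changes)
    return {"_meta": new_meta}


# Current values as shown in the question's GET mapping output.
current_meta = {
    "gencodeVersion": "25",
    "hail_version": "0.2.20",
    "genomeVersion": "38",
    "sampleType": "WES",
    "sourceFilePath": "s3://my_folder/my_vcf.vcf",
}

# Hypothetical new path, for illustration only.
body = build_meta_update(current_meta, sourceFilePath="s3://my_folder/my_new_vcf.vcf")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of the PUT request to the `_mapping/variant` endpoint in Kibana.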

On the other hand, if your _meta field is just an unfortunate naming coincidence, you can declare its mapping like this:

PUT /grch38_test__wes__grch38__variants__20210222/_mapping/variant
{
  "properties": {
    "_meta": {
      "properties": {
        "variant": {
          "properties": {
            "gencodeVersion": {
              "type": "text"
            },
            "genomeVersion": {
              "type": "text"
            },
            "hail_version": {
              "type": "text"
            },
            "sampleType": {
              "type": "text"
            },
            "sourceFilePath": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}

And ingest documents of the form:

POST grch38_test__wes__grch38__variants__20210222/variant/_doc
{
  "_meta": {
    "variant": {
      "gencodeVersion": "25",
      "hail_version": "0.2.20",
      "genomeVersion": "38",
      "sampleType": "WES",
      "sourceFilePath": "s3://my_folder/my_vcf.vcf"
    }
  }
}

But again, the _meta content would then be document-specific, not index-wide!

By the way, a nested mapping only makes sense when you are dealing with arrays of objects, not objects of objects. But if you insist on having it, you would do this:

PUT /grch38_test__wes__grch38__variants__20210222/_mapping/variant?include_type_name
{
  "properties": {
    "_meta": {
      "type": "nested",            <---
      "properties": {
        "variant": {
          "type": "nested",        <---
          "properties": {
            "gencodeVersion": {
              "type": "text"
            },
            "genomeVersion": {
              "type": "text"
            },
            "hail_version": {
              "type": "text"
            },
            "sampleType": {
              "type": "text"
            },
            "sourceFilePath": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}
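To see why nested matters only for arrays of objects, here is a small plain-Python imitation of how a default object field treats an array: the sub-fields of the array elements are flattened into per-field value lists, so the link between values that came from the same element is lost. This is a simplified model for illustration, not Elasticsearch code; the `samples` field, its values, and the helper names are all made up.

```python
from collections import defaultdict


def flatten_object_array(field, items):
    """Imitate object-type flattening: one list of values per sub-field."""
    flat = defaultdict(list)
    for item in items:
        for key, value in item.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def flat_match(flat, field, **criteria):
    """Match if each value occurs somewhere in its flattened sub-field list."""
    return all(value in flat.get(f"{field}.{key}", []) for key, value in criteria.items())


def nested_match(items, **criteria):
    """Match only if a single element satisfies all criteria together."""
    return any(all(item.get(k) == v for k, v in criteria.items()) for item in items)


# Hypothetical array of objects.
samples = [
    {"id": "A", "type": "WES"},
    {"id": "B", "type": "WGS"},
]

flat = flatten_object_array("samples", samples)
print(flat)
print(flat_match(flat, "samples", id="A", type="WGS"))   # cross-element match
print(nested_match(samples, id="A", type="WGS"))         # per-element match
```

With a single object such as _meta.variant there is only one element, so there is no association between elements to lose, and the plain object mapping behaves the same.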

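Comments:

So does this mean that (in the first case, where _meta is what it is supposed to be) sourceFilePath is defined when the index is created, cannot be modified at all, and there is no way around that?

No. You can modify the shared _meta attributes using my first snippet; just make sure to remove the properties part.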
