Elasticsearch: indexing date fields that have null values

Posted: 2018-10-29 06:39:53

Question:

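I am using the Elasticsearch client for Python to create an index for the fields below, but I am stuck on creating a date field that contains null values. I can't work out why the field is not indexed as a date rather than a string when the data contains nulls. From what I have read online and in the ES documentation, it seems you cannot index a null value directly, so I followed https://www.elastic.co/guide/en/elasticsearch/reference/current/null-value.html and tried "null_value": "NULL", without success.

I also tried changing the actual dates to formats such as "yyyy-MM-dd" and "MM/dd/yyyy", along with many other combinations. In the JSON mapping I also tried "type": "strict_date" and "type": "strict_date": "MM/dd/yyyy". Is there any way to solve this?

Data: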

  id_name,team_name,team_members,date_info,date_sub
  123,"Biology, Neurobiology ","Ali Smith, Jon Doe",5/1/2015,5/1/2015
  234,Mathematics,Jane Smith ,8/12/2016,
  345,"Statistics, Probability","Matt P, Albert Shaw",5/15/2015,5/15/2015
  456,Chemistry,"Andrew M, Matt Shaw, Ali Smith",4/12/2017,
  678,Physics,"Joe Doe, Jane Smith, Ali Smith ",5/12/2017,5/12/2017

JSON/Python mapping:

request_body = '''
        {
            "settings": {
              "number_of_shards": 2,
              "number_of_replicas": 1
            },

            "mappings": {
                "team": {
                    "properties": {
                        "id_name": { "type": "text" },
                        "team_name": { "type": "text" },
                        "team_members": { "type": "text" },
                        "date_info": { "type": "date", "null_value": "NULL" },
                        "date_sub": { "type": "date", "null_value": "NULL" }
                    }
                }
            }
        }
    '''

    res = self.es.indices.create(index=your_index_name, ignore = 400, body=request_body)

Error:

raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'mapper_parsing_exception', 'failed to parse [date_info]')

Comments:

Can you post your indexing request?

Answer 1:

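Your mapping does not specify a format for the date fields, so Elasticsearch falls back to the built-in default, "strict_date_optional_time||epoch_millis". That means the value must be either a long number representing milliseconds since the epoch or a string matching strict_date_optional_time, which is a strict format.

Strict means that every component must be written with its full number of digits, so a date such as 5/12/2017 has to be zero-padded to 05/12/2017. Note that strict_date_optional_time also expects ISO 8601 ordering (for example 2017-05-12), so a slash-separated date only parses if the mapping declares a matching format such as "MM/dd/yyyy".

More on date formats: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html#built-in-date-formats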
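As a minimal sketch of the point above, assuming the CSV dates are in month/day/year order (which the sample data suggests), the values could be rewritten as ISO dates before indexing so that they match the default strict_date_optional_time format. The helper name to_iso_date is hypothetical and not part of any Elasticsearch client:

```python
from datetime import datetime
from typing import Optional


def to_iso_date(raw: str, source_format: str = "%m/%d/%Y") -> Optional[str]:
    """Convert a date such as '5/1/2015' to an ISO date string.

    Empty or whitespace-only input becomes None, which is serialized
    as JSON null when the document is sent to Elasticsearch.
    """
    text = raw.strip()
    if not text:
        return None
    # strptime accepts months and days without a leading zero.
    return datetime.strptime(text, source_format).date().isoformat()


if __name__ == "__main__":
    for value in ["5/1/2015", "8/12/2016", ""]:
        print(repr(value), "->", to_iso_date(value))
```

Converting on the client side is only one option; declaring a matching "format" in the mapping would be another.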

Answer 2:

First of all, the mapping for your date fields must not contain "null_value": "NULL".

I tried the following in Kibana:

PUT ***
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "team": {
      "properties": {
        "id_name": {
          "type": "text"
        },
        "team_name": {
          "type": "text"
        },
        "team_members": {
          "type": "text"
        },
        "date_info": {
          "type": "date"
        },
        "date_sub": {
          "type": "date"
        }
      }
    }
  }
}

Then I tried inserting a document with a null date_info:

POST ***/team
{
  "id_name": 341,
  "team_name": "Gogologi",
  "team_members": "Wayern",
  "date_info": null,
  "date_sub": "2014-02-01"
}

To verify, I ran GET ***/team/_search, which returned this hit:

{
  "_index": "***",
  "_type": "team",
  "_id": "AWOCTEhoVu_LbUvfNt6J",
  "_score": 1,
  "_source": {
    "id_name": 341,
    "team_name": "Gogologi",
    "team_members": "Wayern",
    "date_info": null,
    "date_sub": "2014-02-01"
  }
}

Hope this helps!
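On the Python side, the same idea could look roughly like the sketch below: empty date cells from the CSV are turned into None so that they are serialized as JSON null. The function name rows_to_documents and the DATE_FIELDS constant are hypothetical, and the sketch leaves non-empty dates untouched, so they still have to match whatever format the mapping expects:

```python
import csv
import io
import json
from typing import Iterator

# Two rows from the question's sample data; the second has no date_sub.
SAMPLE_CSV = """id_name,team_name,team_members,date_info,date_sub
123,"Biology, Neurobiology ","Ali Smith, Jon Doe",5/1/2015,5/1/2015
234,Mathematics,Jane Smith ,8/12/2016,
"""

DATE_FIELDS = ("date_info", "date_sub")


def rows_to_documents(csv_text: str) -> Iterator[dict]:
    """Yield one document per CSV row, with empty date cells set to None."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Trim stray whitespace; treat missing cells as empty strings.
        doc = {key: (value or "").strip() for key, value in row.items()}
        for field in DATE_FIELDS:
            if not doc[field]:
                doc[field] = None  # serialized as JSON null
        yield doc


if __name__ == "__main__":
    for document in rows_to_documents(SAMPLE_CSV):
        print(json.dumps(document))
```

Each yielded dict could then be passed as the document body to whichever indexing call the client version in use provides.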

Answer 3:

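null_value needs to have the same data type as the field (see the Elasticsearch null_value documentation).

I set null_value to a value that can be parsed by the specified format.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "date": {
        "type":   "date",
        "null_value": "01/01/0001",
        "format": "dd/MM/yyyy"
      }
    }
  }
}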

Then we can insert some documents.

POST my-index-000001/_doc
{ "date": null }
POST my-index-000001/_doc
{ "date": "01/01/0001" }
POST my-index-000001/_doc
{ "date": "31/10/2021" }

Now we can search for the null_value:

GET my-index-000001/_search
{
  "query": {
    "match": {
      "date": "01/01/0001"
    }
  }
}

### Response ###
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "rY203nwBSf_8E_MJ7pyJ",
        "_score" : 1.0,
        "_source" : {
          "date" : null
        }
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "ro203nwBSf_8E_MJ9Jzy",
        "_score" : 1.0,
        "_source" : {
          "date" : "01/01/0001"
        }
      }
    ]
  }
}

Note, however, that the null_value placeholder is also matched by range queries, so documents whose date is null show up in the results:

GET my-index-000001/_search
{
  "query": {
    "range": {
      "date": {
        "lt": "01/01/2021"
      }
    }
  }
}

### Response ###
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "rY203nwBSf_8E_MJ7pyJ",
        "_score" : 1.0,
        "_source" : {
          "date" : null
        }
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "ro203nwBSf_8E_MJ9Jzy",
        "_score" : 1.0,
        "_source" : {
          "date" : "01/01/0001"
        }
      }
    ]
  }
}

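One possible way to keep those placeholder hits out of a range search, sketched here under the assumption that the placeholder is earlier than every real date in the index, is to add an exclusive lower bound equal to the placeholder. The helper below only builds the request body as a Python dict; its name is hypothetical and nothing is sent to Elasticsearch:

```python
import json


def range_excluding_placeholder(field: str, upper: str, placeholder: str) -> dict:
    """Build a range query body that skips values equal to the placeholder."""
    return {
        "query": {
            "range": {
                field: {
                    # Strictly greater than the null_value stand-in.
                    "gt": placeholder,
                    "lt": upper,
                }
            }
        }
    }


if __name__ == "__main__":
    body = range_excluding_placeholder("date", "01/01/2021", "01/01/0001")
    print(json.dumps(body, indent=2))
```

The trade-off is that a document whose real date equals the placeholder is excluded as well, which is why the placeholder should be a value that never occurs in the data.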
