Elasticsearch's analysis process: built-in character filters, analyzers, tokenizers, and token filters (there really are an absurd number of them, and it's great!)


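The analysis process

After data is sent to Elasticsearch, and before it is added to the inverted index, Elasticsearch processes the document in four steps:

  • Character filtering: character filters transform the characters.
  • Tokenization: the text (the document) is split into one or more tokens.
  • Token filtering: token filters transform each token.
  • Token indexing: the tokens are finally stored in the Lucene inverted index.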
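
The four steps above can be sketched in a few lines of plain Python. This is only a toy model, not how Elasticsearch or Lucene implement it: the `&` replacement rule, the `\w+` tokenizer, the lowercase filter, and every function name here are assumptions made for illustration.

```python
import re
from collections import defaultdict


def char_filter(text):
    # Step 1: character filtering (illustrative rule: spell out "&").
    return text.replace("&", " and ")


def tokenize(text):
    # Step 2: split the text into tokens (runs of letters, digits, underscores).
    return re.findall(r"\w+", text)


def token_filter(tokens):
    # Step 3: transform each token (here: lowercase it).
    return [token.lower() for token in tokens]


def analyze(text):
    return token_filter(tokenize(char_filter(text)))


def build_inverted_index(docs):
    # Step 4: map every token to the sorted ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


docs = {1: "Tom & Jerry", 2: "Jerry likes cheese"}
index = build_inverted_index(docs)
print(analyze("Tom & Jerry"))  # ['tom', 'and', 'jerry']
print(index["jerry"])          # [1, 2]
```

Looking up a token in `index` gives the documents that contain it, which is why every transformation has to happen before indexing: only what ends up in the index can be matched at search time.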

Overall flow:

[Figure: overall analysis flow (image not available)]

The goal is human-friendly tokenization.

鍐呯疆瀛楃杩囨护鍣?/h3>

鎶€鏈浘鐗? src=

html瀛楃杩囨护鍣ㄣ€佹槧灏勫瓧绗﹁繃婊ゅ櫒銆佹ā寮忔浛鎹㈣繃婊ゅ櫒

HTML瀛楃杩囨护鍣?nbsp;

POST _analyze
{
  "tokenizer":      "keyword",
  "char_filter":  [ "html_strip" ],
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}
 

Result

{
  "tokens" : [
    {
      "token" : "\nI'm so happy!\n",
      "start_offset" : 0,
      "end_offset" : 32,
      "type" : "word",
      "position" : 0
    }
  ]
}
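
To make that result less mysterious, here is a rough Python imitation of what `html_strip` does to the text: block-level tags such as `<p>` turn into newlines, inline tags such as `<b>` disappear, and entities such as `&apos;` are decoded. It is a sketch under assumptions, not Lucene's implementation: the `BLOCK_TAGS` set is an illustrative subset, and the `escaped_tags` argument only mimics the filter setting of the same name, which leaves the listed tags in place.

```python
import html
import re

# Tags treated as block-level here; an illustrative subset, not Lucene's full list.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "h1", "h2", "h3", "table", "tr"}
TAG_PATTERN = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def html_strip(text, escaped_tags=()):
    def replace_tag(match):
        name = match.group(1).lower()
        if name in escaped_tags:
            return match.group(0)  # keep this tag untouched
        return "\n" if name in BLOCK_TAGS else ""

    without_tags = TAG_PATTERN.sub(replace_tag, text)
    return html.unescape(without_tags)  # decode entities such as &apos;


sample = "<p>I&apos;m so <b>happy</b>!</p>"
print(repr(html_strip(sample)))                       # "\nI'm so happy!\n"
print(repr(html_strip(sample, escaped_tags=("b",))))  # "\nI'm so <b>happy</b>!\n"
```

Note that the offsets in the real response (0 to 32) still refer to the original text, tags included; this sketch does not track offsets.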

 

鑷畾涔塇TML杩囨护鍣?/p>

PUT my_index

  "settings": 
    "analysis": 
      "analyzer": 
        "my_analyzer": 
          "tokenizer": "keyword",
          "char_filter": ["my_char_filter"]
        
      ,
      "char_filter": 
        "my_char_filter": 
          "type": "html_strip",
          "escaped_tags": ["b"]
        
      
    
  

 

鏄犲皠瀛楃杩囨护

PUT my_index

  "settings": 
    "analysis": 
      "analyzer": 
        "my_analyzer":
          "tokenizer":"keyword",
          "char_filter":["my_char_filter"]
        
      ,
      "char_filter":
          "my_char_filter":
            "type":"mapping",
            "mappings":["鑻嶄簳绌?=> 666","姝﹁棨鍏?=> 888"]
          
        
    
  


GET my_index/_analyze

  "analyzer": "my_analyzer",
  "text":"鑻嶄簳绌虹儹鐖辨钘ゅ叞锛屽彲鎯滆媿浜曠┖鍚庢潵缁撳浜?/span>"

 

Result

{
  "tokens" : [
    {
      "token" : "666热爱888,可惜666后来结婚了",
      "start_offset" : 0,
      "end_offset" : 19,
      "type" : "word",
      "position" : 0
    }
  ]
}
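
The replacement itself is easy to imitate in Python. The sketch below parses the same `key => value` rules and, at each position, applies the longest key that matches. The function name and the parsing details are my own assumptions for illustration; the real filter does this inside Lucene, not like this.

```python
def mapping_char_filter(text, mappings):
    # Parse "key => value" rules into a dict, skipping empty keys.
    rules = {}
    for rule in mappings:
        key, value = rule.split("=>", 1)
        if key.strip():
            rules[key.strip()] = value.strip()

    # Try longer keys first so the longest match wins at each position.
    keys = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule matched here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


mappings = ["苍井空 => 666", "武藤兰 => 888"]
print(mapping_char_filter("苍井空热爱武藤兰,可惜苍井空后来结婚了", mappings))
# 666热爱888,可惜666后来结婚了
```

Because the keyword tokenizer emits the whole input as one token, the filtered sentence comes back as a single token, just like in the result above.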

 

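Pattern replace character filter

The pattern below matches a run of digits followed by a hyphen, as long as that hyphen is followed by another digit, and the replacement swaps the hyphen for an underscore.

PUT my_index1
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1_"
        }
      }
    }
  }
}

POST my_index1/_analyze
{
  "analyzer": "my_analyzer",
  "text": "My credit card is 123-456-789"
}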
 

Result

{
  "tokens" : [
    {
      "token" : "My",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "credit",
      "start_offset" : 3,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "card",
      "start_offset" : 10,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "is",
      "start_offset" : 15,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "123_456_789",
      "start_offset" : 18,
      "end_offset" : 29,
      "type" : "<NUM>",
      "position" : 4
    }
  ]
}
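
The same regular expression can be tried out in Python to see why the card number survives as one token. Two things here are assumptions for illustration: Python writes the group reference as `\1` where the filter uses `$1`, and `simple_tokenize` is only a crude stand-in for the standard tokenizer.

```python
import re


def pattern_replace(text, pattern, replacement):
    # Replace every match of the pattern, like the pattern_replace char filter.
    return re.sub(pattern, replacement, text)


def simple_tokenize(text):
    # Crude stand-in for the standard tokenizer: runs of letters, digits, underscores.
    return re.findall(r"\w+", text)


text = "My credit card is 123-456-789"
filtered = pattern_replace(text, r"(\d+)-(?=\d)", r"\1_")

print(simple_tokenize(text))      # ['My', 'credit', 'card', 'is', '123', '456', '789']
print(simple_tokenize(filtered))  # ['My', 'credit', 'card', 'is', '123_456_789']
```

Without the character filter the hyphens split the number into three tokens; with it, the underscores keep the digits together.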

 

Built-in analyzers

[Figure: built-in analyzers (image not available)]

Built-in tokenizers

[Figure: built-in tokenizers (image not available)]

UAX URL email tokenizer

作者:一线码农
来源:未知原文:https://www.cnblogs.com/Mc_HotHog/articles/1111111.html
邮箱:22222@qq.com
版权声明:本文为博主原创文章,转载请附上博文链接!

(The sample text contains an author line, a source line with the original URL, an email address, and a copyright notice asking reposts to link back.)

POST _analyze
{
  "tokenizer": "uax_url_email",
  "text": "作者:一线码农来源:未知原文:https://www.cnblogs.com/Mc_HotHog/articles/1111111.html邮箱:22222@qq.com版权声明:本文为博主原创文章,转载请附上博文链接!"
}
 

 

Result

{
  "tokens" : [
    { "token" : "作", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
    { "token" : "者", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 },
    { "token" : "一", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 2 },
    { "token" : "线", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 3 },
    { "token" : "码", "start_offset" : 5, "end_offset" : 6, "type" : "<IDEOGRAPHIC>", "position" : 4 },
    { "token" : "农", "start_offset" : 6, "end_offset" : 7, "type" : "<IDEOGRAPHIC>", "position" : 5 },
    { "token" : "来", "start_offset" : 7, "end_offset" : 8, "type" : "<IDEOGRAPHIC>", "position" : 6 },
    { "token" : "源", "start_offset" : 8, "end_offset" : 9, "type" : "<IDEOGRAPHIC>", "position" : 7 },
    { "token" : "未", "start_offset" : 10, "end_offset" : 11, "type" : "<IDEOGRAPHIC>", "position" : 8 },
    { "token" : "知", "start_offset" : 11, "end_offset" : 12, "type" : "<IDEOGRAPHIC>", "position" : 9 },
    { "token" : "原", "start_offset" : 12, "end_offset" : 13, "type" : "<IDEOGRAPHIC>", "position" : 10 },
    { "token" : "文", "start_offset" : 13, "end_offset" : 14, "type" : "<IDEOGRAPHIC>", "position" : 11 },
    { "token" : "https://www.cnblogs.com/Mc_HotHog/articles/1111111.html", "start_offset" : 15, "end_offset" : 70, "type" : "<URL>", "position" : 12 },
    { "token" : "邮", "start_offset" : 70, "end_offset" : 71, "type" : "<IDEOGRAPHIC>", "position" : 13 },
    { "token" : "箱", "start_offset" : 71, "end_offset" : 72, "type" : "<IDEOGRAPHIC>", "position" : 14 },
    { "token" : "22222@qq.com", "start_offset" : 73, "end_offset" : 85, "type" : "<EMAIL>", "position" : 15 },
    { "token" : "版", "start_offset" : 85, "end_offset" : 86, "type" : "<IDEOGRAPHIC>", "position" : 16 },
    { "token" : "权", "start_offset" : 86, "end_offset" : 87, "type" : "<IDEOGRAPHIC>", "position" : 17 },
    { "token" : "声", "start_offset" : 87, "end_offset" : 88, "type" : "<IDEOGRAPHIC>", "position" : 18 },
    { "token" : "明", "start_offset" : 88, "end_offset" : 89, "type" : "<IDEOGRAPHIC>", "position" : 19 },
    { "token" : "本", "start_offset" : 90, "end_offset" : 91, "type" : "<IDEOGRAPHIC>", "position" : 20 },
    { "token" : "文", "start_offset" : 91, "end_offset" : 92, "type" : "<IDEOGRAPHIC>", "position" : 21 },
    { "token" : "为", "start_offset" : 92, "end_offset" : 93, "type" : "<IDEOGRAPHIC>", "position" : 22 },
    { "token" : "博", "start_offset" : 93, "end_offset" : 94, "type" : "<IDEOGRAPHIC>", "position" : 23 },
    { "token" : "主", "start_offset" : 94, "end_offset" : 95, "type" : "<IDEOGRAPHIC>", "position" : 24 },
    { "token" : "原", "start_offset" : 95, "end_offset" : 96, "type" : "<IDEOGRAPHIC>", "position" : 25 },
    { "token" : "创", "start_offset" : 96, "end_offset" : 97, "type" : "<IDEOGRAPHIC>", "position" : 26 },
    { "token" : "文", "start_offset" : 97, "end_offset" : 98, "type" : "<IDEOGRAPHIC>", "position" : 27 },
    { "token" : "章", "start_offset" : 98, "end_offset" : 99, "type" : "<IDEOGRAPHIC>", "position" : 28 },
    { "token" : "转", "start_offset" : 100, "end_offset" : 101, "type" : "<IDEOGRAPHIC>", "position" : 29 },
    { "token" : "载", "start_offset" : 101, "end_offset" : 102, "type" : "<IDEOGRAPHIC>", "position" : 30 },
    { "token" : "请", "start_offset" : 102, "end_offset" : 103, "type" : "<IDEOGRAPHIC>", "position" : 31 },
    { "token" : "附", "start_offset" : 103, "end_offset" : 104, "type" : "<IDEOGRAPHIC>", "position" : 32 },
    { "token" : "上", "start_offset" : 104, "end_offset" : 105, "type" : "<IDEOGRAPHIC>", "position" : 33 },
    { "token" : "博", "start_offset" : 105, "end_offset" : 106, "type" : "<IDEOGRAPHIC>", "position" : 34 },
    { "token" : "文", "start_offset" : 106, "end_offset" : 107, "type" : "<IDEOGRAPHIC>", "position" : 35 },
    { "token" : "链", "start_offset" : 107, "end_offset" : 108, "type" : "<IDEOGRAPHIC>", "position" : 36 },
    { "token" : "接", "start_offset" : 108, "end_offset" : 109, "type" : "<IDEOGRAPHIC>", "position" : 37 }
  ]
}
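
The pattern in that output (URLs and email addresses kept whole, every Chinese character emitted on its own, punctuation dropped) can be approximated with one regular expression. The sketch below is a simplification I made up for illustration, far looser than the real UAX #29 rules the tokenizer follows: the URL and email patterns, the CJK range, and the function name are all assumptions.

```python
import re

# Alternatives are tried in order, so URLs and emails win over plain words.
TOKEN_PATTERN = re.compile(
    r"(?P<URL>https?://[A-Za-z0-9._~:/?#@!$&'()*+,;=%-]+)"
    r"|(?P<EMAIL>[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
    r"|(?P<IDEOGRAPHIC>[\u4e00-\u9fff])"
    r"|(?P<ALPHANUM>[A-Za-z0-9]+)"
)


def uax_url_email_sketch(text):
    tokens = []
    for position, match in enumerate(TOKEN_PATTERN.finditer(text)):
        tokens.append({
            "token": match.group(0),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "type": f"<{match.lastgroup}>",
            "position": position,
        })
    return tokens


for token in uax_url_email_sketch("邮箱:22222@qq.com"):
    print(token["token"], token["type"])
# 邮 <IDEOGRAPHIC>
# 箱 <IDEOGRAPHIC>
# 22222@qq.com <EMAIL>
```

Characters that match none of the alternatives, such as the full-width colon, are simply skipped, which is why the offsets in the real result jump over the punctuation.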

 

Built-in token filters

[Figure: built-in token filters (image not available)]

Learn more: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/index.html

 
