Elasticsearch's analysis process: built-in character filters, analyzers, tokenizers and token filters (there are a crazy number of them, and I love it!)
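Analysis process
After data is sent to Elasticsearch, and before it is added to the inverted index, Elasticsearch processes the document as follows:
- Character filtering: character filters transform the characters of the text.
- Tokenization: the text is split into one or more tokens.
- Token filtering: token filters transform each token.
- Token indexing: the tokens are finally stored in Lucene's inverted index.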
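The following toy Python model makes the four stages concrete. It is only an illustration under simplifying assumptions: the character filter strips tags with a regex, the tokenizer splits on word characters, and the token filter lowercases. None of the function names come from Elasticsearch or Lucene.

```python
import re
from collections import defaultdict

# Toy model of the analysis chain (illustration only, not Lucene's code).

def char_filter(text: str) -> str:
    # Stage 1: transform characters before tokenizing (here: drop HTML-like tags).
    return re.sub(r"<[^>]+>", "", text)

def tokenize(text: str) -> list[str]:
    # Stage 2: split the text into tokens (here: runs of word characters).
    return re.findall(r"\w+", text)

def token_filter(tokens: list[str]) -> list[str]:
    # Stage 3: transform each token (here: lowercase it).
    return [t.lower() for t in tokens]

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Stage 4: map every term to the set of document ids that contain it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in token_filter(tokenize(char_filter(text))):
            index[term].add(doc_id)
    return dict(index)

if __name__ == "__main__":
    docs = {1: "<p>Quick <b>Brown</b> Fox</p>", 2: "the brown dog"}
    print(sorted(build_inverted_index(docs)["brown"]))  # [1, 2]
```

Because the tags are removed before tokenizing, `p` and `b` never become terms. Because of lowercasing, "Brown" and "brown" land on the same entry.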
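Overall flow: character filters → tokenizer → token filters → inverted index.
The goal is to produce tokens that match how people actually read and search the text.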
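Built-in character filters
Elasticsearch has three built-in character filters: the HTML strip filter (`html_strip`), the mapping filter (`mapping`) and the pattern replace filter (`pattern_replace`).
HTML strip character filter (`html_strip`): it removes HTML tags and decodes HTML entities such as `&apos;`.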
```
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "html_strip" ],
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}
```
Result:
```
{
  "tokens" : [
    {
      "token" : "\nI'm so happy!\n",
      "start_offset" : 0,
      "end_offset" : 32,
      "type" : "word",
      "position" : 0
    }
  ]
}
```
Custom HTML strip filter: the `escaped_tags` option lists HTML tags that must not be stripped (here `<b>`).
```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["my_char_filter"]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "html_strip",
          "escaped_tags": ["b"]
        }
      }
    }
  }
}
```
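The `escaped_tags` behaviour can be sketched in Python in the same way. `html_strip_with_escaped` is a hypothetical helper. Unlike the real filter, it simply deletes block tags instead of turning them into newlines, and it does not decode entities.

```python
import re

def html_strip_with_escaped(text: str, escaped_tags: list[str]) -> str:
    # Hypothetical helper: strip every tag except those named in escaped_tags.
    keep = {t.lower() for t in escaped_tags}

    def replace(match: re.Match) -> str:
        # group(1) is the tag name of an opening or closing tag.
        return match.group(0) if match.group(1).lower() in keep else ""

    return re.sub(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>", replace, text)

if __name__ == "__main__":
    print(html_strip_with_escaped("<p>I'm so <b>happy</b>!</p>", ["b"]))
```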
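Mapping character filter (`mapping`): before tokenization, it replaces every occurrence of a key with its value. The example below maps two names to 666 and 888. If `my_index` already exists from the previous example, delete it first (`DELETE my_index`).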
```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["my_char_filter"]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": ["苍井空 => 666", "武藤兰 => 888"]
        }
      }
    }
  }
}

GET my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "苍井空热爱武藤兰，可惜苍井空后来结婚了"
}
```
Result:
```
{
  "tokens" : [
    {
      "token" : "666热爱888，可惜666后来结婚了",
      "start_offset" : 0,
      "end_offset" : 19,
      "type" : "word",
      "position" : 0
    }
  ]
}
```
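Elasticsearch's documentation describes mapping matching as greedy: at any position, the longest matching key wins. The small Python sketch below applies that rule. The function name is hypothetical, and the offset correction that Elasticsearch performs is ignored.

```python
def mapping_char_filter(text: str, mappings: dict[str, str]) -> str:
    # Try longer keys first so the longest match at each position wins;
    # empty keys are skipped because they would never advance the cursor.
    keys = sorted((k for k in mappings if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])
                i += len(key)
                break
        else:
            # No key matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

if __name__ == "__main__":
    rules = {"苍井空": "666", "武藤兰": "888"}
    print(mapping_char_filter("苍井空热爱武藤兰，可惜苍井空后来结婚了", rules))
```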
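Pattern replace character filter (`pattern_replace`): it replaces text that matches a regular expression. Here, a hyphen between digits becomes an underscore, so the standard tokenizer keeps the card number as a single token instead of splitting it at the hyphens.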
```
PUT my_index1
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [ "my_char_filter" ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1_"
        }
      }
    }
  }
}

POST my_index1/_analyze
{
  "analyzer": "my_analyzer",
  "text": "My credit card is 123-456-789"
}
```
Result:
```
{
  "tokens" : [
    { "token" : "My", "start_offset" : 0, "end_offset" : 2, "type" : "<ALPHANUM>", "position" : 0 },
    { "token" : "credit", "start_offset" : 3, "end_offset" : 9, "type" : "<ALPHANUM>", "position" : 1 },
    { "token" : "card", "start_offset" : 10, "end_offset" : 14, "type" : "<ALPHANUM>", "position" : 2 },
    { "token" : "is", "start_offset" : 15, "end_offset" : 17, "type" : "<ALPHANUM>", "position" : 3 },
    { "token" : "123_456_789", "start_offset" : 18, "end_offset" : 29, "type" : "<NUM>", "position" : 4 }
  ]
}
```
Built-in analyzers
Built-in tokenizers
UAX URL email tokenizer (`uax_url_email`)
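Sample text: a typical Chinese blog footer with the author, the source URL, an email address and a copyright notice ("this is an original article, please include a link when reposting"). The request below sends these four lines as a single string, without line breaks.
```
作者：一线码农
来源：未知原文：https://www.cnblogs.com/Mc_HotHog/articles/1111111.html
邮箱：22222@qq.com
版权声明：本文为博主原创文章，转载请附上博文链接！
```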
```
POST _analyze
{
  "tokenizer": "uax_url_email",
  "text": "作者：一线码农来源：未知原文：https://www.cnblogs.com/Mc_HotHog/articles/1111111.html邮箱：22222@qq.com版权声明：本文为博主原创文章，转载请附上博文链接！"
}
```
Result (the URL and the email address stay whole, while each Chinese character becomes its own `<IDEOGRAPHIC>` token):
```
{
  "tokens" : [
    { "token" : "作", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
    { "token" : "者", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 },
    { "token" : "一", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 2 },
    { "token" : "线", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 3 },
    { "token" : "码", "start_offset" : 5, "end_offset" : 6, "type" : "<IDEOGRAPHIC>", "position" : 4 },
    { "token" : "农", "start_offset" : 6, "end_offset" : 7, "type" : "<IDEOGRAPHIC>", "position" : 5 },
    { "token" : "来", "start_offset" : 7, "end_offset" : 8, "type" : "<IDEOGRAPHIC>", "position" : 6 },
    { "token" : "源", "start_offset" : 8, "end_offset" : 9, "type" : "<IDEOGRAPHIC>", "position" : 7 },
    { "token" : "未", "start_offset" : 10, "end_offset" : 11, "type" : "<IDEOGRAPHIC>", "position" : 8 },
    { "token" : "知", "start_offset" : 11, "end_offset" : 12, "type" : "<IDEOGRAPHIC>", "position" : 9 },
    { "token" : "原", "start_offset" : 12, "end_offset" : 13, "type" : "<IDEOGRAPHIC>", "position" : 10 },
    { "token" : "文", "start_offset" : 13, "end_offset" : 14, "type" : "<IDEOGRAPHIC>", "position" : 11 },
    { "token" : "https://www.cnblogs.com/Mc_HotHog/articles/1111111.html", "start_offset" : 15, "end_offset" : 70, "type" : "<URL>", "position" : 12 },
    { "token" : "邮", "start_offset" : 70, "end_offset" : 71, "type" : "<IDEOGRAPHIC>", "position" : 13 },
    { "token" : "箱", "start_offset" : 71, "end_offset" : 72, "type" : "<IDEOGRAPHIC>", "position" : 14 },
    { "token" : "22222@qq.com", "start_offset" : 73, "end_offset" : 85, "type" : "<EMAIL>", "position" : 15 },
    { "token" : "版", "start_offset" : 85, "end_offset" : 86, "type" : "<IDEOGRAPHIC>", "position" : 16 },
    { "token" : "权", "start_offset" : 86, "end_offset" : 87, "type" : "<IDEOGRAPHIC>", "position" : 17 },
    { "token" : "声", "start_offset" : 87, "end_offset" : 88, "type" : "<IDEOGRAPHIC>", "position" : 18 },
    { "token" : "明", "start_offset" : 88, "end_offset" : 89, "type" : "<IDEOGRAPHIC>", "position" : 19 },
    { "token" : "本", "start_offset" : 90, "end_offset" : 91, "type" : "<IDEOGRAPHIC>", "position" : 20 },
    { "token" : "文", "start_offset" : 91, "end_offset" : 92, "type" : "<IDEOGRAPHIC>", "position" : 21 },
    { "token" : "为", "start_offset" : 92, "end_offset" : 93, "type" : "<IDEOGRAPHIC>", "position" : 22 },
    { "token" : "博", "start_offset" : 93, "end_offset" : 94, "type" : "<IDEOGRAPHIC>", "position" : 23 },
    { "token" : "主", "start_offset" : 94, "end_offset" : 95, "type" : "<IDEOGRAPHIC>", "position" : 24 },
    { "token" : "原", "start_offset" : 95, "end_offset" : 96, "type" : "<IDEOGRAPHIC>", "position" : 25 },
    { "token" : "创", "start_offset" : 96, "end_offset" : 97, "type" : "<IDEOGRAPHIC>", "position" : 26 },
    { "token" : "文", "start_offset" : 97, "end_offset" : 98, "type" : "<IDEOGRAPHIC>", "position" : 27 },
    { "token" : "章", "start_offset" : 98, "end_offset" : 99, "type" : "<IDEOGRAPHIC>", "position" : 28 },
    { "token" : "转", "start_offset" : 100, "end_offset" : 101, "type" : "<IDEOGRAPHIC>", "position" : 29 },
    { "token" : "载", "start_offset" : 101, "end_offset" : 102, "type" : "<IDEOGRAPHIC>", "position" : 30 },
    { "token" : "请", "start_offset" : 102, "end_offset" : 103, "type" : "<IDEOGRAPHIC>", "position" : 31 },
    { "token" : "附", "start_offset" : 103, "end_offset" : 104, "type" : "<IDEOGRAPHIC>", "position" : 32 },
    { "token" : "上", "start_offset" : 104, "end_offset" : 105, "type" : "<IDEOGRAPHIC>", "position" : 33 },
    { "token" : "博", "start_offset" : 105, "end_offset" : 106, "type" : "<IDEOGRAPHIC>", "position" : 34 },
    { "token" : "文", "start_offset" : 106, "end_offset" : 107, "type" : "<IDEOGRAPHIC>", "position" : 35 },
    { "token" : "链", "start_offset" : 107, "end_offset" : 108, "type" : "<IDEOGRAPHIC>", "position" : 36 },
    { "token" : "接", "start_offset" : 108, "end_offset" : 109, "type" : "<IDEOGRAPHIC>", "position" : 37 }
  ]
}
```
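The following simplified Python stand-in for `uax_url_email` shows where these offsets come from. It is only a sketch and makes several assumptions: the regexes cover far less than the real UAX #29 word-break rules and URL/email grammars, only the basic CJK block counts as ideographic, and numbers are not separated into a `<NUM>` type. The function name `uax_like_tokenize` is hypothetical.

```python
import re

# Alternatives are tried in order at each position: URL, email, single CJK
# ideograph, then a run of ASCII letters/digits. Everything else (spaces,
# full-width punctuation such as "：") is skipped, like a word break.
TOKEN_RE = re.compile(
    r"(?P<URL>https?://[A-Za-z0-9./_\-~%?=&#:]+)"
    r"|(?P<EMAIL>[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,})"
    r"|(?P<IDEOGRAPHIC>[\u4e00-\u9fff])"
    r"|(?P<ALPHANUM>[A-Za-z0-9_]+)"
)

def uax_like_tokenize(text: str) -> list[dict]:
    tokens = []
    for position, m in enumerate(TOKEN_RE.finditer(text)):
        tokens.append({
            "token": m.group(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "type": f"<{m.lastgroup}>",  # name of the alternative that matched
            "position": position,
        })
    return tokens

if __name__ == "__main__":
    for tok in uax_like_tokenize("邮箱：22222@qq.com"):
        print(tok)
```

Offsets count characters in the original string. That is why the full-width colons at offsets 2, 9, 14, 72 and 89 leave gaps between tokens without consuming a position.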
Built-in token filters
Learn more: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/index.html