Finite State Machines and Lucene (Opening Post)

Posted by 跳跳爸的Abc

纭畾鏈夐檺鐘舵€佹満锛坉eterministic finite automaton/dfa锛夋槸涓€涓暟瀛﹁绠楁ā鍨嬶紝缁勬垚閮ㄥ垎鏄竴涓?鍏冪粍锛?/span>

  1. 鏈夐檺鐨勭姸鎬侀泦Q

  2. 鏈夐檺鐨勮緭鍏ョ鍙稴锛屽張琚О浣渁lphabet锛堣窡鎴戜滑鐔熺煡鐨勮嫳鏂囧瓧姣嶈〃搴旇涓嶄竴鏍凤紝鏄釜寮曠敵锛?/span>

  3. 鐘舵€佸彉鎹㈠嚱鏁癋锛孎锛歋 饾棏 Q -> Q

  4. 鍒濆鐘舵€乻0锛宻0 鈭?Q

  5. 鎺ョ撼鐘舵€侀泦Z锛孼 鈯?Q

鍋囪鏈変竴涓瓧绗︿覆锛堢鍙峰簭鍒楋級w=a(0)|a(1)|a(2)|a(3)|鈥a(n)锛屼笖w 鈯?/span> S锛屽彟鏈塺=r(0)|r(1)|r(2)|r(3)|鈥?/span>|r(n)鐨勪竴涓姸鎬佸簭鍒楋紝褰搘鍜宺绗﹀悎濡備笅鏉′欢鏃惰涓虹姸鎬佹満M鎺ョ撼w锛?/span>

  1. r0 = s0锛屽垵濮嬬姸鎬佷负鐘舵€佹満鐨勫垵濮嬬姸鎬?/span>

  2. r(i+1) = F(r(i), a(i+1)), for i = 0,鈥?n-1锛涘綋鍓嶇姸鎬乺(i)杈撳叆a(i+1)鍚庤繘琛屽彉鎹㈠悗鐨勪笅涓€涓姸鎬佷负r(i+1)锛岃繖閲孎鍙樻崲鍚庣殑鐘舵€佸繀鐒跺睘浜嶲

  3. r(n) 鈭?Z锛屽彉鎹㈠悗鐨勬渶缁堢姸鎬佷负缁堟€乑涔嬩竴
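The three conditions above map almost directly onto a loop. The following is a minimal table-driven sketch, not code from Lucene: the class name `SimpleDfa`, its methods, and the example machine (binary strings containing an even number of 1s) are all assumptions made for illustration. A missing entry in the transition table is treated as an implicit dead state.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Minimal table-driven DFA (Q, S, F, s0, Z); every name here is illustrative. */
final class SimpleDfa {
    // F, stored as state -> (symbol -> next state)
    private final Map<Integer, Map<Character, Integer>> transitions = new HashMap<>();
    private final int initialState;          // s0
    private final Set<Integer> acceptStates; // Z

    SimpleDfa(int initialState, Set<Integer> acceptStates) {
        this.initialState = initialState;
        this.acceptStates = acceptStates;
    }

    /** Registers F(from, symbol) = to. */
    SimpleDfa addTransition(int from, char symbol, int to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(symbol, to);
        return this;
    }

    /** Returns true if the machine accepts w. */
    boolean accepts(String w) {
        int state = initialState; // condition 1: r(0) = s0
        for (char symbol : w.toCharArray()) {
            Map<Character, Integer> row =
                transitions.getOrDefault(state, Collections.emptyMap());
            Integer next = row.get(symbol);
            if (next == null) {
                return false; // F is undefined here: implicit dead state
            }
            state = next; // condition 2: r(i+1) = F(r(i), a(i+1))
        }
        return acceptStates.contains(state); // condition 3: r(n) ∈ Z
    }

    /** Example machine: state 0 = even number of 1s seen, state 1 = odd. */
    static SimpleDfa evenOnes() {
        return new SimpleDfa(0, Collections.singleton(0))
            .addTransition(0, '0', 0).addTransition(0, '1', 1)
            .addTransition(1, '0', 1).addTransition(1, '1', 0);
    }

    public static void main(String[] args) {
        SimpleDfa dfa = evenOnes();
        System.out.println(dfa.accepts("1001")); // true: two 1s
        System.out.println(dfa.accepts("1011")); // false: three 1s
    }
}
```

Storing F as a nested map keeps the sketch short; a dense array indexed by state and symbol would be the more usual choice when the alphabet is small.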

纭畾鏈夐檺鐘舵€佹満鍜屼笉纭畾鏈夐檺鐘舵€佹満锛坣ondeterministic finite automaton/nfa锛夌殑鍖哄埆鍦ㄤ簬锛宒fa涓€瀵规簮鐘舵€佸拰杈撳叆绗﹀彿鍙互鍞竴纭畾涓€涓彉鎹㈡搷浣滐紝涓旀瘡涓姸鎬佸彉鎹㈡搷浣滈兘闇€瑕佷竴涓緭鍏ョ鍙凤紝鑰宯fa涓嶅仛杩欎釜闄愬埗锛堟瘮濡傚墠杩癲fa鐨勭粍鎴愮涓夋潯锛屽湪nfa锛岀姸鎬佸彉鎹㈠嚱鏁板彲浠ユ槸F: S 饾棏 Q-> P(Q)锛屽彲浠ヤ粠涓€涓姸鎬佸彉鎹负鑻ュ共涓叾浠栫姸鎬侊級锛宒fa鏄竴绉嶇壒娈婄姸鎬佷笅鐨刵fa銆?/span>



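Figure: a state machine for the regular expression ^A(B*)A$. On input A the machine reaches state S2. S2 accepts input A or B: on B the state stays the same, and on A the machine jumps to state S1, which is the accepting state.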
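The machine in the caption is small enough to hand-code. The sketch below follows the transitions as described (start, then S2, then S1) and cross-checks itself against `java.util.regex`. The class name `AbaMatcher` and the use of state number 0 for the unnamed start state are assumptions.

```java
import java.util.regex.Pattern;

/** Hand-coded DFA for ^A(B*)A$; the start state is numbered 0 here by assumption. */
final class AbaMatcher {
    private static final Pattern REFERENCE = Pattern.compile("^A(B*)A$");

    static boolean accepts(String input) {
        int state = 0; // start state
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (state == 0 && c == 'A') {
                state = 2; // first A: move to S2
            } else if (state == 2 && c == 'B') {
                state = 2; // B: stay in S2
            } else if (state == 2 && c == 'A') {
                state = 1; // closing A: move to S1 (accepting)
            } else {
                return false; // no transition defined, including any input after S1
            }
        }
        return state == 1;
    }

    /** Reference answer from the JDK regex engine, used only for comparison. */
    static boolean referenceMatches(String input) {
        return REFERENCE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"AA", "ABA", "ABBBA", "A", "AAB", "", "ABAA", "BA"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" dfa=" + accepts(s)
                + " regex=" + referenceMatches(s));
        }
    }
}
```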


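Finite state machines are very well suited to text matching. Before Lucene 4.0, fuzzy queries used a simple brute-force scan: iterate over every term in the index and compute its edit distance to the input word. That solves the problem, but with high algorithmic complexity. From 4.0 on, Lucene introduced finite state machines to speed up fuzzy queries, reportedly by as much as 100 times (http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).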
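To make the cost of the brute-force approach concrete, here is a small sketch of it: a Levenshtein distance computed for every term in a list. This is not Lucene's old implementation; the class `BruteForceFuzzy`, the method names, and the sample terms are assumptions. Each comparison costs time proportional to the product of the two string lengths, and it is repeated once per term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Brute-force fuzzy matching: one edit-distance computation per term; illustrative only. */
final class BruteForceFuzzy {

    /** Classic dynamic-programming Levenshtein distance, keeping two rows of the table. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Scans every term and keeps those within maxEdits of the query. */
    static List<String> fuzzyMatch(List<String> terms, String query, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (editDistance(term, query) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("lucene", "lucent", "solr", "lucid");
        System.out.println(fuzzyMatch(terms, "lucene", 1)); // [lucene, lucent]
    }
}
```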

Lucene涓笌鑷姩鏈虹浉鍏崇殑涓昏鏄?org.apache.lucene.util.automaton 鍖呬互鍙?org.apache.lucene.search 鍖呬笅涓?Automata/Automaton 鐩稿叧鐨勭被銆?/span>

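Take a prefix query as an example. The search term is first built into a finite state machine: matching the prefix "/Computer" against the field category, for instance, produces a finite state machine with 11 states and 11 transitions, in which the state reached after the final character r is marked as accepting (accept=true):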


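Figure: the prefix automaton. It is worth mentioning that Lucene's Automaton class provides a toDot() method that outputs the state machine as a description Graphviz can read; the Graphviz command-line tool dot can then render it as a PNG image.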


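The last transition means that once the accepting state has been reached, any further input still satisfies the acceptance condition (which is exactly what a prefix query expresses).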
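The shape just described (a chain of transitions, one per prefix byte, ending in an accepting state that loops on any input) can be sketched without Lucene. The class below is an illustration, not Lucene's Automaton: the name `PrefixAutomatonSketch` is an assumption, and its `toDot()` only imitates the idea of emitting a Graphviz description rather than reproducing Lucene's output format.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative prefix automaton over UTF-8 bytes; not Lucene's Automaton class. */
final class PrefixAutomatonSketch {
    private final byte[] prefix; // state i --prefix[i]--> state i + 1

    PrefixAutomatonSketch(String prefix) {
        this.prefix = prefix.getBytes(StandardCharsets.UTF_8);
    }

    /** Runs the term through the chain; the last state accepts and absorbs all further input. */
    boolean accepts(String term) {
        int state = 0;
        for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
            if (state == prefix.length) {
                break; // accepting state: the self-loop takes every remaining byte
            }
            if (b != prefix[state]) {
                return false; // no transition for this byte
            }
            state++;
        }
        return state == prefix.length;
    }

    /** Emits a Graphviz description of the chain and the accepting self-loop. */
    String toDot() {
        StringBuilder sb = new StringBuilder("digraph PrefixAutomaton {\n  rankdir = LR;\n");
        for (int i = 0; i < prefix.length; i++) {
            sb.append(String.format("  %d -> %d [label=\"%s\"];\n", i, i + 1, label(prefix[i])));
        }
        int accept = prefix.length;
        sb.append("  ").append(accept).append(" [shape=doublecircle];\n");
        sb.append("  ").append(accept).append(" -> ").append(accept)
          .append(" [label=\"any byte\"];\n");
        return sb.append("}\n").toString();
    }

    /** Printable ASCII is shown as-is; everything else (and quote/backslash) as hex. */
    private static String label(byte b) {
        boolean plain = b > 0x20 && b < 0x7f && b != '"' && b != '\\';
        return plain ? String.valueOf((char) b) : String.format("0x%02x", b & 0xff);
    }

    public static void main(String[] args) {
        PrefixAutomatonSketch automaton = new PrefixAutomatonSketch("/Computers");
        System.out.println(automaton.accepts("/Computers/Mac")); // true
        System.out.println(automaton.accepts("/Books"));         // false
        System.out.print(automaton.toDot());
    }
}
```

Saving the printed description to a file and running it through Graphviz's dot tool should produce a picture similar in spirit to the one the post describes.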

Here is a pitfall I ran into. Before Lucene 7.0 there was a bug (https://issues.apache.org/jira/browse/LUCENE-7914) that affects regex/prefix and similar queries, which are implemented on top of Automata/AutomatonQuery. While building the automaton, Lucene calls the isFinite() method of Operations to check whether the finished automaton accepts a finite language. The catch is that isFinite() is recursive: a regex query like the one below, or a very long prefix query, can recurse deeply enough to throw a StackOverflowError.


POST /test/_search
{
  "query": {
    "regexp": {
      "test": "t{1,9500}"
    }
  }
}


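Note that this operation makes the main ES process exit abnormally on every node that executes the query. If the index happens to be spread across all nodes of the cluster, a query like this is the equivalent of detonating a nuclear bomb.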

Lucene 7.0 and later fixed the problem, so ES 6.0 and later no longer suffer from it. The fix essentially limits the maximum recursion depth inside the recursive isFinite() method.
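The general idea of capping recursion depth can be sketched as follows. This is not Lucene's code: the class `FiniteLanguageCheck`, the limit of 1000, and the exception message are assumptions chosen for illustration. The check also assumes that every state can still reach an accepting state, so that any reachable cycle implies an infinite language. The long chain in the example stands in for the kind of automaton a pattern such as t{1,9500} might plausibly expand to.

```java
import java.util.ArrayList;
import java.util.List;

/** Depth-limited recursive finiteness check over a successor graph; illustrative only. */
final class FiniteLanguageCheck {
    static final int MAX_RECURSION_DEPTH = 1000; // hypothetical limit

    /** graph.get(s) lists the successors of state s; state 0 is the initial state. */
    static boolean isFinite(List<List<Integer>> graph) {
        int n = graph.size();
        return isFinite(graph, 0, new boolean[n], new boolean[n], 0);
    }

    private static boolean isFinite(List<List<Integer>> graph, int state,
                                    boolean[] onPath, boolean[] done, int depth) {
        if (depth > MAX_RECURSION_DEPTH) {
            // Fail with an ordinary exception instead of overflowing the thread stack.
            throw new IllegalArgumentException(
                "automaton too large: recursion depth exceeds " + MAX_RECURSION_DEPTH);
        }
        onPath[state] = true;
        for (int next : graph.get(state)) {
            if (onPath[next]) {
                return false; // back edge: a cycle, so the language is infinite
            }
            if (!done[next] && !isFinite(graph, next, onPath, done, depth + 1)) {
                return false;
            }
        }
        onPath[state] = false;
        done[state] = true; // fully explored, no cycle reachable from here
        return true;
    }

    /** Builds states 0 -> 1 -> ... -> n-1, optionally with a self-loop on the last state. */
    static List<List<Integer>> chain(int n, boolean loopOnLast) {
        List<List<Integer>> graph = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Integer> successors = new ArrayList<>();
            if (i + 1 < n) {
                successors.add(i + 1);
            }
            graph.add(successors);
        }
        if (loopOnLast) {
            graph.get(n - 1).add(n - 1);
        }
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(isFinite(chain(5, false))); // true: no cycle
        System.out.println(isFinite(chain(5, true)));  // false: self-loop on the last state
        try {
            isFinite(chain(9500, false));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The trade-off is that very large but legitimate automata are rejected with a catchable exception, which a server can report as a bad request instead of losing the whole process.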

With all that said, let's write some code: a simple prefix query example.


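import java.util.Arrays;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public void testPrefixQuery() throws Exception {
  String[] categories = new String[] {"/Computers",
      "/Computers/Mac",
      "/Computers/Windows"};
  IndexWriterConfig config = new IndexWriterConfig();
  // RAMDirectory keeps the index in memory (available in the Lucene 7.x line discussed here)
  try (Directory directory = new RAMDirectory();
       IndexWriter writer = new IndexWriter(directory, config)) {
    // Index one document per category as a single un-analyzed term
    for (String category : categories) {
      Document doc = new Document();
      doc.add(new StringField("category", category, Field.Store.YES));
      writer.addDocument(doc);
    }
    // Open a near-real-time reader on the writer and run the prefix query
    try (IndexReader reader = DirectoryReader.open(writer)) {
      PrefixQuery query = new PrefixQuery(new Term("category", "/Computers"));
      IndexSearcher searcher = new IndexSearcher(reader);
      ScoreDoc[] hits = searcher.search(query, 1000).scoreDocs;
      System.out.println(Arrays.toString(hits));
    }
  }
}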


Output:


[doc=0 score=1.0 shardIndex=0, doc=1 score=1.0 shardIndex=0, doc=2 score=1.0 shardIndex=0]


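One last thing: if you are interested in Lucene's code-level APIs, the unit tests in the Lucene source are well worth studying. The Lucene/Solr source tree contains a very rich set of unit tests with broad feature coverage, and there is a lot to learn from them. Software testing and quality assurance ought to be the responsibility of software developers as owners.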

