Learning Lucene on MacOS


Notes based on the teaching project jointly produced by Heima (黑马) and Chuanzhi Podcast (传智播客); many thanks to both.

Full-text search technology: Lucene

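1 Course plan

  1. What is full-text search, and how is it implemented?

  2. How Lucene implements full-text search

    a) Creating an index

    b) Querying the index

  3. Setting up the development environment

  4. A first program

  5. How analyzers process text
    a) Testing an analyzer's tokenization
    b) Third-party Chinese analyzers

  6. Maintaining the index

    a) Adding documents

    b) Deleting documents

    c) Updating documents

  7. Querying the index

    a) Querying with Query subclasses

    b) Querying with QueryParser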

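2.1 Data classification

The data we deal with in everyday life falls into two broad categories: structured data and unstructured data.

Structured data: data with a fixed format or a limited length, such as databases and metadata.

Unstructured data: data of variable length or without a fixed format, such as emails, Word documents, and other files on disk.

2.3 Ways to query unstructured data

  1. Serial scanning
    Serial scanning means that, to find the files whose content contains a given string, you read the documents one at a time, each from beginning to end. If a document contains the string, it is one of the files you want; then you move on to the next file, until every file has been scanned. On MacOS, searching file contents this way (for example with grep -r in the Terminal) works, but it is quite slow on large collections.

  2. Full-text search
    Extract part of the information in the unstructured data and reorganize it so that it has some structure, then search this structured data, which makes searching comparatively fast. The information extracted from unstructured data and reorganized in this way is called an index.
    Example: a dictionary.
    A dictionary's pinyin table and radical index are its index, while the explanation of each character is unstructured. Without the syllable table and the radical index, finding a character in the vast dictionary would mean scanning it page by page. Some information about a character, however, can be extracted and structured. Pronunciation is fairly structured: it consists of an initial and a final, and each has only a handful of possible values that can be listed one by one. So the pronunciations are taken out and arranged in a fixed order, and each pronunciation points to the page holding the character's detailed explanation. To search, we find the pronunciation in the structured pinyin table and turn to the page it points to, which leads us to the unstructured data: the explanation of the character.
    This process of first building an index and then searching the index is called full-text search.
    Building the index is itself very time-consuming, but once built it can be used many times. Since full-text search is mainly about answering queries, the time spent building the index is worth it.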
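Serial scanning, the first method above, fits in a few lines of Java. This is a hypothetical illustration over in-memory strings (the class name SerialScan and the sample texts are invented), not Lucene code. It makes the cost visible: every query has to read every document again.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of serial scanning: no index, every query reads every document.
class SerialScan {
    // Returns the names of the documents whose content contains the keyword.
    static List<String> scan(Map<String, String> docs, String keyword) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            // String.contains walks the content from beginning to end
            if (e.getValue().contains(keyword)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();  // keeps insertion order
        docs.put("a.txt", "Lucene is a Java library");
        docs.put("b.txt", "Apache Solr is built on Lucene");
        docs.put("c.txt", "Nothing relevant here");
        System.out.println(scan(docs, "Lucene"));  // prints [a.txt, b.txt]
    }
}
```

The work per query grows with the total size of all documents, which is exactly what building an index once is meant to avoid.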

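2.5 Use cases of full-text search

Full-text search suits data that is large in volume and has no fixed structure, for example search engines such as Baidu and Google, in-site search on forums, and in-site search on e-commerce websites.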

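3 How Lucene implements full-text search

3.1 Indexing and search flowchart

[Figure: indexing and search flowchart]

  1. Green marks the indexing process, which indexes the raw content to be searched and builds an index. Indexing consists of: determine the raw content (what will be searched) -> collect documents -> create documents -> analyze documents -> index documents
  2. Red marks the search process, which searches content in the index. Searching consists of: the user submits a query through the search interface -> create the query -> execute the search against the index -> render the search results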

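3.2 Creating the index

Indexing processes the content of the documents users will search; the resulting index is stored in the index directory (index).

Here, the documents to search are text files on disk. According to the case description, every file whose name or content contains the keyword must be found, so both the file name and the file content are indexed.

3.2.1 Obtaining the raw documents

Raw documents are the content to be indexed and searched. Raw content includes web pages on the Internet, data in databases, files on disk, and so on.

In this case, the raw content is the files on disk, as shown below:

[Figure: source files on disk]

Obtaining the raw information to be searched from the Internet, databases, file systems, and other sources is called information collection; its purpose is to make the raw content available for indexing.

Software that collects information on the Internet is usually called a crawler or spider, also known as a web robot. A crawler visits web pages across the Internet and stores the content it fetches.

In this case we need the content of files on disk. Plain text files can be read through file streams; pdf, doc, xls, and similar files can be read with third-party parsers, for example Apache POI for doc and xls files.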

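3.2.2 Creating document objects

Raw content is collected so that it can be indexed. Before indexing, each piece of raw content is turned into a document (Document) made up of fields (Field), and the fields hold the content.

Here, each file on disk can be treated as one document. The Document contains several Fields (file_name for the file name, file_path for the file path, file_size for the file size, file_content for the file content), as shown below:

[Figure: a Document and its Fields]

Note: each Document can have multiple Fields, different Documents can have different Fields, and one Document can contain identical Fields (same field name and same field value).

Each document has a unique number: the document id.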
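The Document/Field model can be pictured as an ordered list of name/value pairs, plus an id handed out when the document is added. The sketch below is a hypothetical toy model (ToyDocument and ToyIndex are invented names), not Lucene's Document class; it only shows that field names may repeat inside one document and that every added document gets its own id.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model: a document is an ordered list of (field name, field value) pairs.
class ToyDocument {
    final List<Map.Entry<String, String>> fields = new ArrayList<>();

    ToyDocument add(String name, String value) {
        fields.add(new SimpleEntry<>(name, value));  // repeated names are allowed
        return this;
    }

    // All values stored under one field name, in insertion order.
    List<String> getValues(String name) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> f : fields) {
            if (f.getKey().equals(name)) {
                values.add(f.getValue());
            }
        }
        return values;
    }
}

class ToyIndex {
    final List<ToyDocument> docs = new ArrayList<>();

    // The id is the document's position in the list, so each id is unique.
    int addDocument(ToyDocument doc) {
        docs.add(doc);
        return docs.size() - 1;
    }
}
```

In real Lucene the internal document ids can change when index segments are merged, so they are not meant to be used as permanent keys.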

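3.2.3 Analyzing documents

Once the raw content has been turned into documents made of fields, the field content must be analyzed. Analysis extracts words from the raw document, converts letters to lowercase, removes punctuation, removes stop words, and so on, producing the final lexical units (tokens). A lexical unit can be thought of simply as a word.

For example, the following document is analyzed as shown:

Original document content: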

Lucene is a Java full-text search engine. Lucene is not a complete

application, but rather a code library and API that can easily be used

to add search capabilities to applications.

Lexical units after analysis:

lucene, java, full, text, search, engine, ...
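The analysis steps described above (split into words, lowercase, drop punctuation, drop stop words) can be sketched with the JDK alone. This is a hypothetical simplification: the stop-word set is a small sample, and the splitting rule (anything that is not a letter or digit separates words) only approximates what Lucene's StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ToyAnalyzer {
    // Small sample of English stop words; real analyzers ship a longer list.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "be", "but", "is", "not", "of", "that", "the", "to"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Punctuation and whitespace act as separators, so they never become terms
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue;  // split can yield a leading empty string
            }
            String term = word.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lucene is a Java full-text search engine."));
        // prints [lucene, java, full, text, search, engine]
    }
}
```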

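Each word is called a Term. The same word extracted from different fields forms different terms: a term has two parts, the name of the field and the text of the word.

For example, apache in the file name and apache in the file content are different terms.

3.2.4 Creating the index

The lexical units obtained by analyzing all the documents are indexed. The purpose of indexing is searching: in the end, a search only needs to look up the indexed lexical units to find the Documents.

Note: the index is built over lexical units so that documents can be found from words; this kind of index structure is called an inverted index.

The traditional approach goes from each file to its content and matches the search keywords against that content. This is serial scanning, which is slow when the amount of data is large.

An inverted index goes from content (words) to documents, as shown below:

[Figure: inverted index structure]

The inverted index (also called a reverse index) has two parts, the index and the documents. The index is the term dictionary, which is relatively small, while the document collection is much larger.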
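A minimal in-memory inverted index ties these pieces together: each term, qualified by its field as described for Term above, maps to the sorted ids of the documents that contain it. This is a hypothetical teaching sketch (ToyInvertedIndex and its lowercase-and-split tokenizer are invented), not Lucene's on-disk format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class ToyInvertedIndex {
    // key "field:term" -> sorted ids of the documents containing that term in that field
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private int nextId = 0;

    // Indexes one document given as field name -> text; returns its id.
    int add(Map<String, String> doc) {
        int id = nextId++;
        for (Map.Entry<String, String> field : doc.entrySet()) {
            for (String word : field.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (word.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(field.getKey() + ":" + word, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // One map lookup per term: no document is scanned at query time.
    List<Integer> search(String field, String term) {
        TreeSet<Integer> ids = postings.get(field + ":" + term);
        return ids == null ? Collections.<Integer>emptyList() : new ArrayList<Integer>(ids);
    }
}
```

With one document named apache.txt and another whose content mentions Apache, fileName:apache and fileContent:apache return different ids, which is the "different fields give different terms" rule from 3.2.3.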

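3.3 Querying the index

Querying the index is the search process: the user enters keywords and they are searched in the index. The keywords are looked up in the index, the index points to the matching documents, and the documents lead to the content being searched for (here, files on disk).

3.3.1 User query interface

A full-text search system provides a search interface where users submit keywords, and it displays the results once the search completes.

For example: Baidu.

Lucene does not provide a search user interface; you need to build one according to your own requirements.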

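3.3.2 Creating the query

Before the user's keywords can be searched, a query object must be built. The query object can specify the Field to search, the keywords, and so on, and it generates a concrete query syntax.

For example:

The syntax "fileName:lucene" searches for documents whose fileName field contains the term "lucene".

3.3.3 Executing the query

The search process:

Based on the query syntax, each search term is looked up in the term dictionary of the inverted index, and each entry leads to the list of documents linked to it.

For example, the syntax "fileName:lucene" finds the documents whose fileName field contains Lucene.

The search looks up the term whose field is fileName and whose text is lucene, then uses that term to get the list of document ids.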
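Executing fileName:lucene comes down to splitting the expression into a field and a keyword, then doing a single dictionary lookup that returns the list of document ids. The sketch hard-codes a tiny hypothetical term dictionary (TermLookup and its ids are invented) and assumes the keyword is normalized the same way as the indexed text, here by lowercasing.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TermLookup {
    // Hypothetical term dictionary: "field:term" -> ids of matching documents
    static final Map<String, List<Integer>> DICTIONARY = new HashMap<>();

    static {
        DICTIONARY.put("fileName:lucene", Arrays.asList(0, 3, 7));
        DICTIONARY.put("fileContent:lucene", Arrays.asList(0, 1, 2, 3));
    }

    // Parses "field:keyword" and returns the id list stored for that term.
    static List<Integer> execute(String query) {
        int colon = query.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("expected field:keyword, got " + query);
        }
        String field = query.substring(0, colon);
        // normalize the keyword the same way the indexed text was normalized
        String term = query.substring(colon + 1).toLowerCase(Locale.ROOT);
        List<Integer> ids = DICTIONARY.get(field + ":" + term);
        return ids == null ? Collections.<Integer>emptyList() : ids;
    }
}
```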

鎶€鏈浘鐗? src=

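3.3.4 Rendering the results

The results are presented to the user in a friendly interface so the user can find the information they want. To help users find it quickly, search systems offer many display aids, such as highlighting the keywords in the results, or the cached page snapshots Baidu provides.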

鎶€鏈浘鐗? src=

4.1 Downloading Lucene

Lucene is a toolkit for building full-text search. Download lucene-7.4.0 from the official website (http://lucene.apache.org/) and unzip it.

Version: lucene-7.4.0

JDK requirement: 1.8 or later

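5 A first program

5.2 Creating the index

5.2.1 Steps

Step 1: Create a Java project and import the jar files.

Package structure:

[Figure: project package structure]

Step 2: Create an IndexWriter object.

    1) Specify a Directory object for the location of the index.

    2) Specify an IndexWriterConfig object.

Step 3: Create a Document object.

Step 4: Create Field objects and add them to the Document.

Step 5: Use the IndexWriter to write the Document to the index. This step analyzes the fields, builds the index entries, and writes them together with the stored field values to the index.

Step 6: Close the IndexWriter.

5.2.2 Code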
package org.example.lucene;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.junit.Test;

import java.io.File;
import java.nio.charset.StandardCharsets;

/**
 * @author HackerStar
 * @create 2020-05-13 17:40
 */
public class LuceneFirst {
    /**
     * Create the index
     *
     * @throws Exception
     */
    @Test
    public void createIndex() throws Exception {
        //1. Create a Directory object that specifies where the index is stored
        //Option A: keep the index in memory
        //Directory directory = new RAMDirectory();
        //Option B: keep the index on disk
        Directory directory = FSDirectory.open(new File("/Users/xxx/Development/Lucene/index").toPath());
        //2. Create an IndexWriter on top of the Directory (the no-argument config uses StandardAnalyzer)
        IndexWriterConfig config = new IndexWriterConfig();
        IndexWriter indexWriter = new IndexWriter(directory, config);
        //3. Read the files (raw documents) on disk and create one Document per file
        File dir = new File("/Users/xxx/Development/Lucene/searchsource");
        for (File file : dir.listFiles()) {
            //File name
            String fileName = file.getName();
            //File content (commons-io-2.6.jar must be in the lib folder), read as UTF-8
            String fileContent = FileUtils.readFileToString(file, StandardCharsets.UTF_8);
            //File path
            String filePath = file.getPath();
            //File size
            long fileSize = FileUtils.sizeOf(file);
            //4. Create the fields
            //1st argument: field name
            //2nd argument: field value
            //3rd argument: whether the value is stored
            //File name field (analyzed, indexed, stored)
            Field fileNameField = new TextField("fileName", fileName, Field.Store.YES);
            //File content field (analyzed, indexed, stored)
            Field fileContentField = new TextField("fileContent", fileContent, Field.Store.YES);
            //File path field (not analyzed, not indexed, stored only)
            Field filePathField = new StoredField("filePath", filePath);
            //File size: LongPoint indexes the number for range queries, StoredField keeps it retrievable
            Field fileSizeField = new LongPoint("fileSize", fileSize);
            Field fileSizeStoredField = new StoredField("fileSize", fileSize);

            //5. Create the Document and add the fields
            Document document = new Document();
            document.add(fileNameField);
            document.add(fileContentField);
            document.add(filePathField);
            document.add(fileSizeField);
            document.add(fileSizeStoredField);

            //6. Write the Document to the index (analysis and indexing happen here)
            indexWriter.addDocument(document);
        }
        //7. Close the IndexWriter (this also commits the changes)
        indexWriter.close();
    }
}
5.2.3 Viewing the index files with Luke

The Luke version used here is luke-7.4.0, which matches the Lucene version, so it can open indexes created by Lucene 7.4.0. Note that this build of Luke was compiled with JDK 9, so JDK 9 is needed to run it.

5.3 Querying the index

5.3.1 Steps

Step 1: Create a Directory object, i.e. the location where the index is stored.

Step 2: Create an IndexReader object for that Directory.

Step 3: Create an IndexSearcher object for that IndexReader.

Step 4: Create a TermQuery object that specifies the field and the keyword to search for.

Step 5: Execute the query.

Step 6: Get the results, iterate over them, and print them.

Step 7: Close the IndexReader.

5.3.2 Code
    // Additional imports needed in LuceneFirst: org.apache.lucene.index.DirectoryReader,
    // org.apache.lucene.index.IndexReader, org.apache.lucene.index.Term,
    // org.apache.lucene.search.IndexSearcher, Query, ScoreDoc, TermQuery, TopDocs
    @Test
    public void searchIndex() throws Exception {
        //Location of the index directory
        Directory directory = FSDirectory.open(new File("/Users/xxx/Development/Lucene/index").toPath());
        //Create the IndexReader
        IndexReader indexReader = DirectoryReader.open(directory);
        //Create the IndexSearcher
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        //Create the query
        Query query = new TermQuery(new Term("fileName", "apache"));
        //Execute the query
        //1st argument: the query object
        //2nd argument: the maximum number of results to return
        TopDocs topDocs = indexSearcher.search(query, 10);
        System.out.println("Total number of hits: " + topDocs.totalHits);
        //Iterate over the results
        //topDocs.scoreDocs holds the ids of the matching documents
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            //scoreDoc.doc is the document id
            //Load the Document by its id
            Document document = indexSearcher.doc(scoreDoc.doc);
            System.out.println("File name: " + document.get("fileName"));
            System.out.println("File content: " + document.get("fileContent"));
            System.out.println("File path: " + document.get("filePath"));
            System.out.println("File size: " + document.get("fileSize"));
            System.out.println("---------------------------------------------------");
        }
        //Close the IndexReader
        indexReader.close();
    }

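6.1 Tokenization results of the analyzer

Tokenization results of the standard analyzer

  1. English

[Figure: StandardAnalyzer tokens for an English sample]

  2. Chinese

[Figure: StandardAnalyzer tokens for a Chinese sample]

Code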

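    // Imports needed: org.apache.lucene.analysis.Analyzer, org.apache.lucene.analysis.TokenStream,
    // org.apache.lucene.analysis.standard.StandardAnalyzer,
    // org.apache.lucene.analysis.tokenattributes.CharTermAttribute,
    // org.apache.lucene.analysis.tokenattributes.OffsetAttribute
    @Test
    public void testTokenStream() throws Exception {
        //Create a standard analyzer
        Analyzer analyzer = new StandardAnalyzer();
        //Get a TokenStream
        //1st argument: field name (any name works here)
        //2nd argument: the text to analyze ("A journey of a thousand miles begins with a single step")
        TokenStream tokenStream = analyzer.tokenStream("test", "千里之行，始于足下");
        //Attribute that exposes the text of each token
        CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
        //Attribute that records the start and end offsets of each token
        OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
        //Reset the stream; required before the first incrementToken()
        tokenStream.reset();
        //Iterate over the tokens; incrementToken() returns false at the end of the stream
        while (tokenStream.incrementToken()) {
            //Start offset of the token
            System.out.println("start->" + offsetAttribute.startOffset());
            //Token text
            System.out.println(charTermAttribute);
            //End offset of the token
            System.out.println("end->" + offsetAttribute.endOffset());
        }
        //end() finishes the stream before it is closed
        tokenStream.end();
        tokenStream.close();
    }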

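6.2.1 Lucene's built-in Chinese analyzers

  • StandardAnalyzer: unigram tokenization, which splits Chinese text one character at a time. For example, "我爱中国" ("I love China") becomes "我", "爱", "中", "国".

  • SmartChineseAnalyzer: handles Chinese fairly well but extends poorly; adding custom dictionaries, stop-word lists, synonym lists, and the like is awkward.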
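Single-character splitting, as described for StandardAnalyzer, can be imitated in a few lines: every Han character becomes its own token, and each token records its start and end offsets, the same information OffsetAttribute reports in the 6.1 code. This is a hypothetical simplification (UnigramSplitter is an invented name); it simply skips everything that is not a Han character, which a real analyzer would not do, and it is not StandardAnalyzer's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class UnigramSplitter {
    // One token: its text plus the [start, end) character offsets in the input
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    // Emits every Han character as its own token; other characters are skipped.
    static List<Token> split(String text) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN) {
                tokens.add(new Token(String.valueOf(c), i, i + 1));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("我爱中国"));
        // prints [我[0,1), 爱[1,2), 中[2,3), 国[3,4)]
    }
}
```

Unigrams keep every character searchable, but a query for a word like 中国 then has to match two separate tokens, which is why dictionary-based analyzers such as IKAnalyzer are preferred for Chinese.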

6.2.2 IKAnalyzer

鎶€鏈浘鐗? src=

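Usage:

Step 1: Add the jar file to the project.

Step 2: Add the configuration file, the extension dictionary, and the stop-word dictionary to the classpath.

Package structure:

[Figure: package structure with the IKAnalyzer configuration and dictionary files]

Note: hotword.dic and ext_stopword.dic must be encoded as UTF-8 without a BOM.

In other words, do not edit the extension dictionaries with Windows Notepad.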

Code:

    // Imports: those of LuceneFirst, plus (if not already present) org.apache.lucene.document.LongPoint,
    // org.apache.lucene.document.StoredField, java.nio.charset.StandardCharsets
    // and org.wltea.analyzer.lucene.IKAnalyzer
    @Test
    public void userIK() throws Exception {
        //1. Location of the index
        Directory directory = FSDirectory.open(new File("/Users/XinxingWang/Development/Lucene/index").toPath());
        //Analyze with IKAnalyzer instead of the default StandardAnalyzer
        IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
        //2. Create an IndexWriter
        IndexWriter indexWriter = new IndexWriter(directory, config);
        //3. Read the files (raw documents) on disk and create one Document per file
        File dir = new File("/Users/xxx/Development/Lucene/searchsource");
        for (File file : dir.listFiles()) {
            //File name
            String fileName = file.getName();
            //File content (commons-io-2.6.jar must be in the lib folder), read as UTF-8
            String fileContent = FileUtils.readFileToString(file, StandardCharsets.UTF_8);
            //File path
            String filePath = file.getPath();
            //File size
            long fileSize = FileUtils.sizeOf(file);
            //4. Create the fields
            //1st argument: field name
            //2nd argument: field value
            //3rd argument: whether the value is stored
            //File name field
            Field fileNameField = new TextField("fileName", fileName, Field.Store.YES);
            //File content field
            Field fileContentField = new TextField("fileContent", fileContent, Field.Store.YES);
            //File path field (not analyzed, not indexed, stored only)
            Field filePathField = new StoredField("filePath", filePath);
            //File size: LongPoint for range queries, StoredField to read the value back
            Field fileSizeField = new LongPoint("fileSize", fileSize);
            Field fileSizeStoredField = new StoredField("fileSize", fileSize);

            //5. Create the Document
            Document document = new Document();
            document.add(fileNameField);
            document.add(fileContentField);
            document.add(filePathField);
            document.add(fileSizeField);
            document.add(fileSizeStoredField);

            //6. Write the Document to the index
            indexWriter.addDocument(document);
        }
        //7. Close the IndexWriter
        indexWriter.close();
    }

7 Maintaining the index

7.1 Adding to the index

7.1.2 Code for adding a document
package org.example.lucene;

import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

import java.io.File;

/**
 * @author HackerStar
 * @create 2020-05-13 20:52
 */
public class LuceneSecond {
    @Test
    public void addDocument() throws Exception {
        //Location of the index directory
        Directory directory = FSDirectory.open(new File("/Users/xxx/Development/Lucene/index").toPath());
        IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
        //Create an IndexWriter
        IndexWriter indexWriter = new IndexWriter(directory, config);
        //Create a Document
        Document document = new Document();
        //Add fields to the Document
        //Different documents can have different fields; one document can repeat a field
        //"新添加的文档" = "newly added document"
        document.add(new TextField("fileName", "新添加的文档", Field.Store.YES));
        //"新添加的文档的内容" = "content of the newly added document"; indexed but not stored
        document.add(new TextField("content", "新添加的文档的内容", Field.Store.NO));
        //LongPoint indexes the number (enables range queries) but does not store it
        document.add(new LongPoint("size", 10000));
        //StoredField stores the value so it can be read back
        document.add(new StoredField("size", 10000));
        //Values that never need to be searched are stored with StoredField only
        document.add(new StoredField("path", "/Users/XinxingWang/Development/Lucene/index/test.txt"));
        //Add the document to the index
        indexWriter.addDocument(document);
        //Close the IndexWriter
        indexWriter.close();
    }
}

7.2 Deleting from the index

7.2.1 Deleting everything
    @Test
    public void deleteAllIndex() throws Exception {
        IndexWriter indexWriter = getIndexWriter();
        //Delete the entire index
        indexWriter.deleteAll();
        //Close the IndexWriter (this commits the deletion)
        indexWriter.close();
    }

    public IndexWriter getIndexWriter() throws Exception {
        //Location of the index directory
        Directory directory = FSDirectory.open(new File("/Users/xxx/Development/Lucene/index").toPath());
        IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
        //Create an IndexWriter
        IndexWriter indexWriter = new IndexWriter(directory, config);
        return indexWriter;
    }

Note: this removes all index data in the index directory, completely and irrecoverably.

Use this method with caution!

7.2.2 Deleting by query
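    // Also needs org.apache.lucene.index.Term, org.apache.lucene.search.Query
    // and org.apache.lucene.search.TermQuery
    @Test
    public void deleteByQuery() throws Exception {
        IndexWriter indexWriter = getIndexWriter();
        //Create the query that selects the documents to delete
        Query query = new TermQuery(new Term("fileName", "apache"));
        //Delete every document that matches the query
        indexWriter.deleteDocuments(query);
        //Close the IndexWriter
        indexWriter.close();
    }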

7.3 Updating the index

The principle is: delete first, then add.

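    @Test
    public void updateIndex() throws Exception {
        IndexWriter indexWriter = getIndexWriter();
        //Create a Document
        Document document = new Document();
        //Add fields to the Document
        //"要更新的文档" = "document to update"
        document.add(new TextField("fileName", "要更新的文档", Field.Store.YES));
        //Text: "Introduction to Lucene: Lucene is a Java-based full-text retrieval toolkit. It is not a
        //complete search application; it provides indexing and search for your application."
        document.add(new TextField("fileContent", "Lucene 简介 Lucene 是一个基于 Java 的全文信息检索工具包, 它不是一个完整的搜索应用程序,而是为你的应用程序提供索引和搜索功能。"
                , Field.Store.YES));
        //Delete every document whose fileContent contains the term "java", then add this document
        indexWriter.updateDocument(new Term("fileContent", "java"), document);
        //Close the IndexWriter
        indexWriter.close();
    }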
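The delete-then-add behaviour of updateDocument(term, doc) can be modelled with a plain list: every document containing the term is removed and the new document is appended, whether several documents matched or none did. This is a hypothetical model (ToyUpdatableIndex, with each document reduced to a set of field:term strings); Lucene itself performs the delete and the add atomically as seen by readers of the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToyUpdatableIndex {
    // Each document is modelled as the set of "field:term" strings it contains
    final List<Set<String>> docs = new ArrayList<>();

    void addDocument(String... terms) {
        docs.add(new HashSet<>(Arrays.asList(terms)));
    }

    // Like deleteDocuments(term): removes every match, adds nothing
    void deleteDocuments(String term) {
        docs.removeIf(doc -> doc.contains(term));
    }

    // Like updateDocument(term, doc): delete all matches first, then add the new document
    void updateDocument(String term, String... newDocTerms) {
        deleteDocuments(term);
        addDocument(newDocTerms);
    }
}
```

So in the 7.3 code, every document whose fileContent contains the term java is replaced by a single new document, not only the first match.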

8.1 TermQuery

TermQuery searches by term. It does not use an analyzer, so it is best suited to fields that are not tokenized, such as order numbers or category IDs.

Specify the field and the keyword to search for.

package org.example.lucene;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

import java.io.File;

/**
 * @author HackerStar
 * @create 2020-05-13 21:56
 */
public class LuceneThird {
    @Test
    public void testTermQuery() throws Exception {
        Directory directory = FSDirectory.open(new File("/Users/XinxingWang/Development/Lucene/index").toPath());
        IndexReader indexReader = DirectoryReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);

        //Create the query
        Query query = new TermQuery(new Term("fileContent", "lucene"));
        //Execute the query
        TopDocs topDocs = indexSearcher.search(query, 10);
        //Number of matching documents
        System.out.println("Total number of results: " + topDocs.totalHits);
        //Iterate over the results
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            Document document = indexSearcher.doc(scoreDoc.doc);
            System.out.println(document.get("fileName"));
            //System.out.println(document.get("fileContent"));
            System.out.println(document.get("filePath"));
            System.out.println(document.get("fileSize"));
        }
        //Close the IndexReader
        indexSearcher.getIndexReader().close();
    }
}
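Because TermQuery skips analysis, its keyword must equal an indexed term character for character. The sketch below models this with a hypothetical set of terms as a lowercasing analyzer (StandardAnalyzer, for example) would have written them: the exact lookup misses Lucene but finds lucene, while a lookup that first normalizes the keyword, as an analyzer-based query would, finds both. ExactTermMatch is an invented, JDK-only illustration, not Lucene's lookup code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class ExactTermMatch {
    // Terms as a lowercasing analyzer would have written them to the index
    static final Set<String> INDEXED_TERMS = new HashSet<>(Arrays.asList("lucene", "java", "search"));

    // TermQuery-style lookup: the keyword is used exactly as given
    static boolean termQuery(String keyword) {
        return INDEXED_TERMS.contains(keyword);
    }

    // Analyzer-style lookup: the keyword is normalized like the indexed text first
    static boolean analyzedQuery(String keyword) {
        return INDEXED_TERMS.contains(keyword.toLowerCase(Locale.ROOT));
    }
}
```

This is why TermQuery is a good fit for untokenized values such as order numbers, where the stored term and the user's input are identical strings.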

    // Helper methods and the range-query test, also in LuceneThird
    // (LongPoint needs import org.apache.lucene.document.LongPoint)
    public IndexSearcher getIndexSearcher() throws Exception {
        Directory directory = FSDirectory.open(new File("/Users/XinxingWang/Development/Lucene/index").toPath());
        IndexReader indexReader = DirectoryReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        return indexSearcher;
    }

    public void printResult(Query query, IndexSearcher indexSearcher) throws Exception {
        //Execute the query
        TopDocs topDocs = indexSearcher.search(query, 10);
        //Number of matching documents
        System.out.println("Total number of results: " + topDocs.totalHits);
        //Iterate over the results
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            Document document = indexSearcher.doc(scoreDoc.doc);
            System.out.println(document.get("fileName"));
            System.out.println(document.get("fileContent"));
            System.out.println(document.get("filePath"));
            System.out.println(document.get("fileSize"));
            System.out.println("-----------------------------------");
        }
        //Close the IndexReader
        indexSearcher.getIndexReader().close();
    }

    @Test
    public void testRangeQuery() throws Exception {
        IndexSearcher indexSearcher = getIndexSearcher();
        //Matches documents whose fileSize was indexed as a LongPoint in [0, 700] (both ends inclusive)
        Query query = LongPoint.newRangeQuery("fileSize", 0L, 700L);
        printResult(query, indexSearcher);
    }

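At first this query returned nothing, and the reason was not obvious. A likely cause: LongPoint.newRangeQuery only matches values that were indexed with LongPoint. If fileSize was added as a TextField (the size written as text), the field has no point values and nothing can match. Index the size with LongPoint, plus a StoredField with the same name to keep the value retrievable, then rebuild the index and run the range query again.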
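Numbers get their own field type because their order has to survive encoding. Compared as decimal text, "1000" sorts before "700", so plain text terms cannot express a numeric range. The sketch below (a hypothetical SortableLongs class, JDK only) shows one encoding whose byte order matches numeric order: flip the sign bit and write the long as 8 big-endian bytes. Lucene's point encoding for longs is built on the same idea, but this is a hand-written illustration, not Lucene's code.

```java
import java.nio.ByteBuffer;

class SortableLongs {
    // 8 big-endian bytes with the sign bit flipped: unsigned byte order == numeric order
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Lexicographic comparison, treating each byte as unsigned
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // As decimal strings, 1000 sorts before 700 because '1' < '7'
        System.out.println("1000".compareTo("700") < 0);                      // true
        // The encoded bytes keep numeric order, negative values included
        System.out.println(compareUnsigned(encode(700), encode(1000)) < 0);  // true
        System.out.println(compareUnsigned(encode(-5), encode(3)) < 0);      // true
    }
}
```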

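8.3 Querying with QueryParser

A Query can also be created with QueryParser. QueryParser provides a parse method that builds the query directly from query syntax. The query syntax that a Query object runs can be inspected with System.out.println(query);.

QueryParser needs an analyzer. It is recommended to use the same analyzer for querying as was used for indexing.

Add the jar that QueryParser depends on:

/lucene-7.4.0/queryparser/lucene-queryparser-7.4.0.jar

Package structure:

[Figure: package structure with the queryparser jar]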

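    // Also needs org.apache.lucene.queryparser.classic.QueryParser
    @Test
    public void testQueryParser() throws Exception {
        IndexSearcher indexSearcher = getIndexSearcher();
        //Create the QueryParser
        //1st argument: the default field to search
        //2nd argument: the analyzer
        QueryParser queryParser = new QueryParser("fileContent", new IKAnalyzer());
        //Query text: "Lucene是java开发的" = "Lucene is developed in Java"
        Query query = queryParser.parse("Lucene是java开发的");
        //Print the parsed query syntax
        System.out.println(query);
        //Execute the query
        printResult(query, indexSearcher);
    }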
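A rough model of how the default field is used: a clause written as field:word targets that field, a bare word targets the default field, and every word is normalized before lookup. With QueryParser's default OR operator, a document matches if any clause matches. The sketch below is a hypothetical, heavily simplified parser (ToyQueryParser; whitespace splitting plus lowercasing stand in for the analyzer) that returns the clauses as field:term strings; it does not implement Lucene's query syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ToyQueryParser {
    private final String defaultField;

    ToyQueryParser(String defaultField) {
        this.defaultField = defaultField;
    }

    // "Lucene fileName:Apache" -> [fileContent:lucene, fileName:apache] (clauses combined with OR)
    List<String> parse(String query) {
        List<String> clauses = new ArrayList<>();
        for (String part : query.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue;  // an empty query yields a single empty string
            }
            int colon = part.indexOf(':');
            // a bare word goes to the default field
            String field = colon > 0 ? part.substring(0, colon) : defaultField;
            String word = colon > 0 ? part.substring(colon + 1) : part;
            // stand-in for the analyzer: lowercase only
            clauses.add(field + ":" + word.toLowerCase(Locale.ROOT));
        }
        return clauses;
    }
}
```

Whitespace splitting cannot segment Chinese text such as the query above; that is precisely the work the analyzer passed to QueryParser does, and why it should match the analyzer used at indexing time.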
