Chinese Word Segmentation Model
Posted 数学编程
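I haven't updated this blog for a while. I've been busy looking after my kid at home and job hunting, and the two together left me a bit frazzled 🤦. Things have mostly settled down now, so it's time to get back to writing. In this post we build a neural network model that performs Chinese word segmentation.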
Introduction
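Chinese word segmentation solves tasks like the following. Take the sentence (roughly, "I graduated from the Department of Physics at Shaanxi Normal University"):
我毕业于陕西师范大学物理系
The segmentation result is:
我 毕业 于 陕西师范大学 物理系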
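As the name suggests, word segmentation splits a piece of text into its words. Looks simple, doesn't it? The natural idea is to define a Chinese dictionary and look words up in it while segmenting, and that already gets the job done.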
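If that is what you came up with, congratulations: for in-vocabulary words this is essentially how it is done. However, words often contain other words. For example, "陕西师范大学" (Shaanxi Normal University) is a word, and "陕西" (Shaanxi) is a word too. What should happen then? In practice the dictionary is built into a prefix tree (trie), and the text is scanned one character at a time to check whether the next character can still be found in the trie.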
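Each character nests one more level of Map (called a dict in Python). After "西", the scan can only continue if the next character is "师"; otherwise the lookup returns None and the word "陕西" is returned. If all common words are built into such a trie, it is enough for basic segmentation. For words that the dictionary does not define, however, this approach produces odd results. Out-of-vocabulary words are usually handled with a hidden Markov model (see jieba for an example). An earlier post covered that model, so I won't repeat it here. Next, let's see how a neural network tackles the problem.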
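The dictionary lookup described above can be sketched as a nested-dict trie with greedy forward maximum matching. This is a minimal illustration, not the code of jieba or any other segmenter; the helper names (`build_trie`, `longest_match`, `trie_cut`), the `END` marker, and the tiny word list are all assumptions made for the example.

```python
from typing import Dict, List

END = "__end__"  # hypothetical marker key meaning "a word ends at this node"


def build_trie(words: List[str]) -> Dict:
    """Build a nested-dict prefix tree: one dict level per character."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def longest_match(trie: Dict, text: str, start: int) -> int:
    """Return the end index of the longest dictionary word starting at `start`.

    Falls back to a single character when no dictionary word matches.
    """
    node = trie
    best_end = start + 1  # default: emit one character
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:  # no deeper branch, stop scanning
            break
        if END in node:  # a complete word ends here; remember it
            best_end = i + 1
    return best_end


def trie_cut(trie: Dict, text: str) -> List[str]:
    """Greedy forward maximum matching over the trie."""
    words, pos = [], 0
    while pos < len(text):
        end = longest_match(trie, text, pos)
        words.append(text[pos:end])
        pos = end
    return words


trie = build_trie(["我", "毕业", "于", "陕西", "陕西师范大学", "物理", "物理系"])
print(trie_cut(trie, "我毕业于陕西师范大学物理系"))
# ['我', '毕业', '于', '陕西师范大学', '物理系']
```

Characters that the trie does not cover fall out as single-character words, which is one way the odd results for out-of-vocabulary words show up.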
The Chinese word segmentation task
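Chinese word segmentation is usually framed as a sequence labeling problem: the label sequence has the same length as the input text, which means every character gets a label. Several tag sets are in use, such as 2-tag and 4-tag. The most common is the 4-tag set with the labels B, M, E and S, which mark the beginning, middle and end of a word and a single-character word, respectively. For example: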
我毕业于陕西师范大学物理系
SBESBMMMMEBME
Reading off the labels gives the segmentation: "S/BE/S/BMMMME/BME" corresponds to "我/毕业/于/陕西师范大学/物理系".
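Converting between words and B/M/E/S labels is mechanical, as the following sketch shows. The function names `words_to_tags` and `tags_to_words` are hypothetical helpers introduced here for illustration.

```python
from typing import List


def words_to_tags(words: List[str]) -> str:
    """Encode segmented words as a B/M/E/S tag string, one tag per character."""
    tags = []
    for word in words:
        if not word:  # skip empty strings
            continue
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B" + "M" * (len(word) - 2) + "E")
    return "".join(tags)


def tags_to_words(text: str, tags: str) -> List[str]:
    """Decode a B/M/E/S tag string back into words."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words


words = ["我", "毕业", "于", "陕西师范大学", "物理系"]
tags = words_to_tags(words)
print(tags)                                  # SBESBMMMMEBME
print(tags_to_words("".join(words), tags))   # the original word list
```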
Building the model
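For an input sequence $X$ (the text to be segmented), the output is a label sequence $Y = (y_1, \dots, y_n)$. In essence we model the conditional probability $P(Y|X)$ and look for the output sequence $Y$ that maximizes it.
If the labels are independent of each other, that is, $y_i$ and $y_j$ are unrelated for $i \neq j$, this conditional probability can be written as:
$$P(Y|X) = \prod_{i=1}^{n} P(y_i|X)$$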
This structure is softmax decoding: in essence it is a per-character classification problem (Token Classification).
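The model above ignores one thing: the labels are not independently distributed but depend on the labels around them. In a text to be segmented, the first character can only be S or B, never M or E. B can only be followed by M or E, never by S or B. In other words, there are transition probabilities between labels, and we want to take them into account.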
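The constraints just described can be written down as a small transition table. The sketch below only checks whether a tag string is legal under the B/M/E/S scheme; the names `ALLOWED_NEXT`, `ALLOWED_START`, `ALLOWED_END` and `is_valid_tag_sequence` are hypothetical, and a trained model would learn soft transition scores rather than use a hard table like this.

```python
from typing import Dict, Set

# Which tag may follow which, under the B/M/E/S scheme.
ALLOWED_NEXT: Dict[str, Set[str]] = {
    "B": {"M", "E"},  # a word that has begun must continue or end
    "M": {"M", "E"},
    "E": {"B", "S"},  # after a word ends, a new word starts
    "S": {"B", "S"},
}
ALLOWED_START = {"B", "S"}  # a sentence cannot start inside a word
ALLOWED_END = {"E", "S"}    # and cannot stop inside one


def is_valid_tag_sequence(tags: str) -> bool:
    """Check start, end and every adjacent pair against the table."""
    if not tags:
        return True
    if tags[0] not in ALLOWED_START or tags[-1] not in ALLOWED_END:
        return False
    return all(b in ALLOWED_NEXT.get(a, set()) for a, b in zip(tags, tags[1:]))


print(is_valid_tag_sequence("SBESBMMMMEBME"))  # True
print(is_valid_tag_sequence("MES"))            # False: cannot start with M
print(is_valid_tag_sequence("BS"))             # False: B must be followed by M or E
```

A per-character softmax has no way to rule out sequences like the last two, which is the motivation for modeling transitions explicitly.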
To summarize: the labels form a linear chain and satisfy the Markov property, so the probabilistic graphical model reduces to a linear-chain conditional random field (CRF).
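I'll cover conditional random fields in detail in a later post. That material involves a lot of formulas, and interested readers can look into it in the meantime.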
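As a brief preview, a linear-chain CRF is commonly written in the form below. The notation is chosen here and is not taken from the original post: $h(y_i; X)$ is the score the network assigns to label $y_i$ at position $i$, $g(y_{i-1}, y_i)$ is a learned transition score between adjacent labels, and $Z(X)$ normalizes over all possible label sequences $Y' = (y'_1, \dots, y'_n)$.

```latex
P(Y|X) = \frac{1}{Z(X)} \exp\left( \sum_{i=1}^{n} h(y_i; X) + \sum_{i=2}^{n} g(y_{i-1}, y_i) \right),
\qquad
Z(X) = \sum_{Y'} \exp\left( \sum_{i=1}^{n} h(y'_i; X) + \sum_{i=2}^{n} g(y'_{i-1}, y'_i) \right)
```

Setting every $g$ to zero makes the expression factor over positions, which recovers the independent per-character softmax model.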
Here is the Keras code that builds the model:
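from tensorflow.keras import Input, Model, Sequential
from tensorflow.keras import layers as L
# The CRF layer comes from bert4keras, which uses tf.keras only when the
# TF_KERAS=1 environment variable is set before it is imported.
from bert4keras.layers import ConditionalRandomField

CRF = ConditionalRandomField()
# `tokenizer` is built elsewhere; token2id maps each character to an integer id.
vocab_size = len(tokenizer.token2id)

# Character embedding, three same-padded convolutions, then a per-character output.
cnn = Sequential([
    L.Embedding(vocab_size + 1, 100),
    L.Conv1D(filters=128, kernel_size=3, padding='same', activation='relu'),
    L.Conv1D(filters=128, kernel_size=3, padding='same', activation='relu'),
    L.Conv1D(filters=128, kernel_size=3, padding='same', activation='relu'),
    L.Dense(5, activation='softmax'),  # 4 tags plus 1 padding class
    # ConditionalRandomField()
], name='cnn')
cnn.summary()

# Put the CRF layer on top of the CNN outputs.
x_in = Input(shape=(None,))
x = cnn(x_in)
out = CRF(x)
model = Model(x_in, out)
model.summary()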
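One note on why the 4-tag scheme ends up as a 5-class output: the label sequences need padding so that every sequence in a batch has the same shape, and the padding label takes up one extra class.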
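To make the extra class concrete, here is a minimal, dependency-free sketch of padding label sequences within a batch. The id assignment (B, M, E, S as 0 to 3 and padding as 4) is an assumption inferred from the `starts` and `ends` arguments used by the decoder in this post, and `pad_batch` is a hypothetical helper, not part of Keras or bert4keras.

```python
from typing import List

PAD_TAG_ID = 4  # assumed id of the padding class; B, M, E, S assumed to be 0-3


def pad_batch(sequences: List[List[int]], pad_value: int) -> List[List[int]]:
    """Right-pad variable-length id sequences to the longest one in the batch."""
    max_len = max((len(seq) for seq in sequences), default=0)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


# Two label sequences of different lengths: "SBE" and "BMMES".
labels = [[3, 0, 2], [0, 1, 1, 2, 3]]
print(pad_batch(labels, PAD_TAG_ID))
# [[3, 0, 2, 4, 4], [0, 1, 1, 2, 3]]
```

The character ids would be padded the same way, presumably with id 0, since the embedding layer reserves one extra index beyond the vocabulary.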
Because the model uses a CRF layer, prediction requires Viterbi decoding. Here I use the bert4keras module by 苏剑林 (Su Jianlin); thanks to the author for the open-source project.
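from typing import List, Union

from bert4keras.snippets import ViterbiDecoder


class WordCutDecoder(ViterbiDecoder):
    def recognize(self, text: Union[str, List[str]]):
        """text is a str or a list of single characters."""
        # Map characters to ids; unknown characters fall back to id 0.
        encode_text = [tokenizer.token2id.get(w, 0) for w in text]
        # Predict on a batch of one sequence and take its per-character scores.
        nodes = model.predict([encode_text])[0]
        # Drop the last (padding) class before Viterbi decoding.
        labels = self.decode(nodes=nodes[:, :-1])
        # `id2tag` is defined elsewhere and maps label ids back to tags.
        tags = [id2tag.get(i) for i in labels]
        return labels, tags


# The last weight is the CRF transition matrix; strip its padding row and column.
decoder = WordCutDecoder(model.get_weights()[-1][:-1, :-1], starts=[0, 3], ends=[2, 3])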
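For readers who want to see what Viterbi decoding does under the hood, here is a dependency-free sketch of the general algorithm. It is not the bert4keras implementation: `viterbi_decode` is a hypothetical function, the id order B=0, M=1, E=2, S=3 is an assumption, and the scores in the example are made up by hand.

```python
from typing import List, Optional, Sequence

NEG_INF = float("-inf")


def viterbi_decode(
    nodes: Sequence[Sequence[float]],
    trans: Sequence[Sequence[float]],
    starts: Optional[Sequence[int]] = None,
    ends: Optional[Sequence[int]] = None,
) -> List[int]:
    """Return the highest-scoring tag path.

    nodes[t][j] is the score of tag j at position t;
    trans[i][j] is the score of moving from tag i to tag j.
    """
    if not nodes:
        return []
    n, k = len(nodes), len(nodes[0])
    # Only tags listed in `starts` may open the sequence.
    score = [
        nodes[0][j] if starts is None or j in starts else NEG_INF
        for j in range(k)
    ]
    backpointers: List[List[int]] = []
    for t in range(1, n):
        new_score, pointers = [], []
        for j in range(k):
            # Best previous tag for landing on tag j at position t.
            best_i = max(range(k), key=lambda i: score[i] + trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + trans[best_i][j] + nodes[t][j])
        backpointers.append(pointers)
        score = new_score
    # Only tags listed in `ends` may close the sequence.
    if ends is not None:
        score = [s if j in ends else NEG_INF for j, s in enumerate(score)]
    last = max(range(k), key=lambda j: score[j])
    # Walk the backpointers from the last position to the first.
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hand-written transition scores: 0.0 for allowed moves, -inf for forbidden ones.
trans = [
    [NEG_INF, 0.0, 0.0, NEG_INF],  # B -> M, E
    [NEG_INF, 0.0, 0.0, NEG_INF],  # M -> M, E
    [0.0, NEG_INF, NEG_INF, 0.0],  # E -> B, S
    [0.0, NEG_INF, NEG_INF, 0.0],  # S -> B, S
]
# Per-character scores whose position-wise argmax would be M, E, S (illegal).
nodes = [
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.1, 0.6, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
print(viterbi_decode(nodes, trans, starts=[0, 3], ends=[2, 3]))  # [0, 2, 3], i.e. B E S
```

Taking the best label at each position independently would give M, E, S here, which is not a legal sequence. The constrained search returns B, E, S instead.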
At prediction time, Viterbi decoding yields the labels, and parsing the labels gives the final segmentation. I wrote a simple_cut method for reference; here are a few segmentation results:
" ".join(simple_cut('我本科毕业于陕西师范大学,毕业以后一直在深圳从事互联网工作'))
# '我 本科 毕业 于 陕西师范大学 , 毕业 以后 一直 在 深圳 从事 互联网 工作'
" ".join(simple_cut('基于深度学习的信息抽取技术集散地,欢迎大家关注'))
# 基于 深度 学习 的 信息 抽取 技术 集散地 , 欢迎 大家 关注
" ".join(simple_cut('不知道这个玩意到底,怎么样?'))
# 不 知道 这个 玩意 到底 , 怎么样 ?
The full code is available on my GitHub.