Baidu Deep Learning Image Recognition Finals: Code Sharing (OCR)

Posted by 机器学习AI算法工程

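Contest website: http://meizu.baiducloud.top/ps/web/index.html

Preliminary round: recognize an arithmetic expression in an image. The expression may contain the digits 0-9, the operators + - *, and parentheses (). Its length is fixed at 5 or 7 characters: three digits, two operators, and zero or one pair of parentheses. The labels of a few sample images:

(4*8)+8

(0-2)+5

2*8-7

Contestants must output both the expression in each image and its value.

The training set contains 100,000 labeled images. The test set contains 100,000 unlabeled images; predictions are uploaded, and the resulting accuracy determines the preliminary ranking.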



Problem description

The goal of this contest is to solve an OCR problem, which in plain terms means converting an image into text.

Dataset

The preliminary dataset consists of 100,000 images of size 180*60 and a text file, labels.txt. Each image contains one arithmetic expression made up of:

  • 3 operands: three integers from 0 to 9

  • 2 operators: each one of +, -, *, standing for addition, subtraction and multiplication

  • 0 or 1 pair of parentheses

The images are named 0.png through 99999.png.

[Figure: a sample image]

labels.txt contains 100,000 lines, one per image. Each line holds the formula shown in the image and its computed value, separated by a space.

[Figure: the label line of the sample image]


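Evaluation metric

The official metric is accuracy. The preliminary round only involves integer addition, subtraction and multiplication, so every result is an integer, and a prediction counts as correct only when both the character sequence and the computed value are right.

Locally, besides the official accuracy, we also use the CTC loss to evaluate the model.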
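The official criterion described above can be sketched as follows. This is only an illustration of the rule, not the organizers' scoring code; the function names `is_correct` and `accuracy` and the "formula, space, value" line format taken from labels.txt are assumptions made for the example.

```python
def is_correct(pred_line, true_line):
    """Return True only if both the formula text and its value match."""
    pred_formula, _, pred_value = pred_line.partition(' ')
    true_formula, _, true_value = true_line.partition(' ')
    if pred_formula != true_formula:
        return False
    try:
        # Preliminary-round results are always integers.
        return int(pred_value) == int(true_value)
    except ValueError:
        return False


def accuracy(pred_lines, true_lines):
    """Fraction of lines judged correct by is_correct."""
    hits = sum(is_correct(p, t) for p, t in zip(pred_lines, true_lines))
    return hits / len(true_lines)
```

A line with the right formula but a wrong or missing value is counted as an error, which is why the value has to be computed from the recognized sequence before submission.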

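Data augmentation with captcha

The organizers provide 100,000 images. We can train on the official data directly, or use the captcha library to generate more data at random in the style of the official training set and thereby improve accuracy. By the rules, a label always has three digits, two operators, and either one pair of parentheses or none. Given how parentheses work, only three layouts are possible: no parentheses, parentheses around the left pair of operands, or parentheses around the right pair. The data generator is therefore easy to write.

Generator

The generator's rules are simple:

[Figure: generator code]

I trust everyone can follow it. While writing this article, though, I thought of a better way to write it:

[Figure: improved generator code]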
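The generator code itself survives only as images, so here is a minimal sketch of the rules described above (three digits, two operators, and one of the three parenthesis layouts). It is not the author's code; the name `generate` and the uniform choice between layouts are assumptions.

```python
import random

DIGITS = '0123456789'
OPERATORS = '+-*'
# No parentheses, parentheses on the left pair, parentheses on the right pair.
LAYOUTS = ['{}{}{}{}{}', '({}{}{}){}{}', '{}{}({}{}{})']


def generate():
    """Return one random label such as '2*8-7', '(4*8)+8' or '2*(8-7)'."""
    a, b, c = (random.choice(DIGITS) for _ in range(3))
    op1, op2 = (random.choice(OPERATORS) for _ in range(2))
    return random.choice(LAYOUTS).format(a, op1, b, op2, c)
```

Each generated string can be rendered with the captcha library and paired with `eval(label)` as its value, since only integer +, - and * occur.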

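Besides generating the expressions, one more detail deserves attention: every minus sign ("-") in the preliminary images is thin, but images produced directly with the captcha library have thick minus signs. We therefore modified the code in image.py: inside the _draw_character function we added a check so that no resize is performed for a minus sign, which keeps it from becoming thick: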

if c != '-':
    im = im.resize((w2, h2))
    im = im.transform((w, h), Image.QUAD, data)

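We then use the generator to produce arithmetic captchas:

[Figure: captcha produced by the stock generator]

The figure above was produced by the stock generator; the minus sign is clearly very thick.

[Figure: captcha produced by the modified generator]

The figure above comes from the modified generator; the minus sign is no longer thick.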

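Model structure

[Figure: model structure]

The model structure follows the one in my earlier article. I only increased the number of convolution kernels, added some BN layers, and made a small change to support multi-GPU training on four cards. If you have a single GPU, simply remove the line base_model2 = make_parallel(base_model, 4).

The BN layers are mainly there to speed up training. The experimental results were very good: the model converged much faster.

Visualization of base_model:

[Figure: base_model graph]

Visualization of model:

[Figure: model graph]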

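Model training

After a few tests I dropped the evaluate function: the model already reaches a 100% recognition rate on the validation set, so watching val_loss is enough. Earlier attempts showed that, with a generator available, more epochs are always better, so I simply ran Adam for 50 epochs of 100,000 samples each. The model had essentially converged before the run finished.

[Figure]

As the figure shows, the model is first split into four replicas that compute in parallel on four GPUs. The results are then merged, the final CTC loss is computed, and the model is trained on it.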

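Result visualization

Here we visualize predictions on generated data. The model is essentially infallible and gets every sample right.

[Figure: predictions on generated samples]

After packaging everything into a Docker image and submitting it to the contest system, a run of ten-odd minutes gave us a perfect score of 1.

[Figure: submission score]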

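Summary

The preliminary round is very easy, which is why we could reach such a high score. The organizers later raised the difficulty by expanding the preliminary test set to 200,000 images, on which our model only scores 0.999925. Feasible improvements include lowering the learning rate further, training the model more thoroughly, and ensembling the results of several models.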

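Difficulties in the expanded official test set

On the expanded dataset we found some images whose predicted formula cannot be evaluated, for example [629,2271,6579,17416,71857,77631,95303,102187,117422,142660,183693]. Take 117422.png as an example.

[Figure: 117422.png]

The image is practically unreadable to the naked eye, but some image processing reveals what it really contains:

import cv2

IMAGE_DIR = 'image_contest_level_1_validate'
index = 117422
img = cv2.imread('%s/%d.png' % (IMAGE_DIR, index))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Histogram equalization stretches the contrast of the washed-out image
h = cv2.equalizeHist(gray)

This gives the following result:

[Figure: 117422.png after histogram equalization]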
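To make the effect of histogram equalization concrete, here is a minimal pure-Python sketch of the idea on a flat list of grey values. It follows the usual cumulative-histogram formulation and is not a reimplementation of `cv2.equalizeHist`; the function name `equalize_hist` is an assumption.

```python
def equalize_hist(pixels, levels=256):
    """Equalize a flat list of integer grey values in [0, levels)."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # Cumulative distribution of the grey levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # constant image: nothing to stretch
        return list(pixels)
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[v] - cdf_min) * scale) for v in pixels]
```

Four nearly identical grey values such as 100, 101, 102 and 103 are spread over the full range, which is why a formula that is invisible in the raw image becomes readable afterwards.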

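Of course, there is still one image, 142660, that cannot be recovered by preprocessing. It may be a low-probability event caused by a bug in the program that generated the data. That is why, apart from our one Docker submission that achieved a perfect score, nobody else reached full marks in the preliminary round.

[Figure: 142660.png]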


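All the ideas used in the finals

Dataset

The finals dataset consists of 100,000 images and a text file, labels.txt. Each image contains mathematical formulas with the following properties:

  1. The image size is not fixed.

  2. Only one region of the image contains the formulas.

  3. The image contains two or three lines of formulas.

  4. There are two kinds of formula: assignments and arithmetic expressions. A two-line image has one assignment and one expression; a three-line image has two assignments and one expression. A plus sign (+) is still a plus sign even when rotated to look like x; * is the multiplication sign.

  5. In an assignment, the variable name is a single Chinese character. The characters come from two lines of verse (commas excluded): 君不见,黄河之水天上来,奔流到海不复回 烟锁池塘柳,深圳铁板烧

  6. Arithmetic expressions involve addition, subtraction, multiplication, fractions and parentheses. The numbers are multi-digit, and the Chinese characters are variables assigned by the preceding statements.

  7. The output format is: the formulas in the image, one ASCII space, then the computed result. Formulas on different lines are separated by an ASCII semicolon. Fractions are evaluated as floating-point numbers, and a result is considered correct if its error does not exceed 0.01.

  8. The whole label file is UTF-8 encoded.


Finals sample:

[Figure: a finals sample image]

The preliminary task is easy: recognizing the text sequence is enough. The finals formulas are more complex, and the images must first go through image processing before they can be fed to the neural network for end-to-end text sequence recognition.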

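Evaluation metric

The official metric is accuracy. The preliminary round only has integer addition, subtraction and multiplication, so every result is an integer, and a prediction was judged correct only if both the sequence and the computed value were right.

In the finals, however, the numbers are usually five digits long, there are many multiplications and additions, and a fraction is always present, so the result easily exceeds what a 64-bit floating-point number can represent accurately. After discussion, the organizers decided to score only the recognized text sequence and not the computed result.

Locally, besides the official accuracy, we again use the CTC loss to evaluate the model.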

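Data exploration

Definitions

Exploring the finals dataset is much more involved. Let us first pin down two terms, using this label as an example:

流=42072;圳=86;(圳-(97510*45921))*流/35864

Here 流=42072;圳=86; are called assignments and (圳-(97510*45921))*流/35864 is called the expression. Assignments and expressions are collectively called formulas, and + - * / are called operators.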
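With these definitions, a label can be split mechanically into its assignments and its expression. The helper below is a small sketch for illustration; the name `split_label` is an assumption, and the only thing it relies on is the semicolon-separated format described in the dataset rules.

```python
def split_label(label):
    """Split 'a=1;b=2;expr' into ({'a': '1', 'b': '2'}, 'expr')."""
    *assignments, expression = label.split(';')
    variables = {}
    for item in assignments:
        name, _, value = item.partition('=')
        variables[name] = value
    return variables, expression
```

The last field is always the expression, and everything before it is an assignment, so the same helper works for two-line and three-line images.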

Analysis

First we counted how often each character occurs in the samples:

[Figure: occurrence count of each character]
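The per-character statistics can be gathered with a few lines of Python. This is a sketch rather than the code behind the figure; the name `char_frequencies` is an assumption, and it reports the average number of occurrences per label, which is the kind of figure quoted in the analysis (a value above 1 means the character appears more than once per sample on average).

```python
from collections import Counter


def char_frequencies(labels):
    """Average number of occurrences of each character per label."""
    counts = Counter()
    for label in labels:
        counts.update(label)  # counts every character of the string
    return {ch: n / len(labels) for ch, n in counts.items()}
```

Applied to all labels in labels.txt, characters that always occur together, such as = and ;, end up with identical averages.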

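The digit distribution is interesting: 0 occurs less often than every other digit, while the other digits occur about equally often. We therefore concluded that the numbers are generated directly as random integers: since 0 cannot appear as the leading digit, its frequency is lower.

Semicolons and equals signs occur equally often, because every assignment has exactly one of each. Their average count per sample is 1.65807, so we can guess that samples with one assignment and samples with two assignments occur in a ratio of 1:2 (which would give 1/3*1 + 2/3*2 = 1.667).

The operators all occur with the same probability, so they are presumably picked uniformly at random.

Pairs of parentheses occur 1.36505 times per sample on average. We enumerated every possible way the parentheses can appear: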

1+1+1+1

(1+1)+1+1
1+(1+1)+1
1+1+(1+1)
(1+1+1)+1
1+(1+1+1)

((1+1)+1)+1
(1+(1+1))+1

1+((1+1)+1)
1+(1+(1+1))

(1+1)+(1+1)

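There are 11 possibilities in total: one without parentheses, five with one pair and five with two pairs. Counting by the number of pairs, the expected frequency is 2*5/11.0+5/11.0 = 1.3636, which matches the observed value, so the parentheses are also drawn at random from the templates above.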
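The same expectation can be checked directly from the eleven templates. This is just a verification sketch; `TEMPLATES` restates the list above and `expected_pairs` is a hypothetical helper name.

```python
TEMPLATES = [
    '1+1+1+1',
    '(1+1)+1+1', '1+(1+1)+1', '1+1+(1+1)', '(1+1+1)+1', '1+(1+1+1)',
    '((1+1)+1)+1', '(1+(1+1))+1', '1+((1+1)+1)', '1+(1+(1+1))',
    '(1+1)+(1+1)',
]


def expected_pairs(templates):
    """Average number of parenthesis pairs if templates are equally likely."""
    return sum(t.count('(') for t in templates) / len(templates)
```

Under the assumption that every template is equally likely, the average is 15/11, close to the measured 1.36505.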

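Among the Chinese characters, 不 appears twice in the source text and so is twice as frequent; all other characters are roughly equally frequent. The characters come from these two lines of verse: "君不见,黄河之水天上来,奔流到海不复回 烟锁池塘柳,深圳铁板烧", so we can infer that they are drawn at random, character by character, straight from that text.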

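Summary

  • Chinese characters are drawn uniformly from the verse, so 不 is twice as likely

  • Parentheses are drawn at random from the 11 templates

  • Every expression contains exactly four operators

  • One assignment with probability 1/3, two assignments with probability 2/3

  • The operator / always appears exactly once, with a Chinese character as the numerator

  • The operators + - * are drawn at random, each with probability 1/3

  • Numbers range over [0, 100000]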

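Data preprocessing

The original images are huge, and if they were fed into the CNN directly the vast majority of the area would be useless. So we preprocess the images and crop out the useful part. Because each image has two or three formulas, we then stitch them together from left to right, which keeps the image small (900*80=72000 vs 600*270=162000).

I mainly used the following techniques:

  • Grayscale conversion

  • Histogram equalization

  • Median filtering

  • Morphological opening and closing

  • Thresholding (binarization)

  • Contour detection

  • Bounding rectangles


First comes a coarse extraction of the key regions: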

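import cv2
import numpy as np

# IMAGE_DIR is the directory that holds the contest images
def plot(index):
    img = cv2.imread('%s/%d.png'%(IMAGE_DIR, index))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Raise the contrast, then remove noise dots and interference lines
    eq = cv2.equalizeHist(gray)
    b = cv2.medianBlur(eq, 9)

    # Run the morphology step at 1/4 resolution
    m, n = img.shape[:2]
    b2 = cv2.resize(b, (n//4, m//4))

    # Opening joins the characters of a line, closing removes leftover noise
    m1 = cv2.morphologyEx(b2, cv2.MORPH_OPEN, np.ones((7, 40)))
    m2 = cv2.morphologyEx(m1, cv2.MORPH_CLOSE, np.ones((4, 4)))
    _, bw = cv2.threshold(m2, 127, 255, cv2.THRESH_BINARY_INV)

    bw = cv2.resize(bw, (n, m))

    # Draw the bounding box of every external contour on a copy of the image.
    # [-2] is the contour list under both the OpenCV 3 and OpenCV 4 return conventions.
    r = img.copy()
    ctrs = cv2.findContours(bw, cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)[-2]
    for ctr in ctrs:
        x, y, w, h = cv2.boundingRect(ctr)
        cv2.rectangle(r, (x, y), (x+w, y+h), (0, 255, 0), 10)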


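[Figure: intermediate results of the steps above, including the panels blur, m2 and rect]

Denoising

The image is first converted to grayscale, and its contrast is then raised with the histogram equalization already used in the preliminary round. The noise dots are still there at this point, so a filter is needed. We use a median filter, which removes noise dots and interference lines very well (the blur panel in the figure above).

Connecting the formulas

At this stage we only care about extracting the formulas, not individual characters (which cannot be extracted reliably), so we need to join the characters together. The image is first downscaled by a factor of 4, and morphological opening and closing are then used to connect the characters. Because we want horizontal connections and no vertical ones, we chose a (7, 40) kernel for the opening; to filter out unnecessary noise we then apply a (4, 4) closing (the m2 panel in the middle of the figure above).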
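Why an opening connects characters may not be obvious. The text in these images is dark on a light background (the threshold that follows uses THRESH_BINARY_INV), and on grey values an opening is a minimum filter followed by a maximum filter, so light gaps narrower than the kernel are swallowed by the dark strokes around them. The one-dimensional sketch below illustrates this on a single pixel row; the helper names and the toy row are assumptions, and the real code works in 2-D with a wide, flat kernel.

```python
def erode(row, k):
    """Grey-scale erosion: each value becomes the minimum of its k-wide window."""
    r = k // 2
    return [min(row[max(0, i - r):i + r + 1]) for i in range(len(row))]


def dilate(row, k):
    """Grey-scale dilation: each value becomes the maximum of its k-wide window."""
    r = k // 2
    return [max(row[max(0, i - r):i + r + 1]) for i in range(len(row))]


def opening(row, k):
    """Erosion followed by dilation with the same window width."""
    return dilate(erode(row, k), k)


# Two dark strokes (0) separated by a two-pixel light gap (255).
row = [255] * 4 + [0] * 2 + [255] * 2 + [0] * 2 + [255] * 4
merged = opening(row, 3)
```

After the opening the two strokes form one dark run of six pixels while the outer margins are unchanged, which is what lets contour detection see a whole formula as a single blob.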

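Key region extraction

Once the characters of each formula are joined, we can run contour detection on the image. It easily yields the three edge point sets in the image, and the bounding-rectangle function turns each of them into (x, y, w, h), which completes the key region extraction. We then draw the rectangles in green on the original image (the rect panel at the lower right of the figure above).

Fine-tuning

Because a very large kernel was used for the filtering earlier, the boxes need a fine-tuning step: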

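# Fine-tune the three formula boxes.
# img, b (the median-filtered image), m, n and ctrs come from the previous step.
d = 20   # margin added around the coarse box
d2 = 5   # margin added around the refined box
imgs = []
sizes = []
for i, ctr in enumerate(ctrs):
    x, y, w, h = cv2.boundingRect(ctr)
    roi = img[max(0, y-d):min(m, y+h+d), max(0, x-d):min(n, x+w+d)]
    p, q, _ = roi.shape

    # Detect the text again inside the enlarged box, at full resolution
    x = b[max(0, y-d):min(m, y+h+d), max(0, x-d):min(n, x+w+d)]
    x = cv2.morphologyEx(x, cv2.MORPH_CLOSE, np.ones((3, 3)))
    _, x = cv2.threshold(x, 127, 255, cv2.THRESH_BINARY_INV)
    # [-2] is the contour list in both OpenCV 3 and OpenCV 4
    x = cv2.findContours(x, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    x, y, w, h = cv2.boundingRect(np.vstack(x))
    roi2 = roi[max(0, y-d2):min(p, y+h+d2), max(0, x-d2):min(q, x+w+d2)]
    imgs.append(roi2)
    sizes.append(roi2.shape)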

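First each rectangle from the previous step is expanded by 20 pixels and the key region is cropped. The crop is taken directly from the filtered full-size image, so the resolution is high. A simple closing, thresholding and bounding-box extraction follow. Stray noise dots are not a worry here: cropping too much is harmless, cropping too little is the real trouble. Because of the filtering the resulting crop may be slightly too small, so it is expanded by another 5 pixels, which works well.

Here are a few examples:

[Figure: three examples of refined formula crops]

Connecting the three formulas

With the formulas cropped accurately, we can join them horizontally: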

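# Join the formulas side by side, 2 pixels apart
sizes = np.array(sizes)
img2 = np.zeros((sizes[:,0].max(), sizes[:,1].sum()+2*(len(sizes)-1), 3),
                dtype=np.uint8)
x = 0
# The list is reversed so the formulas end up in reading order
# (the contours are presumably returned bottom-up)
for a in imgs[::-1]:
    w = a.shape[1]
    img2[:a.shape[0], x:x+w] = a
    x += w + 2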



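The stitched image looks like this:

[Figure: the stitched image]

Parallel preprocessing

A plain Python for loop only keeps one CPU core busy. To make full use of the CPU, we preprocess with multiple processes in parallel so that every core runs at full load. To watch the progress in real time I use the tqdm progress-bar library.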

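from multiprocessing import Pool
from tqdm import tqdm

n = 100000

# f(index) preprocesses one image and returns its result
if __name__ == '__main__':
    p = Pool(12)
    rs = []
    for r in tqdm(p.imap_unordered(f, range(n)), total=n):
        rs.append(r)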


[Figure: progress bar of the parallel run]

Summary

Here we plot the pairwise relationships between all the quantities, which turns out to be quite interesting.

pd.plotting.scatter_matrix(df, alpha=0.1, figsize=(14,8),
diagonal='kde');


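[Figure: scatter matrix of x, y, w, h, n, m and r]


Here x, y are the starting coordinates of the formulas, w, h their width and height, n, m the width and height of the original image, and r the number of formulas. The plot shows no obvious pattern in x and y, apart from the trivial one that wider images allow larger values of x (a formula cannot start beyond the right edge of its image).

w has no obvious pattern either and follows a typical normal distribution, whereas h has two peaks, which reflects the difference between images with two formulas and images with three.

m and n are very regular: each is drawn at random from a few fixed values. m comes from [400, 500, 600, 700, 800, 900, 1000] and n from [800, 1600, 2400, 3200, 4000].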

Counter(df['m'])
Counter({400: 14233, 500: 14414, 600: 14332, 700: 14304, 800: 14293, 900: 14299, 1000: 14125})

Counter(df['n'])
Counter({800: 19872, 1600: 19937, 2400: 20128, 3200: 19975, 4000: 20088})

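Model structure

After many iterations of the code I wrapped the CNN into a model of its own, which makes the overall model much cleaner:

[Figure: model definition]


The idea is as follows. An image goes in, and the CNN produces a feature map of shape (112, 10, 128), where 112 is the sequence length fed to the RNN and 10 is the height, in pixels, of each feature column. The trailing (10, 128) features are flattened to 1280 and reduced to 128 dimensions by a fully connected layer, giving features of shape (112, 128). These are fed to the RNN: two bidirectional GRU layers output character probabilities for each of the 112 steps, and the model is optimized with the CTC loss until it recognizes the character sequence accurately.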
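To see how 112 per-step outputs become a much shorter character sequence, here is a minimal sketch of best-path CTC decoding: take the most likely label at every step, merge repeats, then drop the blank label. It illustrates the rule only and is not the decoding code used in the contest; the function name and the `_` blank symbol are assumptions.

```python
def ctc_greedy_decode(step_labels, blank):
    """Collapse repeated labels, then drop blanks (best-path CTC decoding)."""
    decoded = []
    previous = None
    for label in step_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

For instance, the step sequence `__11_1__0` decodes to `110`, whereas `111` without a blank in between decodes to a single `1`. Repeated characters therefore need a blank step between them, which is one reason the RNN input must be comfortably longer than the longest label.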

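CNN

The CNN structure is shown below:

[Figure: CNN structure]


The theoretical maximum sequence length is 46 characters (a number can be as long as 100000, so 2*9+3*6+4+4+2=46). For CTC it is best to feed a sequence more than twice as long as the longest label, otherwise convergence suffers. Previously I convolved straight down to a much shorter sequence, and for runs of consecutive characters there were then no blanks to separate them, so convergence was much worse. I also kept miscounting the maximum sequence length: I was on Python 2, where a Chinese character takes three bytes unless the string is decoded from UTF-8.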
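The count of 46 and the byte-versus-character pitfall can both be checked on a worst-case label. The breakdown used here (two 9-character assignments, three 6-digit numbers, four operators, four parentheses and two Chinese variables) is my reading of the sum in the text, and the concrete label is a made-up example.

```python
# Worst case: two assignments and an expression with two pairs of parentheses.
longest = '河=100000;铁=100000;((100000+100000)*河)-铁/100000'

n_chars = len(longest)                  # characters: what the CTC label length needs
n_bytes = len(longest.encode('utf-8'))  # bytes: what an undecoded Python 2 str reports
```

The label has 46 characters but 54 UTF-8 bytes, because each of the four Chinese characters takes three bytes, so measuring undecoded byte strings overstates the label length.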

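The CNN changed from the original pattern of two convolutions followed by one pooling layer to blocks of several convolutions followed by one pooling layer. Since the blocks have 3, 4 and 6 convolution layers respectively, I call it the 346 structure.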

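GRU

Why use an RNN? Here is a classic example: rseaerch sohws taht the oredr of ltetres in a snetnece deos not ncesesraliy afefct raednig, and olny atefr fiinshnig tihs snetnece do you ntoice taht its ltetres are all srcmalbed.

When people read a passage they take the context into account instead of recognizing one character at a time, so introducing an RNN to capture context can greatly improve the model's accuracy. In the finals, the sequence has contextual structure in several places:

  • The first one or two formulas are always assignments of the form 中文=数字; (Chinese character = number;)

  • A left parenthesis is always matched by a right parenthesis

  • The positions of the parentheses follow grammar rules

  • There is always a fraction

  • The numerator of the fraction is always a Chinese character

  • With only one assignment, the Chinese character in the expression must be the one in the assignment

  • With two assignments, the assignments are easy to read while the expression is harder, so the characters in the assignments can be used to correct those in the expression, especially when the numerator character is obscured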
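The rules in this list are regular enough to be written down as a checker. The sketch below is only an illustration of the constraints an RNN can learn implicitly; it is not part of the contest pipeline, and the name `is_plausible` and the exact set of checks are assumptions.

```python
import re

CJK = '[\u4e00-\u9fff]'
ASSIGNMENT = re.compile(CJK + r'=\d+')


def is_plausible(label):
    """Check the structural rules listed above on a predicted label."""
    *assignments, expression = label.split(';')
    if not 1 <= len(assignments) <= 2:
        return False
    if not all(ASSIGNMENT.fullmatch(a) for a in assignments):
        return False
    # Parentheses must be balanced and never close before they open.
    depth = 0
    for ch in expression:
        depth += (ch == '(') - (ch == ')')
        if depth < 0:
            return False
    if depth != 0:
        return False
    # Exactly one fraction, whose numerator is a Chinese character.
    if expression.count('/') != 1 or not re.search(CJK + '/', expression):
        return False
    # Every variable used in the expression must have been assigned.
    assigned = {a[0] for a in assignments}
    return set(re.findall(CJK, expression)) <= assigned
```

A check like this could also flag predictions worth a second look, for example when the numerator character does not match any assignment.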

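Other parameters

Compared with the preliminary-round model, a few things changed:

  • padding became same; otherwise I felt the feature map would not be tall enough to recognize fractions

  • Added L2 regularization. The loss became larger but the accuracy became higher (L2 is applied to the convolution kernels, the gamma and beta of the BN layers, and the weights and bias of the fully connected layers)

  • The initializer of every layer became he_uniform, which works better than before

  • Removed dropout. I am not sure what effect this has, but with a generator in place overfitting should not occur anyway

  • I tried changing the GRU implementation setting in the hope that the GPU would speed up the GRU, but it seemed slower than setting it to 0 and running on the CPU, so I changed it back

The L2 regularization parameter was taken directly from Section 4.3 of the Xception paper: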

Weight decay: The Inception V3 model uses a weight decay (L2 regularization) rate of 4e-5, which has been carefully tuned for performance on ImageNet. We found this rate to be quite suboptimal for Xception and instead settled for 1e-5.

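Generator

To obtain more data and improve the model's generalization, I used a very simple augmentation: pick assignments at random according to the Chinese characters in an expression and assemble new samples. The first 350*256=89600 samples are used for generation and the next 10240 samples form the validation set; the small remainder was too little to be worth using.

When loading the data, the expression images are read first, and the assignment images are then stored in a dictionary keyed by their Chinese character. Because dictionary keys are unordered, each dictionary value is a list, which is ordered.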

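# Python 2 code (the author's environment): df[0][i] is a UTF-8 byte string
from collections import defaultdict

cn_imgs = defaultdict(list)    # Chinese character -> assignment images
cn_labels = defaultdict(list)  # Chinese character -> assignment labels
ss_imgs = []                   # expression images
ss_labels = []                 # expression labels
for i in tqdm(range(n1)):
    ss = df[0][i].decode('utf-8').split(';')
    m = len(ss)-1              # number of assignments in sample i
    ss_labels.append(ss[-1])
    ss_imgs.append(cv2.imread('crop_split2/%d_%d.png'%(i, 0)).transpose(1, 0, 2))
    for j in range(m):
        cn_labels[ss[j][0]].append(ss[j])
        cn_imgs[ss[j][0]].append(cv2.imread('crop_split2/%d_%d.png'%(i, m-j)).transpose(1, 0, 2))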

Then comes the generator itself, which subclasses the Sequence class from keras:

# Python 2 code; also needs: import random, re, numpy as np
from keras.utils import Sequence

class SGen(Sequence):
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.X_gen = np.zeros((batch_size, width, height, 3), dtype=np.uint8)
        self.y_gen = np.zeros((batch_size, n_len), dtype=np.uint8)
        self.input_length = np.ones(batch_size)*rnn_length
        self.label_length = np.ones(batch_size)*38

    def __len__(self):
        return 350*256 // self.batch_size

    def __getitem__(self, idx):
        self.X_gen[:] = 0
        for i in range(self.batch_size):
            try:
                random_index = random.randint(0, n1-1)
                cls = []
                ss = ss_labels[random_index]
                cs = re.findall(u'[\u4e00-\u9fff]', df[0][random_index].decode('utf-8').split(';')[-1])
                random.shuffle(cs)
                x = 0
                for c in cs:  # paste a random assignment for each character
                    random_index2 = random.randint(0, len(cn_labels[c])-1)
                    cls.append(cn_labels[c][random_index2])
                    img = cn_imgs[c][random_index2]
                    w, h, _ = img.shape
                    self.X_gen[i, x:x+w, :h] = img
                    x += w+2
                img = ss_imgs[random_index]  # the expression goes last
                w, h, _ = img.shape
                self.X_gen[i, x:x+w, :h] = img
                cls.append(ss)

                random_str = u';'.join(cls)
                self.y_gen[i,:len(random_str)] = [characters.find(x) for x in random_str]
                self.y_gen[i,len(random_str):] = n_class-1
                self.label_length[i] = len(random_str)
            except Exception:  # skip samples that cannot be assembled
                pass

        return [self.X_gen, self.y_gen, self.input_length, self.label_length], np.ones(self.batch_size)

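First an expression is picked at random and a regular expression finds the Chinese characters in it. Images are then drawn at random from the {Chinese character: image array} dictionary and stitched into a new sequence in the same way as in the preprocessing step.

For example, suppose 85882*(河/76020-37023)-铁 is picked. We then take a random assignment for 铁 and a random assignment for 河, and stitching them together gives the image below:

[Figure: a generated sample]

The background colours differ, but this does not affect recognition.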
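The recombination idea can be tried on labels alone, without any images. The sketch below mirrors the procedure described above in Python 3; it is not the author's generator, and the helper names `build_pool` and `recombine` are assumptions.

```python
import random
import re
from collections import defaultdict


def build_pool(labels):
    """Index every assignment by its Chinese variable and collect expressions."""
    assignments = defaultdict(list)
    expressions = []
    for label in labels:
        *assigns, expression = label.split(';')
        expressions.append(expression)
        for a in assigns:
            assignments[a[0]].append(a)
    return assignments, expressions


def recombine(assignments, expressions, rng=random):
    """Build a new label: a random expression plus random matching assignments."""
    expression = rng.choice(expressions)
    variables = re.findall('[\u4e00-\u9fff]', expression)
    rng.shuffle(variables)
    parts = [rng.choice(assignments[v]) for v in variables]
    return ';'.join(parts + [expression])
```

Because any assignment of 河 can be paired with any expression that uses 河, a modest number of real samples yields a very large number of distinct training sequences.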

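Training

Our training strategy: first converge quickly with Adam() at its default learning rate of 1e-3, then continue with Adam(1e-4) until the loss is reasonably good, and finally fine-tune for 50 epochs with Adam(1e-5). The weights are saved after every epoch and the validation accuracy is computed each time. The green line in the figure (0.9977) is the model trained this way.

[Figure: curves of the green, red and blue training runs]

We also tried training for 20 epochs at 1e-3 and then alternating between 1e-4 and 1e-5 twice, each time continuing from the weights with the lowest validation loss. That is the red line in the figure: it is faster, but the accuracy is not good enough.

Finally we used the entire training set for training, which gives the blue line; its results are about the same as the green one.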

Prediction

Read the test-set samples and predict with base_model. This step is simple, so I will not go into it.

X = np.zeros((n, width, height, channels), dtype=np.uint8)
for i in tqdm(range(n)):
    img = cv2.imread('crop_split2_test/%d.png'%i).transpose(1, 0, 2)
    a, b, _ = img.shape
    X[i, :a, :b] = img

base_model = load_model('model_346_split2_3_%s.h5' % z)
# Four-GPU wrapper; on a single GPU predict with base_model directly
base_model2 = make_parallel(base_model, 4)

y_pred = base_model2.predict(X, batch_size=500, verbose=1)
# Skip the first two time steps, then CTC-decode the rest
out = K.get_value(K.ctc_decode(y_pred[:,2:], input_length=np.ones(y_pred.shape[0])*rnn_length)[0][0])[:, :n_len]

One point about writing the output file is worth mentioning, namely how the actual value is computed:

# Python 2 code (the author's environment)
ss = list(map(decode, out))

vals = []
errs = []
errsid = []
for i in tqdm(range(100000)):
    val = ''
    try:
        a = ss[i].split(';')
        s = a[-1]
        # Replace each variable in the expression by its value, as a float
        for x in a[:-1]:
            x, c = x.split('=')
            s = s.replace(x, c+'.0')
        val = '%.2f' % eval(s)
    except:
        # disp3(i)
        errs.append(ss[i])
        errsid.append(i)
        ss[i] = ''

    vals.append(val)

with open('result_%s.txt' % z, 'w') as f:
    f.write('\n'.join(map(' '.join, list(zip(ss, vals)))).encode('utf-8'))

print(len(errs))
print(1-len(errs)/100000.)

# output
# 22
# 0.99978

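The idea is simple: replace the Chinese variables in the expression with the numbers from the assignments and let Python's eval compute the result; anything that cannot be evaluated is simply left blank. For this 0.9977 model the evaluable rate reaches 0.99978, that is, only 22 of the 100,000 samples cannot be evaluated. Of course, some samples that can be evaluated are still misrecognized for various reasons: confusion between 5 and a similar-looking digit is the most common error, and some digits are cut through by interference lines so badly that even a human cannot tell them apart.

Model ensembling

The ensembling rule is simple: count how often each result occurs across the models, discard empty results, and take the most frequent one. In other words, it is plain voting.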

import glob
import numpy as np
from collections import Counter

def fun(x):
    c = Counter(x)
    c[' '] = 0   # give blank lines (failed predictions) zero votes
    return c.most_common()[0][0]

ss = [open(fname, 'r').read().split('\n') for fname in glob.glob('result_model*.txt')]
s = np.array(ss).T
with open('result.txt', 'w') as f:
    f.write('\n'.join(map(fun, s)))

Ensembling the three models from the loss plot above finally gave a test-set accuracy of 0.99868.
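The voting rule is easy to try on a toy example. The sketch below works on in-memory lists instead of result files and is a Python 3 restatement of the idea, not the script used for the submission; the names `vote` and `ensemble` are assumptions.

```python
from collections import Counter


def vote(candidates):
    """Majority vote over one line's predictions, ignoring blank ones."""
    counts = Counter(c for c in candidates if c.strip())
    if not counts:
        return ''
    return counts.most_common(1)[0][0]


def ensemble(model_outputs):
    """model_outputs: one list of predicted lines per model, all equally long."""
    return [vote(line) for line in zip(*model_outputs)]
```

With three models, a line that one model failed to evaluate is still recovered as long as the other two agree, which is where most of the gain from ensembling comes from.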

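Other attempts

Variable-width image recognition

At the start of the contest I tried setting the image width to None, that is, a variable width, but could not solve the resulting reshape problem, so the idea was rejected.

Separate recognition

I also tried cutting the image into pieces and recognizing them separately, with different models for assignments and expressions. Since no context information is available this way, some accuracy might be lost, and I abandoned the approach halfway.

Generator attempt

We tried writing an image generator, but its output was too different from the official images. In actual tests either the accuracy on generated images was high and on official images low, or the other way round, so it was never put to use.

[Figure: one official image followed by five generated images]

In the figure above, the first image is official and the following five come from our generator. Our characters are less compact than the official ones, the equals signs look somewhat different, and in fractions our characters are too compact.

Other CNN models

Besides building my own model, I tried replacing the CNN with ResNet and DenseNet. These models are large and train slowly, and model capacity was not the main problem anyway: in the plotted loss curves, val_loss keeps jittering early on but becomes very smooth once the learning rate drops at epoch 50, so the model is not overfitting:

[Figure: loss curves]

Replacing GRU with LSTM

Near the end of the contest I tried replacing the GRU with an LSTM. The results were very similar, but after submission the accuracy dropped slightly (a few more samples wrong, possibly just luck). I had made the same swap earlier for captcha recognition with about the same outcome, so I did not pursue it. In theory the sequences here are not very long, so the choice between GRU and LSTM makes little difference.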

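Summary

Reflections on the project

The following points are worth noting in this project:

  • Data preparation:

  • Combining deep learning with traditional image processing can achieve better accuracy

  • For text recognition, a captcha generator can be built for data augmentation to increase the number of training samples

  • Model optimization:

  • Adjust the model structure to the characteristics of the project, for example by using fewer pooling layers in the CNN part

  • Introduce L2 regularization into the model to prevent overfitting

  • Model training:

  • Train the model with a learning-rate decay schedule

  • For complex models, a batch of input data can be split across several GPUs for computation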



