Baidu Deep Learning Image Recognition Finals: Code Sharing (OCR)
Posted by 机器学习AI算法工程
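Contest website: http://meizu.baiducloud.top/ps/web/index.html
Sample formulas: (4*8)+8, (0-2)+5, 2*8-7. Participants must output the formula shown in each image together with its computed result.
The training set contains 100,000 labeled images. The test set contains ?00,000 unlabeled images; predictions are uploaded, and the resulting accuracy determines the preliminary-round ranking.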
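Problem description
The goal of this competition is to solve an OCR problem: put simply, converting images to text.
Dataset
The preliminary-round dataset contains 100,000 images of 180*60 pixels and a text file, labels.txt. Each image contains one arithmetic formula made up of:
- 3 operands: three integers from 0 to 9;
- 2 operators: each one of +, -, *, standing for addition, subtraction and multiplication;
- 0 or 1 pair of brackets.
The images are named 0.png through 99999.png. Below is a sample image (only one is shown here):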
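The text file labels.txt contains 100,000 lines. Each line holds the formula in the corresponding image and the result of evaluating it, separated by a space. The text for the sample image is as follows:
Evaluation metric
The official metric is accuracy. The preliminary round only involves addition, subtraction and multiplication of integers, so the result is always an integer, and a prediction counts as correct only when both the character sequence and the computed result are correct.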
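The label format and the exact-match rule can be sketched as follows. This is only an illustration: `make_label` and `is_correct` are hypothetical helper names, and evaluating the formula with Python's `eval` is an assumption that is reasonable here only because the input is first restricted to digits, `+`, `-`, `*` and brackets.

```python
import re

# Only the characters that can occur in a preliminary-round formula.
ALLOWED = re.compile(r'^[0-9+\-*()]+$')

def make_label(formula):
    """Build one labels.txt-style line: the formula, a space, the result."""
    if not ALLOWED.match(formula):
        raise ValueError('unexpected character in formula: %r' % formula)
    return '%s %d' % (formula, eval(formula))

def is_correct(predicted_line, true_line):
    """A prediction counts only if formula and result both match."""
    return predicted_line.strip() == true_line.strip()
```

For the sample formulas quoted from the contest page, `make_label('(4*8)+8')` gives `'(4*8)+8 40'`.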
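Locally, besides the official accuracy metric, we also use the CTC loss to evaluate the model.
Data augmentation with captcha
The organizers provide 100,000 images. We can train directly on the official data, or use the captcha library to randomly generate more data modeled on the official training set and thereby improve accuracy. According to the task description, a label always consists of three digits, two operators and either one pair of brackets or none. Given the bracket rules, there are only three cases: no brackets, brackets on the left, or brackets on the right. The data generator is therefore easy to write.
Generator
The generation rules are simple:
They should be easy to follow. While writing this article I thought of an even better way to write it: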
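As an illustration of those rules (a sketch written for this text, not the author's original generator), the label strings could be produced like this. The function name `random_formula` and the uniform choice among the three bracket cases are assumptions.

```python
import random

DIGITS = '0123456789'
OPERATORS = '+-*'

def random_formula(rng=random):
    """Return a formula with three digits, two operators and 0 or 1 bracket pair."""
    a, b, c = (rng.choice(DIGITS) for _ in range(3))
    op1, op2 = (rng.choice(OPERATORS) for _ in range(2))
    style = rng.choice(['none', 'left', 'right'])  # assumed to be equally likely
    if style == 'left':
        return '(%s%s%s)%s%s' % (a, op1, b, op2, c)
    if style == 'right':
        return '%s%s(%s%s%s)' % (a, op1, b, op2, c)
    return '%s%s%s%s%s' % (a, op1, b, op2, c)
```

Each generated string has length 5 or 7, and can be rendered to an image with the captcha library and labeled with its evaluated result.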
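Apart from generating the formulas, one more detail deserves attention: every minus sign ("-") in the preliminary round is thin, but images generated directly with the captcha library have thick minus signs. We therefore modified image.py: in the _draw_character function we added a check so that the resize operation is skipped for a minus sign, which keeps it from becoming thick:
We then use the generator to produce arithmetic captchas:
The figure above was produced by the original generator; the minus sign is clearly very thick.
The figure above was produced by the modified generator; the minus sign is no longer thick.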
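Model structure
The model structure is the same as in the earlier article, except that the number of convolution kernels was increased, some BN layers were added, and a small change was made to support multi-GPU training on four cards. If you have a single card, simply remove the line base_model2 = make_parallel(base_model, 4).
The BN layers are mainly there to speed up training. The experimental results were very good: the model converged much faster.
Visualization of base_model:
Visualization of model: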
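Model training
After a few tests I dropped the evaluate function: the model already reaches 100% recognition on the validation set, so watching val_loss is enough. From the earlier attempts I found that, with a generator, more training epochs are always better, so I simply ran adam for 50 epochs of 100,000 samples each. The model has essentially converged after ?0 epochs.
We can see that the model is first split into four parts that are computed in parallel on the four GPUs; the results are then merged, the final CTC loss is computed, and the model is trained on it.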
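Result visualization
Here we visualize predictions on generated data. The model is essentially never wrong.
After packaging everything as a docker image and submitting it to the contest system, it ran for a little over ten minutes and we obtained a perfect score of 1.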
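Summary
The preliminary round is very easy, which is why we could reach such a high score. The organizers later raised the difficulty by enlarging the preliminary test set to ?00,000 images, and on that set our model only reaches 0.999925. Possible improvements include lowering the learning rate further, training the model more thoroughly, and ensembling the results of several models.
Difficulties in the expanded official test set
On the expanded dataset, we found some images whose predictions could not be evaluated, for example [629,2271,6579,17416,71857,77631,95303,102187,117422,142660,183693]. We take 117422.png as an example.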
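The image is almost impossible to read with the naked eye, but after some image processing its real content becomes visible:
IMAGE_DIR = 'image_contest_level_1_validate'
index = 117422
img = cv2.imread('%s/%d.png' % (IMAGE_DIR, index))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h = cv2.equalizeHist(gray)
We then get the following result:
There is still one image, 142660, that cannot be recovered by preprocessing. It may be a low-probability event caused by a bug in the generating program. As a result, apart from our one docker submission that got a perfect score, nobody else reached a perfect score in the preliminary round.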
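Approach for the finals
The finals dataset contains 100,000 images and a text file, labels.txt. Each image contains mathematical formulas with the following properties:
- The image size is not fixed.
- The formulas occupy only one region of the image.
- Each image contains two or three lines of formulas.
- There are two formula types: assignments and four-operation arithmetic expressions. A two-line image consists of one assignment and one expression; a three-line image consists of two assignments and one expression. A plus sign (+) is still a plus sign even when it is rotated to look like an x; * is the multiplication sign.
- In an assignment, the variable name is a single Chinese character. The characters come from two lines of verse (commas excluded): 君不见,黄河之水天上来,奔流到海不复回 and 烟锁池塘柳,深圳铁板烧
- Arithmetic expressions include addition, subtraction, multiplication, fractions and brackets. The numbers are multi-digit, and the Chinese characters are variables assigned by the preceding lines.
- The output format is: the formulas in the image, one ASCII space, then the computed result. Formulas on different lines are separated by ASCII semicolons. Fractions are evaluated as floating-point numbers, and a result is accepted as correct when its error does not exceed 0.01.
- The whole label file is UTF-8 encoded.
Finals sample: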
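The preliminary task is not hard: recognizing the text sequence is enough. The finals formulas are more complex, and the images must first go through image processing before they can be fed to a neural network for end-to-end text sequence recognition.
The official metric is accuracy. The preliminary round only involves addition, subtraction and multiplication of integers, so the result is always an integer, and a prediction counts as correct only when both the sequence and the computed result are correct.
In the finals, however, the numbers are usually five digits long, there are many multiplications and additions, and there is always exactly one fraction, so the result can easily go beyond what a 64-bit floating-point number can represent precisely. After discussion, the organizers decided to score only the recognized text sequence and not the computed result.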
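One way to read the remark about 64-bit floats: products of several five-digit numbers quickly exceed 2**53, beyond which a double can no longer hold every integer exactly. A small check of this, written for Python 3 (the helper name is illustrative and the operands are made up):

```python
def exactly_representable(n):
    """True if the integer n survives a round trip through a 64-bit float."""
    return int(float(n)) == n

small = 99999 * 99999   # about 1e10, well below 2**53
large = 99999 ** 4      # about 1e20, above 2**53 and odd

small_ok = exactly_representable(small)
large_ok = exactly_representable(large)
```

Here `small_ok` is true and `large_ok` is false, so two implementations that evaluate the same long product in floating point need not agree on the last digits.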
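Locally, besides the official accuracy metric, we also use the CTC loss to evaluate the model.
Exploring the finals dataset is considerably more involved. We first fix two terms. In the sample formula, 流=42072;圳=86; is called the assignments and (圳?(97510*45921))*流?35864 is called the expression; assignments and expressions together are called formulas.
We first counted how often each character occurs in the samples:
The distribution of the digits is interesting: 0 occurs less often than every other digit, while the other digits occur about equally often. One can immediately infer that the numbers are generated directly as random integers; since 0 cannot appear as the leading digit, its frequency is lower.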
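A per-character count of this kind can be sketched with the standard library. The function name `char_frequency` and the two sample labels below are made up for illustration; the real labels come from labels.txt. The value returned for each character is its average number of occurrences per sample.

```python
from collections import Counter

def char_frequency(labels):
    """Average number of occurrences of each character per label."""
    counts = Counter()
    for label in labels:
        counts.update(label)          # counts every character of the string
    total = float(len(labels))
    return {ch: c / total for ch, c in counts.items()}

# Hypothetical labels in the finals format: assignments, then the expression.
freq = char_frequency(['流=1;圳=2;圳+流', '河=5;河*3'])
```

In this toy input both `freq[';']` and `freq['=']` are 1.5, because one label has two assignments and the other has one.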
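Semicolons and equals signs occur equally often, because every assignment contains exactly one equals sign and one semicolon. Each occurs 1.65807 times per sample on average, from which we can infer that samples with one assignment and samples with two assignments occur in a ratio of roughly 1:2.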
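The step from 1.65807 to the 1:2 ratio can be made explicit. If a fraction p_one of samples has one assignment and p_two has two, then 1*p_one + 2*p_two = 1.65807 with p_one + p_two = 1. A minimal sketch (the function name is illustrative):

```python
def assignment_split(mean_count):
    """Solve 1*p_one + 2*p_two = mean_count subject to p_one + p_two = 1."""
    p_two = mean_count - 1.0
    p_one = 1.0 - p_two
    return p_one, p_two

p_one, p_two = assignment_split(1.65807)
ratio = p_two / p_one   # about 1.92, i.e. close to 2
```

This gives p_one of about 0.342 and p_two of about 0.658, consistent with the probabilities 1/3 and 2/3.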
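All operators occur with the same frequency, so we can infer that they are drawn directly at random.
Brackets occur 1.36505 times per sample on average. We enumerated every bracket pattern that appears:
There are 11 patterns in total. Weighting them by their number of bracket pairs gives an expected frequency of 2*5/11.0 + 5/11.0 = 1.3636, so the bracket pattern is also drawn at random from these templates.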
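The expected value can be recomputed from the 11 templates themselves, which are listed elsewhere in this article. The sketch below assumes each template is equally likely, which is what the match with the measured 1.36505 suggests.

```python
# The 11 bracket templates given in the article.
TEMPLATES = [
    '1+1+1+1',
    '(1+1)+1+1', '1+(1+1)+1', '1+1+(1+1)', '(1+1+1)+1', '1+(1+1+1)',
    '((1+1)+1)+1', '(1+(1+1))+1', '1+((1+1)+1)', '1+(1+(1+1))', '(1+1)+(1+1)',
]

pairs = [t.count('(') for t in TEMPLATES]           # bracket pairs per template
expected_pairs = sum(pairs) / float(len(TEMPLATES))  # 15 / 11
```

One template has no brackets, five have one pair and five have two pairs, so the expectation is (5*1 + 5*2) / 11.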
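Among the Chinese characters, 不 is twice as likely as the others because it appears twice in the source text; all other characters are about equally likely. The characters are taken from the two lines of verse "君不见,黄河之水天上来,奔流到海不复回 烟锁池塘柳,深圳铁板烧", so we can also infer that they are drawn directly at random, character by character.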
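The doubled probability follows directly from drawing uniformly over the positions of the verse, as this Python 3 sketch shows (the verse is the one quoted in the text with commas removed; `char_probability` is an illustrative name):

```python
from collections import Counter

POEM = '君不见黄河之水天上来奔流到海不复回烟锁池塘柳深圳铁板烧'
counts = Counter(POEM)

def char_probability(ch):
    """Probability of ch when one position of POEM is picked uniformly."""
    return counts[ch] / float(len(POEM))
```

The verse has 27 characters and 26 distinct ones; only 不 occurs twice, so its probability is 2/27 while every other character has 1/27.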
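In summary:
- Chinese characters are drawn with equal probability, with 不 twice as likely.
- The bracket pattern is drawn at random from the 11 cases.
- Every expression contains exactly four operators.
- One assignment is used with probability 1/3, two assignments with probability 2/3.
- The operator / always appears exactly once, with a Chinese character as the numerator.
- The operators +, -, * are drawn at random, each with probability 1/3.
- Numbers take values in [0, 100000].
The original images are very large, and if they were fed directly into a CNN more than ?0% of the area would be useless, so we preprocess the images and crop out the useful part. Because an image contains two or three formulas, our approach is to stitch them together from left to right, which keeps the image small (900*80=72000 vs 600*270=162000).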
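I mainly used the following techniques:
- grayscale conversion
- histogram equalization
- median filtering
- morphological opening and closing
- binarization
- contour finding
- bounding rectangles
We start with a rough extraction of the key regions:
The image is first converted to grayscale, and the histogram equalization used in the preliminary round then raises its contrast. The noise is still present at this point, so filtering is needed. We use a median filter, which removes noise dots and interference lines very well. (This is blur in the figure above.)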
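At this stage we only care about extracting the formulas, not the individual characters (accurate character extraction cannot be guaranteed), so the characters need to be joined together. The image is first downscaled by a factor of 4, and morphological opening and closing are then used to join the characters. Since we want to join horizontally and not vertically, we chose an opening with a (7, 40) kernel, followed by a (4, 4) closing to filter out unnecessary noise. (This is m2 in the middle of the figure above.)
Once the formulas are joined, contour finding can be applied to the image. It easily yields three sets of edge points, and the bounding-rectangle function then gives each rectangle's (x, y, w, h), which completes the key region extraction. Afterwards we draw the rectangles in green on the original image. (This is rect in the lower right of the figure above.)
Because a very large kernel was used for filtering earlier, a fine-tuning step is needed here:
Starting from the rectangle found before, we expand it by 20 pixels and crop the key region. The crop is taken directly from the filtered image, so the resolution is high. A simple closing filter, binarization and border extraction follow. Remaining noise is not a concern here: cropping too much does no harm, cropping too little does. The resulting crop may come out slightly small because of the filtering, so we expand it by another 5 pixels, which gives good results.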
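The expand-and-clamp step used for both the 20-pixel and the 5-pixel margins can be sketched as a small pure-Python helper. `expand_rect` is a hypothetical name; the clamping to the image bounds mirrors the `max(0, ...)` and `min(..., ...)` expressions in the article's cropping code.

```python
def expand_rect(x, y, w, h, margin, width, height):
    """Grow a bounding box by `margin` pixels on every side, clamped to the image.

    Returns (x0, y0, x1, y1), usable as image[y0:y1, x0:x1].
    """
    x0 = max(0, x - margin)
    y0 = max(0, y - margin)
    x1 = min(width, x + w + margin)
    y1 = min(height, y + h + margin)
    return x0, y0, x1, y1
```

For a box near the border, such as `expand_rect(10, 5, 100, 30, 20, 120, 60)`, the result `(0, 0, 120, 55)` is cut off at the image edges instead of producing negative indices.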
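Here are a few examples:
Once the formulas are cropped accurately, they can be joined horizontally:
The figure below shows a stitched image:
Running this in a plain Python for loop uses only one CPU core. To make full use of the CPU, we preprocess in parallel with multiple processes so that every core runs at full load. To follow the progress in real time, I use the tqdm progress-bar library.
Here we plot the relationships between all the quantities, which turns out to be quite interesting.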
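Here x, y are the starting coordinates of a formula, w, h are its width and height, n, m are the width and height of the original image, and r is the number of formulas. The plot shows no obvious pattern in x, y; the only slight regularity is that wider images allow larger x (obviously: an image ?000 pixels wide cannot contain a formula at x = ?200).
w shows no obvious pattern either and follows a typical normal distribution, while h has two peaks, because an image contains either two or three formulas.
m, n are very regular: they are drawn at random from a few fixed values. m is chosen from [400, 500, 600, 700, 800, 900, 1000] and n from [800, 1600, 2400, 3200, 4000].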
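After many iterations of the code, I wrapped the CNN into a model of its own, which makes the overall model much tidier:
The idea of the model is as follows. An image is fed in, and the CNN produces a feature map of shape (112, 10, 128), where 112 is the sequence length fed to the RNN and 10 means each feature column is 10 pixels high. The trailing (10, 128) features are flattened to 1280 and reduced to 128 dimensions by a fully connected layer, giving features of shape (112, 128). These are fed to the RNN: two bidirectional GRU layers output character probabilities for the 112 steps, and the CTC loss is used to optimize the model, yielding a model that recognizes character sequences accurately.
The CNN structure is shown in the figure below: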
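The theoretical maximum sequence length is 46 characters (a number can be at most 100000, which gives 2*9+3*6+4+4+2=46).
The CNN was changed from the earlier pattern of two convolution layers followed by one pooling layer to several convolution layers followed by one pooling layer. Since the convolution blocks have 3, 4 and 6 layers respectively, I call it the 346 structure.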
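Why use an RNN? Here is a classic example: "Aoccdrnig to rseaerch, the odrer of ltteers in a wrod deos not ncessearily aeffct raednig; olny atfer fniishing tihs stnecene do you ntoice taht evreytihng in it is jmulbed." (The original is a well-known Chinese sentence whose characters are deliberately swapped.)
When people read a passage they take the context into account instead of recognizing one character at a time, so introducing an RNN to capture context can greatly improve the model's accuracy. In the finals, the sequence has contextual structure in several places: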
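- The leading one or two assignments always have the form "Chinese character=number;".
- Every left bracket has a matching right bracket.
- Bracket positions follow grammatical rules.
- There is always exactly one fraction.
- The numerator of the fraction is always a Chinese character.
- If there is only one assignment, the Chinese character in the expression must be the one in the assignment.
- If there are two assignments and they are easy to read while the expression is not, the characters in the assignments can be used to correct the characters in the expression, especially when the numerator is occluded.
Compared with the preliminary-round model, some changes were made:
- padding was changed to same; otherwise I felt the feature map would not be tall enough to recognize fractions.
- L2 regularization was added. The loss became larger, but the accuracy became higher. (L2 is applied to the kernels of the convolution layers, the gamma and beta of the BN layers, and the weights and bias of the fully connected layer.)
- The initializer of every layer was changed to he_uniform, which works better than before.
- dropout was removed. Its effect is unclear, but with a generator overfitting should not occur anyway.
- The GRU implementation was changed to ? in the hope that the GPU would speed up the GRU, but it seemed slower than setting it to 0 and running on the CPU, so the change was reverted.
The L2 regularization parameter was taken directly from Section 4.3 of the Xception paper:
Weight decay: The Inception V3 model uses a weight decay (L2 regularization) rate of 4e-5, which has been carefully tuned for performance on ImageNet. We found this rate to be quite suboptimal for Xception and instead settled for 1e-5.
To obtain more data and improve the model's generalization, I used a very simple data augmentation method: for the Chinese characters in an expression, randomly pick matching assignments and combine them into a new sample. We use the first 350*256=89600 samples for generation and the following 10240 samples as the validation set; the small remainder was too little to be worth using.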
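When loading the data, the expression images are read first, and the assignment images are then stored in a dictionary keyed by Chinese character. Because the keys of a dictionary are unordered, the dictionary values are lists, which are ordered.
The generator is then implemented by subclassing Sequence from keras:
It first picks an expression at random, uses a regular expression to find the Chinese characters in it, then picks images at random from the {Chinese character: image arrays} dictionary and stitches them into a new sequence in the same way as in the preprocessing.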
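Stripped of the image handling, the recombination idea can be sketched on label strings alone. This is a simplified Python 3 illustration, not the article's generator: `recombine`, the toy assignment pool and the example expression are all made up.

```python
import random
import re

# CJK Unified Ideographs, the range used to find variable names.
CHINESE = re.compile('[\u4e00-\u9fff]')

def recombine(expression, assignments_by_char, rng=random):
    """Build a new label: one random assignment per variable, then the expression."""
    variables = CHINESE.findall(expression)
    rng.shuffle(variables)                       # random order of the assignments
    parts = [rng.choice(assignments_by_char[v]) for v in variables]
    return ';'.join(parts + [expression])

pool = {'河': ['河=12', '河=345'], '铁': ['铁=6789']}   # hypothetical pool
label = recombine('1*(河/2-3)-铁', pool, random.Random(0))
```

The resulting label always ends with the original expression and is preceded by one assignment for each Chinese variable it uses.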
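For example, after an expression is picked at random and matching assignment images are stitched in front of it, the pieces can have different background colors, but this does not prevent the model from recognizing them.
Our training strategy was to converge quickly with Adam() at its default learning rate of 1e-3 for ?0 epochs, then run Adam(1e-4) for ?0 epochs to reach a decent loss, and finally fine-tune with Adam(1e-5) for 50 epochs. Weights are saved after every epoch and the validation accuracy is computed. The green line in the figure (0.9977) is the model trained this way.
We also tried training at a learning rate of 1e-3 for 20 epochs and then alternating between 1e-4 and 1e-5 twice, each time continuing from the weights with the lowest validation loss. This is the red line in the figure: it is faster, but the accuracy is not as good.
Afterwards we trained on the entire training set, which gave the blue line; the result is about the same as the green one.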
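We read the test-set samples and run prediction with base_model. The part that writes the output file has one point worth mentioning, namely how to compute the actual value:
The idea is simple: replace each Chinese variable in the expression with the number from its assignment, then call Python's eval to get the result, and leave the value empty when it cannot be computed. For the 0.9977 model, the share of computable samples reaches 0.99978, meaning only 22 of the 100,000 samples cannot be evaluated. Of course, some samples are still misrecognized for various reasons even though they can be evaluated: for example, 5 and ? are a major source of errors, and some digits are crossed by interference lines so that even the human eye cannot read them.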
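The rule for ensembling model results is simple: count how often each result occurs across all models, discard empty results first, and take the most frequent one. In other words, it is simple voting.
After ensembling the three models shown in the loss figure above, we finally obtained a test-set accuracy of 0.99868.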
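At the start of the contest, we tried setting the image width to None, i.e. a variable width, but the reshape problem could not be solved and the idea was dropped.
We also tried cutting the image into several pieces and recognizing them separately, with separate models for assignments and expressions. Since context information would be lost and some accuracy might be lost with it, this approach was abandoned halfway.
We tried writing a generator, but its images differed too much from the official ones, and in actual tests either the accuracy on generated data was high and the accuracy on official data low, or the other way round, so it was never put to use.
In the figure above, the first image is an official one and the following five come from our generator. Our characters are less compact than the official ones, the equals sign looks different, and in the fraction our characters are too compact.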
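Besides building my own model, I also tried replacing the CNN with ResNet and DenseNet. These models are large and slow to train, and the main problem was not a lack of model complexity anyway: judging from the plotted loss curves, val_loss fluctuates early on but becomes very smooth once the learning rate drops at epoch 50, so the model is not overfitting:
At the end of the contest, I tried replacing GRU with LSTM. The results were very similar, but the submitted accuracy dropped slightly (a few more samples wrong, possibly just luck). I had made the same replacement earlier for captcha recognition with about the same effect, so I did not pursue it. In theory this sequence is not very long, so the choice between GRU and LSTM matters little.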
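The following points deserve attention in this project:
Data preparation:
- Combining deep learning with traditional image processing techniques gives better accuracy.
- For text recognition, a captcha generator can be built for data augmentation to increase the number of training samples.
Model optimization:
- Adjust the model structure to the characteristics of the project, for example by using fewer pooling layers in the CNN part.
- Introduce L2 regularization into the model to prevent overfitting.
Model training:
- Train the model with a learning-rate decay strategy.
- For complex models, each input batch can be split across several GPUs.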
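Preliminary round: recognize arithmetic formulas in images. A formula may contain the digits 0~9, the operators +-* and the brackets (). The formula length is fixed at 5 or 7: three digits, two operators, and 0 or 1 pair of brackets. Some samples follow: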
# In _draw_character of captcha's image.py: skip the warp for '-' so the minus sign stays thin
if c != '-':
    im = im.resize((w2, h2))
    im = im.transform((w, h), Image.QUAD, data)
Dataset
Evaluation metric
Data exploration
Definitions
流=42072;圳=86;(圳?(97510*45921))*流?35864
+-*/
are called operators.
Analysis
1+1+1+1
(1+1)+1+1
1+(1+1)+1
1+1+(1+1)
(1+1+1)+1
1+(1+1+1)
((1+1)+1)+1
(1+(1+1))+1
1+((1+1)+1)
1+(1+(1+1))
(1+1)+(1+1)
Summary
Data preprocessing
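import cv2
import numpy as np

def plot(index):
    img = cv2.imread('%s/%d.png'%(IMAGE_DIR, index))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    eq = cv2.equalizeHist(gray)           # raise the contrast
    b = cv2.medianBlur(eq, 9)             # remove noise dots and interference lines
    m, n = img.shape[:2]                  # m: image height, n: image width
    b2 = cv2.resize(b, (n//4, m//4))      # downscale by 4
    # a wide kernel joins characters horizontally but not vertically
    m1 = cv2.morphologyEx(b2, cv2.MORPH_OPEN, np.ones((7, 40)))
    m2 = cv2.morphologyEx(m1, cv2.MORPH_CLOSE, np.ones((4, 4)))
    _, bw = cv2.threshold(m2, 127, 255, cv2.THRESH_BINARY_INV)
    bw = cv2.resize(bw, (n, m))           # back to the original size
    r = img.copy()
    # OpenCV 3.x returns three values here; OpenCV 4.x returns only (contours, hierarchy)
    img2, ctrs, hier = cv2.findContours(bw, cv2.RETR_EXTERNAL,
                                        cv2.CHAIN_APPROX_SIMPLE)
    for ctr in ctrs:
        x, y, w, h = cv2.boundingRect(ctr)
        cv2.rectangle(r, (x, y), (x+w, y+h), (0, 255, 0), 10)  # green rectangle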
Denoising
Joining the formulas
Key region extraction
Fine-tuning
# Fine-tune the three formula regions
d = 20    # margin around the rough rectangle
d2 = 5    # margin around the refined rectangle
imgs = []
sizes = []
for i, ctr in enumerate(ctrs):
    x, y, w, h = cv2.boundingRect(ctr)
    roi = img[max(0, y-d):min(m, y+h+d), max(0, x-d):min(n, x+w+d)]
    p, q, _ = roi.shape
    # same crop, taken from the median-filtered image b
    blur_roi = b[max(0, y-d):min(m, y+h+d), max(0, x-d):min(n, x+w+d)]
    blur_roi = cv2.morphologyEx(blur_roi, cv2.MORPH_CLOSE, np.ones((3, 3)))
    _, bw_roi = cv2.threshold(blur_roi, 127, 255, cv2.THRESH_BINARY_INV)
    _, cs, _ = cv2.findContours(bw_roi, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # one bounding box around all contours found inside the crop
    x, y, w, h = cv2.boundingRect(np.vstack(cs))
    roi2 = roi[max(0, y-d2):min(p, y+h+d2), max(0, x-d2):min(q, x+w+d2)]
    imgs.append(roi2)
    sizes.append(roi2.shape)
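Concatenating the three formulas
# Concatenate the three formulas side by side
sizes = np.array(sizes)
# canvas: tallest crop high, all widths plus a 2-pixel gap between crops
img2 = np.zeros((sizes[:,0].max(), sizes[:,1].sum()+2*(len(sizes)-1), 3), dtype=np.uint8)
x = 0
for a in imgs[::-1]:
    w = a.shape[1]
    img2[:a.shape[0], x:x+w] = a
    x += w + 2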
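Parallel preprocessing
from multiprocessing import Pool
from tqdm import tqdm

p = Pool(12)   # 12 worker processes
n = 100000
if __name__ == '__main__':
    rs = []
    # f is the per-image preprocessing function
    for r in tqdm(p.imap_unordered(f, range(n)), total=n):
        rs.append(r)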
Summary
pd.plotting.scatter_matrix(df, alpha=0.1, figsize=(14,8), diagonal='kde');
Counter(df['m'])
Counter({400: 14233, 500: 14414, 600: 14332, 700: 14304, 800: 14293, 900: 14299, 1000: 14125})
Counter(df['n'])
Counter({800: 19872, 1600: 19937, 2400: 20128, 3200: 19975, 4000: 20088})
Model structure
CNN
2*9+3*6+4+4+2=46
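For CTC, the input sequence should preferably be more than ? times the maximum label length for training to converge well. Earlier I convolved the sequence down to about ?0 steps; for consecutive identical characters there was then no blank to separate them, so convergence was much worse. I also kept miscalculating the maximum sequence length, because I was using Python2: without decoding to utf-8, one Chinese character takes three bytes.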
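The 46-character bound can be reproduced with a little arithmetic. The breakdown below is one reading of 2*9+3*6+4+4+2, stated here as an assumption: two assignments of 9 characters each (variable, =, six digits, ;), three 6-digit numbers in the expression, four operators, four bracket characters and two Chinese variables. The factor of 2 in the length check is likewise an assumption.

```python
def max_label_length(assignments=2, numbers=3, max_digits=6,
                     operators=4, bracket_chars=4, variables=2):
    """Longest possible label under the assumed breakdown of the formula."""
    assignment_len = 1 + 1 + max_digits + 1   # variable, '=', digits, ';'
    return (assignments * assignment_len + numbers * max_digits
            + operators + bracket_chars + variables)

def long_enough_for_ctc(time_steps, label_len, factor=2):
    """Check that the RNN sees more than `factor` steps per label character."""
    return time_steps > factor * label_len
```

With these defaults `max_label_length()` is 46, and the 112 time steps produced by the CNN satisfy `long_enough_for_ctc(112, 46)`.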
GRU
Other parameters
Generator
from collections import defaultdict

cn_imgs = defaultdict(list)     # Chinese character -> list of assignment images
cn_labels = defaultdict(list)   # Chinese character -> list of assignment labels
ss_imgs = []                    # expression images
ss_labels = []                  # expression labels
for i in tqdm(range(n1)):
    ss = df[0][i].decode('utf-8').split(';')
    m = len(ss)-1               # number of assignments in this sample
    ss_labels.append(ss[-1])
    ss_imgs.append(cv2.imread('crop_split2/%d_%d.png'%(i, 0)).transpose(1, 0, 2))
    for j in range(m):
        cn_labels[ss[j][0]].append(ss[j])
        cn_imgs[ss[j][0]].append(cv2.imread('crop_split2/%d_%d.png'%(i, m-j)).transpose(1, 0, 2))
from keras.utils import Sequence

class SGen(Sequence):
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.X_gen = np.zeros((batch_size, width, height, 3), dtype=np.uint8)
        self.y_gen = np.zeros((batch_size, n_len), dtype=np.uint8)
        self.input_length = np.ones(batch_size)*rnn_length
        self.label_length = np.ones(batch_size)*38

    def __len__(self):
        return 350*256 // self.batch_size

    def __getitem__(self, idx):
        self.X_gen[:] = 0
        for i in range(self.batch_size):
            try:
                random_index = random.randint(0, n1-1)
                cls = []
                ss = ss_labels[random_index]
                # Chinese variables used in the expression (Python 2 unicode regex)
                cs = re.findall(ur'[\u4e00-\u9fff]', df[0][random_index].decode('utf-8').split(';')[-1])
                random.shuffle(cs)
                x = 0
                for c in cs:
                    # pick a random assignment for this variable and paste its image
                    random_index2 = random.randint(0, len(cn_labels[c])-1)
                    cls.append(cn_labels[c][random_index2])
                    img = cn_imgs[c][random_index2]
                    w, h, _ = img.shape
                    self.X_gen[i, x:x+w, :h] = img
                    x += w+2
                # paste the expression image after the assignments
                img = ss_imgs[random_index]
                w, h, _ = img.shape
                self.X_gen[i, x:x+w, :h] = img
                cls.append(ss)
                random_str = u';'.join(cls)
                self.y_gen[i,:len(random_str)] = [characters.find(ch) for ch in random_str]
                self.y_gen[i,len(random_str):] = n_class-1
                self.label_length[i] = len(random_str)
            except:
                # a sample that fails to build is silently skipped
                pass
        return [self.X_gen, self.y_gen, self.input_length, self.label_length], np.ones(self.batch_size)
85882*(河/76020-37023)-铁: we then pick one assignment for 铁 at random and one assignment for 河 at random, and stitching them together gives the image below:
Training
Prediction results
base_model
is used to run the prediction; this step is simple and is not described further.
X = np.zeros((n, width, height, channels), dtype=np.uint8)
for i in tqdm(range(n)):
    img = cv2.imread('crop_split2_test/%d.png'%i).transpose(1, 0, 2)
    a, b, _ = img.shape
    X[i, :a, :b] = img

base_model = load_model('model_346_split2_3_%s.h5' % z)
base_model2 = make_parallel(base_model, 4)
y_pred = base_model2.predict(X, batch_size=500, verbose=1)
out = K.get_value(K.ctc_decode(y_pred[:,2:], input_length=np.ones(y_pred.shape[0])*rnn_length)[0][0])[:, :n_len]
ss = map(decode, out)   # Python 2: map returns a list

vals = []
errs = []
errsid = []
for i in tqdm(range(100000)):
    val = ''
    try:
        a = ss[i].split(';')
        s = a[-1]
        # substitute each variable by its assigned number (as a float literal)
        for x in a[:-1]:
            x, c = x.split('=')
            s = s.replace(x, c+'.0')
        val = '%.2f' % eval(s)
    except:
        # disp3(i)
        errs.append(ss[i])
        errsid.append(i)
        ss[i] = ''      # leave formulas that cannot be evaluated empty
    vals.append(val)

with open('result_%s.txt' % z, 'w') as f:
    f.write('\n'.join(map(' '.join, list(zip(ss, vals)))).encode('utf-8'))

print len(errs)
print 1-len(errs)/100000.

# output
22
0.99978
Model ensembling
import glob
import numpy as np
from collections import Counter

def fun(x):
    c = Counter(x)
    c[' '] = 0          # an empty prediction (' ') never wins the vote
    return c.most_common()[0][0]

ss = [open(fname, 'r').read().split('\n') for fname in glob.glob('result_model*.txt')]
s = np.array(ss).T      # one row per sample, one column per model
with open('result.txt', 'w') as f:
    f.write('\n'.join(map(fun, s)))
Other attempts
Variable-width image recognition
Separate recognition
Generator attempt
Trying other CNN models
Replacing GRU with LSTM
Summary
Reflections on the project