CVPR 2016 paper notes: Video object segmentation
Abstract— Video object segmentation, a binary labelling
problem, is vital in various applications including object tracking,
action recognition, video summarization, video editing, object-based
encoding and video retrieval. This paper presents an
overview of recent strategies in video object segmentation,
focusing on techniques for solving challenges like complex
and moving backgrounds, illumination changes, occlusions,
motion blur, shadow effects and viewpoint variation. Significant
works that have evolved in this research field over recent years are
categorized based on the challenges solved by the researchers. A
list of challenging datasets and evaluation metrics available for
video object segmentation is presented. Finally, research gaps in
this domain are discussed.
Abstract (my translation): Video object segmentation is an important technique in many areas, such as object tracking, action recognition, video classification, video editing, object-based encoding and video retrieval.
This paper mainly presents recent strategies in video object segmentation, which address problems such as complex moving backgrounds, illumination changes, occlusion, motion blur, shadow effects and viewpoint variation.
The significant works and challenges involved are categorized; the paper also provides the datasets and evaluation metrics to study, and finally discusses the research gaps in this area.
Today's internet is engaged with a massive amount of
video data, thanks to developments in storage devices and
handy imaging systems. Huge terabytes of video are regularly
generated for various useful applications like surveillance,
news broadcasting, telemedicine, etc. Based on the
information provided by CISCO on ‘Visual Networking Index
(VNI)’, the growth of internet video traffic will be three fold
from 2015 to 2020. Manually extracting semantic information
from this enormous amount of internet video is highly
unfeasible, creating the need for automated methods to
annotate/derive useful information from the video data for
video management and retrieval [1]. Hence, one of the
essential steps for video processing and retrieval is video
object segmentation, a binary labelling problem for
differentiating the foreground objects accurately from the
background. Video object segmentation aims at partitioning
every frame in a video into meaningful objects by grouping
the pixels along the spatio-temporal direction that exhibit
coherency in appearance and motion [2]. Video object
segmentation task is highly challenging due to the following
reasons: (i) unknown number of objects in a video (ii) varying
background in a video and (iii) occurrence of multiple objects
in a video [3]. Existing approaches in video segmentation can
be broadly classified into two categories, viz. interactive
methods and unsupervised methods. Interactive object
segmentation methods need human intervention in the initialization
process, while unsupervised approaches can perform object
segmentation automatically. In semi-supervised approaches,
user intervention is required for annotating initial frames, and
these annotations are transferred to the remaining frames in the
video. Automated object segmentation approaches [7][8][9]
can segment any video data into meaningful objects without
user interaction based on object proposals and motion cues
from the video. The common assumption followed by most of
the automated methods is that only a single object is moving
throughout the video, and they use only the motion information for
segmenting the object from the background. This assumption
leads to poor segmentation under discontinuous motion of the
object [11]. The existing surveys on video object segmentation
[12][13][14][15] largely describe the techniques available
for image segmentation rather than for video
data. In [15], the authors classified the approaches in video
segmentation as inference and feature modes. The
segmentation techniques proposed so far to improve the
segmentation results are grouped as inference modes and
methods that depend on features like depth, motion and
histogram are termed as feature modes. From this observation,
it is evident that none of the researchers have discussed the
segmentation approaches from the perspective of the
challenges solved by the algorithms. Hence, this paper
categorizes the significant work contributed by researchers in
video object segmentation based on the issues resolved by the
respective authors. Several issues degrading the segmentation
performance are moving background, moving camera,
illumination variation, occlusion, shadow effect, viewpoint
variation, etc. Moreover, a segmentation algorithm should
provide a trade-off between segmentation accuracy and
complexity. As depicted in Fig. 1, this paper classifies the
video object segmentation task as:
1. Issue tackling mode
2. Complexity reduction mode and
3. Inference mode
The main contributions of this paper are:
• Summarizing the recent activities in the video object
segmentation domain,
• Categorizing the significant works in this research
field meaningfully, and
• Presenting a list of datasets and evaluation metrics
needed for developing an efficient video object
segmentation framework.
Organization of this paper: Section II describes the algorithms
that contribute significantly to tackling the issues (discussed
earlier) involved in video object segmentation. Section III
presents an overview of the reduced-complexity segmentation
approaches available in the literature. Section IV provides
a gist of the object segmentation techniques that fall under the
inference mode. Section V lists the datasets and the evaluation
metrics used in these segmentation approaches and discusses
research gaps in the video object segmentation field.
Section VI concludes this study.
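Before my notes below, a minimal sketch of the "motion-only" assumption discussed in the introduction. This is my own illustration (not from the paper), assuming OpenCV, dense optical flow and a hypothetical input.mp4: the foreground mask is just thresholded flow magnitude, so it vanishes whenever the object stops moving, which is exactly the failure mode under discontinuous motion noted above.

```python
# Minimal sketch (my own illustration): a motion-only foreground mask built
# from dense optical flow, assuming OpenCV is available.
import cv2
import numpy as np

def motion_mask(prev_gray, curr_gray, mag_thresh=2.0):
    """Binary foreground mask from optical-flow magnitude."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)          # per-pixel motion magnitude
    mask = (mag > mag_thresh).astype(np.uint8)  # 1 = moving (foreground)
    # Morphological opening removes isolated noisy pixels
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

cap = cv2.VideoCapture("input.mp4")             # hypothetical input file
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = motion_mask(prev_gray, gray)          # per-frame binary labelling
    prev_gray = gray
cap.release()
```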
My notes: Today's internet is flooded with all kinds of video (the paper spends a long paragraph telling you how important this is). The goal of video object segmentation is to partition every frame of a video so that pixels with coherent appearance and motion are grouped into objects. The difficulties are:
1. The number of target objects in the video is unknown.
2. The background keeps changing.
3. Multiple objects may appear in the video.
There are currently two main families of methods: interactive and unsupervised; this survey is mainly concerned with unsupervised, automatic video object segmentation.
In semi-supervised methods, annotating the initial frames and the object to be segmented is necessary, while unsupervised methods do not need this. Many existing video segmentation algorithms assume that only a single object is moving, which leads to poor results when the object's motion is discontinuous. The authors of [15] group video segmentation algorithms into feature modes and inference modes, and [12]-[15] are surveys that mostly cover image segmentation. From this, it is clear that few people have summarized video segmentation methods from the perspective of the problems the algorithms solve, so this paper categorizes the significant works by the issues that can degrade segmentation accuracy, such as moving background, moving camera, illumination change, occlusion, shadow effects and viewpoint change.
Furthermore, an algorithm should balance its complexity against its accuracy.
So the structure of the paper is:
1. Issue tackling mode
2. Complexity reduction mode
3. Inference mode
The three main contributions of the paper: 1. summarizing recent work in this field; 2. categorizing the methods currently in use; 3. providing datasets and evaluation metrics for readers to work with.
II. ISSUE TACKLING MODE
This section details the 'issue tackling mode', the first
category of video object segmentation approaches. Though
several issues (as discussed earlier) affect the performance of
segmentation approaches, the commonly occurring problems
are moving background, occlusion, shadow, rain, moving
camera, illumination and viewpoint variation.
A. Surveillance video systems
The traffic surveillance systems include detection and
recognition of moving vehicles (objects) from traffic video
sequence. For any traffic surveillance system, vehicle
segmentation is the fundamental step and base for tracking the
vehicle movements. However, vehicle segmentation in traffic
video is still challenging due to moving objects and
illumination variations. To solve this issue, an unsupervised
neural network based background modelling has been
proposed for real-time object segmentation. In this work,
neural network serves as both adaptive model of the
background in a video sequence and a classifier of pixels as
background/foreground. The segmentation time taken by the
neural network is improved by implementing it on an FPGA kit.
Though this neural network based background subtraction
method achieves good segmentation accuracy, it works well
only under slightly varying illumination and moving
background. A high cost is involved in reducing time
complexity [16]. Following this, Appiah et al. [17] proposed
an integrated hardware implementation of moving object
segmentation in a real-time video stream under varying lighting
conditions. Two algorithms for multimodal background
modelling and connected component analysis are implemented
on a single-chip FPGA. This method segments objects under
varying illumination conditions at high processing speed. The
two algorithms described so far do not take rain into
account. Under rainy conditions, shadows and colour
reflections are the major problems to be tackled. A
conventional video object segmentation algorithm that
combines the background construction-based video object
segmentation and the foreground extraction-based video
object segmentation has been proposed. The foreground is
separated from the background using histogram-based change
detection technique and object regions are segmented
accurately by detecting the initial moving object masks based
on a frame difference mask. Shadow and colour reflection
regions are removed by a diamond window mask and colour
analysis of the moving object, respectively. Segmentation of
moving objects is refined by morphological operations. The
segmentation results of moving objects under rainy situations
using [18] are shown in Fig. 2. As future work, the authors of [18] plan
to obtain the threshold adaptively and adjust the content of the video
automatically. Later, Chien et al. [19] proposed a video object
segmentation and tracking technique for smart cameras in
visual surveillance networks. A multi-background model
based on a threshold decision algorithm has been developed
for video object segmentation under drastic changes in
illumination and background clutter. In this method, the
threshold is selected robustly without user input and,
unlike per-pixel background models, it avoids
possible error propagation. Another algorithm, for extracting
objects from videos captured by a static camera, has been
proposed to solve issues like waving trees, camouflaged regions
and sleeping objects [20]. In this method, the reference
background is obtained by averaging some initial frames.
Temporal processing for object extraction does not consider
spatial correlation amongst the moving objects across frames.
Hence, an approximate motion field is derived using the
background subtraction and temporal difference mechanism.
The background model adapts to temporal changes (swaying
trees, rippling water, etc.), which extracts the complementary
object in the scene.
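As a rough illustration of the pipeline these surveillance approaches share (an adaptive background model, frame differencing as a motion cue, morphological clean-up and connected components), here is a sketch assuming OpenCV and a hypothetical traffic.mp4. It is my own generic example, not the exact algorithm of [16]-[20].

```python
# Illustrative only: a generic background-subtraction pipeline in the spirit
# of the surveillance methods above (adaptive background model + frame
# difference + morphology + connected components).
import cv2
import numpy as np

bg_model = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                              detectShadows=True)
cap = cv2.VideoCapture("traffic.mp4")           # hypothetical surveillance clip
prev_gray = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # 1) Adaptive multimodal background model (pixels labelled fg/bg/shadow)
    fg = bg_model.apply(frame)
    fg = np.where(fg == 255, 255, 0).astype(np.uint8)   # drop shadow label (127)

    # 2) Frame-difference mask as an additional motion cue
    if prev_gray is not None:
        diff = cv2.absdiff(gray, prev_gray)
        _, diff_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        fg = cv2.bitwise_and(fg, diff_mask)
    prev_gray = gray

    # 3) Morphological refinement, then keep only large connected components
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)
    fg = cv2.morphologyEx(fg, cv2.MORPH_CLOSE, kernel)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(fg)
    mask = np.zeros_like(fg)
    for i in range(1, n):                         # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] > 200:      # area threshold (tunable)
            mask[labels == i] = 255

cap.release()
```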
My notes: For traffic surveillance, the key task is to detect and classify the various vehicles, but because the objects are always moving this is still hard. To solve this, an unsupervised neural network is used both as an adaptive model of the background/foreground in the video and as a pixel classifier, and its running time is reduced by implementing it on an FPGA. Although this neural-network background-subtraction method achieves good segmentation accuracy, it only works when illumination changes are small and the background barely moves, and reducing the time complexity is costly. So Appiah et al. proposed an algorithm implemented on integrated hardware: the two algorithms fit on a single FPGA chip and handle the illumination problem well, but they do not handle rain. In rain, shadows and colour reflections are the main problems. A conventional algorithm combines background-construction-based segmentation with foreground-extraction-based segmentation: the foreground uses histogram-based change detection, and object regions are segmented by detecting initial moving-object masks based on a frame-difference mask (I have not fully understood what this means yet). The shadow and colour-reflection regions are then handled separately by a diamond window mask and by colour analysis of the moving object, the result is refined by morphological operations, and the output is shown in Fig. 2. In the future, the authors want the threshold and the adjusted content to be obtained automatically and adaptively. Later, Chien proposed an algorithm for smart cameras in visual surveillance networks, in which the threshold is chosen robustly without help from the user; it also differs from per-pixel background models and avoids possible error propagation (roughly; I do not fully understand this). There is also an algorithm for static cameras, designed to handle swaying trees and camouflaged regions; the reference background is initialized by averaging some initial frames (what does this mean??), but this algorithm does not consider spatial correlation, especially for objects that move from frame to frame??
Personally, looking at the result figure for this algorithm, I think it basically filters out lighting and shadow effects and keeps only the real targets.
I really could not understand the last paragraph, so I just ran it through Google Translate:
Hence, an approximate motion field is derived using background subtraction and a temporal-difference mechanism. The background model adapts to temporal changes (swaying trees, rippling water, etc.) to extract the complementary object in the scene. ????
B. Generic video sequences
Moving foreground object extraction from a given generic
video shot is one of the vital tasks for content representation
and retrieval in many computer vision applications. An
iterative method based on energy minimization has been
proposed for segmenting the primary moving object efficiently
from moving camera video sequences. Initial object
segmentation obtained using graph-cut is improved repeatedly
by the features extracted over a set of neighbouring frames
[21]. Thus, this iterative method can efficiently segment the
objects in video shots captured on a moving camera. A
conditional random field model based video object
segmentation system, capable of segmenting multiple moving
objects from a complex background, has been proposed [22]. In
this work, a complementary property of point and region
trajectories is utilized effectively by transferring the labels of
sparse point trajectories to region trajectories. Region
trajectories based on shape consistency provide a robust design
for segmenting spatially overlapping region trajectories. As region
trajectories are extracted from a hierarchical image over-segmentation,
the method segments meaningful regions consistently over time, though at increased
processing time and computational complexity. Unsupervised
segmentation of moving camera video sequence using inter
frame change detection has been proposed [23].
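A small aside (my own sketch, not from [21] or [22]): graph-cut style methods minimize an energy with a unary data term plus a pairwise smoothness term. The toy code below uses a naive iterated-conditional-modes (ICM) loop instead of a real max-flow solver, purely to make that formulation concrete; all names and parameters are invented for the example.

```python
# Sketch of the unary + pairwise energy behind graph-cut style segmentation,
# solved with a naive ICM loop rather than max-flow, for illustration only.
import numpy as np

def icm_binary_segmentation(unary_fg, unary_bg, smoothness=0.5, iters=5):
    """unary_fg/unary_bg: HxW costs of labelling a pixel foreground/background.
    Greedily minimizes sum(unary) + smoothness * (#label disagreements between
    4-connected neighbours)."""
    labels = (unary_fg < unary_bg).astype(np.uint8)     # greedy initialization
    h, w = labels.shape
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                cost = np.array([unary_bg[y, x], unary_fg[y, x]], dtype=float)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # penalize disagreeing with each neighbour's label
                        cost[1 - labels[ny, nx]] += smoothness
                labels[y, x] = int(np.argmin(cost))
    return labels

# Toy usage: the unaries could come from motion/appearance models of frames
rng = np.random.default_rng(0)
fg_cost = rng.random((40, 60))
bg_cost = 1.0 - fg_cost
mask = icm_binary_segmentation(fg_cost, bg_cost)
```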
Generic video sequences (my notes): An 'iterative method' is mentioned here; it is initialized with a graph-cut segmentation of the first few frames and then improved using features extracted from neighbouring frames, so this method can segment objects in videos shot with a moving camera??? Paper [22] proposes a conditional random field model based video object segmentation system that can segment multiple moving objects from a complex background (??? Google-translated result).
Paper [22] mainly describes transferring labels from sparse point trajectories to region trajectories?
Paper [23] presents an unsupervised method?
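To answer my own question about [22]: the label-transfer idea can be sketched as a simple majority vote over the point trajectories that fall inside each region trajectory. This is my own simplification with hypothetical names and toy data, not the paper's actual CRF formulation.

```python
# My simplified reading of the label-transfer idea in [22]: each region
# trajectory takes the majority label of the sparse point trajectories whose
# points fall inside it. The real method refines this with a CRF.
from collections import Counter

def transfer_labels(point_tracks, region_of_point, num_regions):
    """point_tracks: {track_id: label} for sparse point trajectories (0/1).
    region_of_point: {track_id: region_id} membership of each point trajectory.
    Returns one label per region trajectory (None if no points fell inside)."""
    votes = [Counter() for _ in range(num_regions)]
    for tid, label in point_tracks.items():
        rid = region_of_point.get(tid)
        if rid is not None:
            votes[rid][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

# Toy example: 5 sparse trajectories voting into 3 region trajectories
point_tracks = {0: 1, 1: 1, 2: 0, 3: 0, 4: 1}
region_of_point = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
print(transfer_labels(point_tracks, region_of_point, num_regions=3))  # [1, 0, 1]
```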