CVPR 2016 paper study: Video object segmentation


Abstract— Video object segmentation, a binary labelling
problem, is vital in various applications including object tracking,
action recognition, video summarization, video editing,
object-based encoding and video retrieval. This paper presents an
overview of recent strategies in video object segmentation,
focusing on the techniques for solving challenges like complex
and moving backgrounds, illumination changes, occlusions,
motion blur, shadow effects and viewpoint variation. Significant
works that evolved in this research field over recent years are
categorized based on the challenges solved by the researchers. A
list of challenging datasets and evaluation metrics available for
video object segmentation is presented. Finally, research gaps in
this domain are discussed.
Abstract (my summary): Video object segmentation is an important method in many areas, such as object tracking, action recognition, video classification, video editing, object-based encoding and video retrieval.
This paper mainly presents recent strategies in video segmentation algorithms, which address problems such as complex moving backgrounds, illumination changes, occlusions, motion blur, shadow effects and viewpoint changes.
The important works and challenges involved are categorized; the paper also provides the datasets and evaluation metrics to study, and finally it discusses the research gaps in this field.
The internet world today is engaged with a massive amount of
video data thanks to the development in storage devices and
handy imaging systems. Huge terabytes of video are regularly

generated for various useful applications like surveillance,

news broadcasting, telemedicine, etc. Based on the
information provided by CISCO on ‘Visual Networking Index
(VNI)’, the growth of internet video traffic will be threefold
from 2015 to 2020. Manually extracting semantic information
from this enormous amount of internet video is highly
infeasible, creating the need for automated methods to
annotate/derive useful information from the video data for
video management and retrieval [1]. Hence, one of the
essential steps for video processing and retrieval is video
object segmentation, a binary labelling problem for
differentiating the foreground objects accurately from the
background. Video object segmentation aims at partitioning
every frame in a video into meaningful objects by grouping
the pixels along the spatio-temporal direction that exhibit
coherency in appearance and motion [2]. The video object
segmentation task is highly challenging due to the following
reasons: (i) unknown number of objects in a video, (ii) varying
background in a video, and (iii) occurrence of multiple objects
in a video [3]. Existing approaches in video segmentation can
be broadly classified into two categories, viz., interactive
methods and unsupervised methods. Interactive object
segmentation methods require human intervention in the
initialization process, while unsupervised approaches can perform
object segmentation automatically. In semi-supervised approaches,
user intervention is required for annotating initial frames and
these annotations are transferred to the entire frames in the
video. Automated object segmentation approaches [7][8][9]
can segment any video data into meaningful objects without
user interaction based on object proposals and motion cues
from the video. The common assumption followed by most of
the automated methods is that only a single object is moving
throughout the video and that only the motion information is used for
segmenting the object from the background. This assumption
will lead to poor segmentation under discontinuous motion of
object [11]. Existing surveys in the literature [12][13][14][15]
on video object segmentation mostly describe techniques
available for image segmentation rather than for video
data. In [15], the authors classified the approaches in video
segmentation as inference and feature modes. The
segmentation techniques proposed so far to improve the
segmentation results are grouped as inference modes, and
methods that depend on features like depth, motion and
histograms are termed feature modes. From this observation,
it is evident that none of the researchers have discussed the
segmentation approaches from the perspective of the
challenges solved by the algorithm. Hence this paper
categorizes the significant work contributed by researchers in
video object segmentation based on the issues resolved by the
respective authors. Several issues degrading the segmentation
performance are moving background, moving camera,
illumination variation, occlusion, shadow effect, viewpoint
variation, etc. Moreover, the proposed algorithm should
provide a trade-off between segmentation accuracy and
complexity. As depicted in fig. 1, this paper classifies the
video object segmentation task as:
1. Issue tackling mode
2. Complexity reduction mode and
3. Inference mode
The main contributions of this paper are:
- Summarizing the recent activities in the video object
segmentation domain.
- Categorizing the significant works in this research
field meaningfully, and
- Presenting a list of datasets and evaluation metrics
needed for developing an efficient video object
segmentation framework.
Organization of this paper: Section II describes the algorithms
that contribute significantly to tackling the issues (discussed
earlier) involved in video object segmentation. Section III
presents an overview of segmentation approaches with
reduced complexity available in the literature. Section IV provides
a gist on object segmentation techniques that fall under
inference mode. Section V lists the datasets and evaluation
metrics used in these segmentation approaches and discusses
the research gaps in the video object segmentation field.
Section VI concludes this study.
 
My notes: Lately the internet world is flooded with all kinds of video information... in short, a long passage telling you how important this is. Then it says the goal (video object segmentation) is to partition every frame in a video so as to reveal the objects through the harmony of their appearance and motion (I guess this means a kind of coherency). But video segmentation has the following difficulties:
1. The number of target objects in the video is unknown
2. The background keeps changing
3. Multiple target objects may appear
There are mainly two kinds of methods now: interactive and unsupervised. This survey of course leans towards unsupervised methods that segment video objects automatically.
In semi-supervised methods, annotating the initial frames and the objects to be segmented is necessary, whereas unsupervised methods do not need this. Many existing video segmentation algorithms assume that only a single object is moving, but for objects with discontinuous motion this leads to poor results. The authors of [15] hold that video segmentation algorithms should be grouped by "feature extraction and inference", and references [12]-[15] are summaries of image segmentation methods. From this observation, clearly very few people have summarized video segmentation methods from the perspective of the challenges the algorithms solve, so this paper summarizes the issues that can lower segmentation accuracy, such as moving backgrounds, moving cameras, illumination changes, occlusions, shadow effects and viewpoint changes.
Furthermore, the algorithms discussed should weigh complexity against accuracy.
So the structure of this paper is:
1. Issue tackling mode
2. Complexity reduction mode
3. Inference mode
The three main contributions of the paper: 1. summarizing recent work in this field; 2. categorizing the methods used so far; 3. providing datasets and evaluation standards for readers to study.
II. ISSUE TACKLING MODE
This section details the ‘issue tackling mode’, the first
category of the video object segmentation approaches. Though
several issues (as discussed earlier) affect the performance of
the segmentation approaches, the commonly occurring problems
are moving background, occlusion, shadow, rain, moving
camera, illumination and viewpoint variation.
A. Surveillance video systems
Traffic surveillance systems include the detection and
recognition of moving vehicles (objects) from traffic video
sequences. For any traffic surveillance system, vehicle
segmentation is the fundamental step and base for tracking the
vehicle movements. But vehicle segmentation in traffic
video is still challenging due to the moving objects and
illumination variations. To solve this issue, an unsupervised
neural network based background modelling has been
proposed for real time objects segmentation. In this work,
the neural network serves as both an adaptive model of the
background in a video sequence and a classifier of pixels as
background/foreground. The segmentation time taken by the
neural network is improved by implementing it in FPGA kit.
Though this neural network based background subtraction
method achieves good segmentation accuracy, it works well
only under slightly varying illumination and moving
background. A high cost is involved in reducing time
complexity [16]. Following this, Appiah et al. [17] proposed
an integrated hardware implementation of moving object
segmentation in real time video stream under varying lighting
conditions. Two algorithms for multimodal background
modelling and connected component analysis are implemented
on a single chip FPGA. This method segments objects under
varying illumination conditions at high processing speed. The
two algorithms described so far do not take the rain issue into
account. Under rainy conditions, shadows and colour
reflections are the major problems to be tackled. A
conventional video object segmentation algorithm that
combines the background construction-based video object
segmentation and the foreground extraction-based video
object segmentation has been proposed [18]. The foreground is
separated from the background using histogram-based change
detection technique and object regions are segmented
accurately by detecting the initial moving object masks based
on a frame difference mask. Shadow and colour reflection
regions are removed by a diamond window mask and colour
analysis of the moving object, respectively. Segmentation of
moving objects is refined by morphological operations. The
segmentation results of moving objects under rainy situations
using [18] are shown in Fig. 2. The authors of [18] plan to adaptively
obtain the threshold and adjust the content of the video
automatically in future work. Later, Chien et al. [19] proposed a video object
segmentation and tracking technique for smart cameras in
visual surveillance networks. A multi-background model
based on a threshold decision algorithm for video object
segmentation under drastic changes in illumination and
background clutter has been developed. In this method, the
threshold is selected robustly without user intervention, and it
differs from the per-pixel background model, which avoids
possible error propagations. Another algorithm for extracting
objects from videos captured by a static camera has been
proposed to solve issues like waving trees, camouflaged regions
and sleeping objects [20]. In this method, the reference
background is obtained by averaging some initial frames.
Temporal processing for object extraction does not consider
spatial correlation amongst the moving objects across frames.
Hence, an approximate motion field is derived using the
background subtraction and temporal difference mechanism.
The background model adapts to temporal changes (swaying
trees, rippling water, etc.), which helps extract the complementary
object in the scene.
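
To make the pipeline described in this subsection more concrete, here is a minimal Python/OpenCV sketch (an illustration only, not the implementations of [16], [18] or [20]): a reference background is built by averaging the first frames, each new frame is compared against both the background and the previous frame, and the combined mask is cleaned up with morphological operations. The segment_video helper, thresholds and kernel size are assumptions made for this example.

import cv2
import numpy as np

def segment_video(path, n_init=30, bg_thresh=25, diff_thresh=15):
    """Rough foreground-mask extraction for a static-camera video (sketch only)."""
    cap = cv2.VideoCapture(path)
    init_frames, prev = [], None
    # Reference background: average of the first n_init frames (in the spirit of [20]).
    while len(init_frames) < n_init:
        ok, frame = cap.read()
        if not ok:
            break
        init_frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32))
    background = np.mean(init_frames, axis=0)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    masks = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        # Background-subtraction mask and temporal (frame-difference) mask.
        bg_mask = (np.abs(gray - background) > bg_thresh).astype(np.uint8) * 255
        if prev is None:
            fd_mask = np.zeros_like(bg_mask)
        else:
            fd_mask = (np.abs(gray - prev) > diff_thresh).astype(np.uint8) * 255
        fg = cv2.bitwise_or(bg_mask, fd_mask)
        # Morphological opening/closing to remove speckle noise and fill small holes.
        fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)
        fg = cv2.morphologyEx(fg, cv2.MORPH_CLOSE, kernel)
        masks.append(fg)
        prev = gray
    cap.release()
    return masks

On a static-camera traffic clip this kind of combined mask would roughly highlight the moving vehicles, while the morphological steps suppress isolated noise pixels; the surveyed methods add much more (adaptive models, shadow and colour-reflection handling, FPGA implementations) on top of this basic idea.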
My notes: For traffic surveillance, the most important task is to classify all kinds of vehicles, but because the objects are always moving this is still hard. To solve it, an unsupervised neural network is used as both an adaptive model of the foreground/background in the video and a classifier of pixels. Its running time can be reduced by putting it on an FPGA; although this neural network achieves good accuracy as a "background screening" method, it can only be used when the illumination changes little and the background barely moves, and reducing the time complexity is costly. So Appiah et al. proposed an algorithm that can run on integrated hardware; the two algorithms can be implemented on a single FPGA chip, and they handle the illumination problem well. But they do not solve the rain problem: in rain, shadows and colour reflections are the main issues. The conventional algorithm combines background-construction-based segmentation with the separated foreground objects. The foreground uses a histogram-based change detection technique, and the object regions are segmented by detecting the initial moving objects based on the moving-object masks and the frame-difference mask (not sure what this means, I haven't figured it out yet). Anyway, the shadow and colour-reflection parts are handled separately by a diamond window mask and by colour analysis of the moving object; this is a kind of fractal-geometry style algorithm, and the moving objects to be segmented are constrained by it. The results are shown in Fig. 2. In the future the "threshold" and the "adjusted content" are to be tuned automatically and adaptively. After that, Chien proposed an algorithm for smart cameras in visual surveillance networks, in which the "threshold" can be given robustly without the user's help; it also differs from per-pixel models and avoids possible error propagation?? (roughly, I don't really understand this). There is also an algorithm for static cameras, made specifically to capture swaying trees and some camouflaged things. It initializes using the average of some initial frames (what does that mean??), but this algorithm does not consider spatial correlation, especially for objects moving frame by frame??
 
涓汉缁撳悎杩欑绠楁硶鐨勯偅寮犳晥鏋滃浘锛岃寰楀氨鏄彲浠ヨ繃婊ゆ帀鍏夊奖鏁堟灉锛屼粎鐣欏瓨鐪熸鐨勭洰鏍囥€?/span>
 
I really could not understand the last paragraph, so I just ran it through Google Translate: roughly, an approximate motion field is derived using background subtraction and the temporal-difference mechanism, and the background model adapts to temporal changes (swaying trees, rippling water, etc.) to extract the complementary object in the scene. ????
 
B. Generic video sequences
Moving foreground object extraction from a given generic
video shot is one of the vital tasks for content representation
and retrieval in many computer vision applications. An
iterative method based on energy minimization has been
proposed for segmenting the primary moving object efficiently
from moving camera video sequences. Initial object
segmentation obtained using graph-cut is improved repeatedly
by the features extracted over a set of neighbouring frames
[21]. Thus, this iterative method can efficiently segment the
objects in video shots captured on a moving camera. A
conditional random field model based video object
segmentation system, capable of segmenting multiple moving
objects from a complex background, has been proposed [22]. In
this work, a complementary property of point and region
trajectories is utilized effectively by transferring the labels of
sparse point trajectories to region trajectories. Region
trajectories based on shape consistency provide a robust design
to segment spatially overlapping region trajectories. As region
trajectories are extracted from a hierarchical image over-segmentation,
the method segments meaningful regions consistently over time.
Unsupervised segmentation of moving-camera video sequences using inter-
frame change detection has been proposed [23].
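
As a rough illustration of the iterative graph-cut idea in [21] (not the authors' actual energy terms or features), the sketch below seeds OpenCV's GrabCut with a bounding rectangle on the first frame and then reuses each resulting mask to initialise the next frame; the propagate_grabcut helper, the init_rect parameter and the iteration count are assumptions made for this example.

import cv2
import numpy as np

def propagate_grabcut(frames, init_rect, iters=3):
    """Propagate a GrabCut segmentation across a list of BGR frames (sketch only)."""
    masks = []
    mask = np.zeros(frames[0].shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    # First frame: initialise GrabCut from a bounding rectangle (x, y, w, h).
    cv2.grabCut(frames[0], mask, init_rect, bgd_model, fgd_model, iters,
                cv2.GC_INIT_WITH_RECT)
    masks.append(np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                          255, 0).astype(np.uint8))
    for frame in frames[1:]:
        # Later frames: seed the graph cut with the previous result as probable labels.
        mask = np.where(masks[-1] > 0, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
        cv2.grabCut(frame, mask, None, bgd_model, fgd_model, iters,
                    cv2.GC_INIT_WITH_MASK)
        masks.append(np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                              255, 0).astype(np.uint8))
    return masks

In the actual methods the initial labels and their refinement come from motion cues, point/region trajectories and features pooled over neighbouring frames rather than a user-supplied rectangle; the sketch only shows how a graph-cut result can be carried forward and refined frame by frame.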
Generic video sequences
My notes: An "iterative method" is mentioned here; it is initialised from a graph-cut segmentation of the first few frames and improved with features extracted from neighbouring frames, so this method can extract information from videos shot by a "moving camera"?? Paper [22] proposes a video object segmentation system based on a conditional random field model that can segment multiple moving objects from a complex background (?? Google Translate result).
Paper [22] mainly mentions going from sparse trajectories to dense trajectories?
Paper [23] mentions an unsupervised method?
 
 
 
 
 
 
 
