[CVPR 2020] Paper translation — SwapText: Image Based Texts Transfer in Scenes
Swapping text in scene images while preserving the original fonts, colors, sizes, and background textures is a challenging task due to the complex interplay of different factors. In this work, we present SwapText, a three-stage framework to transfer text across scene images. First, a novel text swapping network is proposed to replace text labels only in the foreground image. Second, a background completion network is learned to reconstruct the background image. Finally, the generated foreground image and background image are used to generate the word image through a fusion network. With the proposed framework, we can manipulate the text of the input images even under severe geometric distortion. Qualitative and quantitative results are reported on several scene text datasets, including both regular and irregular text datasets. We conduct extensive experiments to demonstrate the usefulness of our method, e.g., image-based text translation, text image synthesis, and so on.
Imagine being able to swap the text in a scene image within seconds while keeping the original fonts, colors, sizes, and background textures, without spending hours on image editing. In this work, we aim to achieve this goal with an algorithm that automatically replaces text in scene images. The core challenges of text swapping lie in generating visually realistic text whose style stays consistent with the original.
Text swapping, or text replacement, is of interest in many scenarios, including text detection, text recognition, text transfer in posters, and other creative applications. For text detection and recognition tasks, text swapping is a very useful data augmentation method. Witnessing the great success of deep neural networks (DNNs) in various computer vision tasks, obtaining large amounts of annotated training images has become the bottleneck of training DNN models. The simplest and most widely used approach is to augment the training images with geometric transformations, such as translation, rotation, and flipping. Recently, image-synthesis-based methods [11, 7, 39] have been proposed for training text detection and recognition models. These methods create new images from text-free images by modeling the physical behavior of light and energy in combination with different rendering techniques. However, the synthetic images cannot blend perfectly into the scene, which is crucial when applying them to DNN model training.
In recent years, many image generation models, such as generative adversarial networks (GANs) [6], variational auto-encoders (VAEs) [17], and auto-regressive models [25], have provided powerful tools for realistic image generation tasks. In [9, 38, 33], GANs are used for image completion, generating visually realistic and semantically plausible pixels for the missing regions. The works [21, 8, 28, 22] have leveraged these networks to generate novel person images with different poses or clothes.
Our contributions are summarized as follows:
Text Image Synthesis
Image synthesis has been studied extensively in computer graphics research [4]. Text image synthesis has been investigated as a data augmentation method for training accurate and robust DNN models. For example, Jaderberg et al. [11] used a word generator to produce synthetic word images for the text recognition task. Gupta et al. [7] developed a robust engine to generate synthetic text images for text detection and recognition tasks. The goal of text image synthesis is to insert text into semantically sensible regions of a background image. Many factors affect the realism of synthetic text images, such as text size, text perspective, and environment lighting. In [39], Zhan et al. achieved verisimilar text image synthesis by combining three designs: semantic coherence, visual attention, and adaptive text appearance. Although text image synthesis can be visually realistic, there remain many differences between synthetic and real images. For example, the text fonts and background images in synthetic images are very limited compared with real images.
More recently, GAN-based image synthesis techniques have been explored further. In [41], Zhan et al. presented a spatial fusion GAN that combines a geometry synthesizer and an appearance synthesizer to achieve synthesis realism in both the geometry and appearance spaces. Yang et al. [36] used a bidirectional shape-matching framework to control the crucial stylistic degree of glyphs with an adjustable parameter. GA-DAN [40] is an interesting work capable of modeling cross-domain shifts in the geometry space and the appearance space simultaneously. MC-GAN was proposed in [2] for font style transfer of letter sets from A to Z. Wu et al. [34] proposed an end-to-end trainable style-retention network to edit text in natural images.
Image Generation
With the great success of generative models such as GANs [6], VAEs [17], and auto-regressive models [25], realistic and sharp image generation has recently attracted more and more attention. Traditional generative models use GANs [6] or VAEs [17] to map a distribution generated from noise z to the distribution of real data. For example, GANs [6] have been used to generate realistic faces [37, 3, 15] and birds [29].
To control the generated results, Mirza et al. [23] proposed the conditional GAN, which generates MNIST digits conditioned on class labels. In [12], Karacan et al. generated realistic outdoor scene images conditioned on semantic layouts and scene attributes (e.g., day/night, sunny/foggy). Lassner et al. [19] generated full-body images of people in clothing based on fine-grained body and clothing segments; the full model can be conditioned on pose, shape, or color. Ma et al. [21, 22] generated person images conditioned on images and poses. Fast face swap was proposed in [18] to transform an input identity into a target identity while preserving pose, facial expression, and lighting.
Image Completion
Recently, GAN-based approaches have emerged as a promising paradigm for image completion. Iizuka et al. [9] proposed using global and local discriminators as adversarial losses, where both global and local consistency are enforced. Yu et al. [38] used a contextual attention layer to explicitly attend to related feature patches at distant spatial locations. Wang et al. [33] used a multi-column network to generate different image components in parallel and adopted an implicit diversified MRF regularization to enhance local details.
Given a scene text image Is, our goal is to replace its text based on a content image Ic while keeping the original style. As shown in Figure 2, our framework consists of a text swapping network, a background completion network, and a fusion network. The text swapping network first extracts style features from Is and content features from Ic, then combines the two features through a self-attention network. For a better content representation, we use a content shape transformation network (CSTN) to transform the content image Ic according to the geometric attributes of the style image Is. The background completion network is used to reconstruct the original background image Ib of the style image Is. Finally, the outputs of the text swapping network and the background completion network are fused by the fusion network to generate the final text image.
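The three-stage flow just described can be sketched as plain array code. The network bodies below are stand-ins (a blend, a mean fill, a clip) purely to show how the three sub-networks compose; all function names and internals are illustrative assumptions, not the paper's actual layers.

```python
import numpy as np

def text_swap_net(style_img, content_img):
    # Stand-in for the text swapping network: produces the
    # foreground image I_f (here a simple blend, for shape only).
    return 0.5 * style_img + 0.5 * content_img

def background_completion_net(style_img):
    # Stand-in for the background completion network: produces I_b.
    return np.ones_like(style_img) * style_img.mean()

def fusion_net(foreground, background):
    # Stand-in for the fusion network: combines I_f and I_b.
    return np.clip(0.9 * foreground + 0.1 * background, 0.0, 1.0)

def swaptext_pipeline(style_img, content_img):
    """Three-stage SwapText flow: swap text, complete background, fuse."""
    foreground = text_swap_net(style_img, content_img)   # I_f
    background = background_completion_net(style_img)    # I_b
    return fusion_net(foreground, background)            # final text image

I_s = np.random.rand(64, 256, 3)  # style image (the paper's input size)
I_c = np.random.rand(64, 256, 3)  # content image
out = swaptext_pipeline(I_s, I_c)
print(out.shape)  # (64, 256, 3)
```

The point is only the data flow: the style image feeds two branches (foreground and background), and the fusion step consumes both.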
Text instances in real-world scenarios have diverse shapes, e.g., horizontal, oriented, or curved forms. The main purpose of the text swapping network is to replace the content of the style image Is while keeping the original style, especially the text shape. To improve the generation of irregular text images, we propose a content shape transformation network (CSTN) that maps the content image into the same geometry as the style image; the style image and the transformed content image are then encoded by 3 down-sampling convolutional layers and several residual blocks. To fully fuse the style and content features, we feed them into a self-attention network. For decoding, 3 up-sampling deconvolutional layers are used to generate the foreground image If.
The definition of the text shape is essential for content shape transformation. Inspired by the text shape definitions in the text detection [20] and text recognition [35] literature, the geometric attributes of text can be defined by 2K fiducial points P = {p1, p2, ..., p2K}, as shown in Figure 3.
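As a concrete illustration of the 2K fiducial points, the sketch below samples K evenly spaced points along the upper text boundary and K along the lower one for a quadrilateral region. A curved instance would interpolate along its bent boundaries instead; the corner coordinates and K here are made up for the example.

```python
import numpy as np

def fiducial_points(top_left, top_right, bottom_left, bottom_right, K):
    """Sample 2K fiducial points P = {p1, ..., p2K}: K points along the
    upper text boundary followed by K along the lower boundary."""
    t = np.linspace(0.0, 1.0, K)[:, None]          # K samples in [0, 1]
    top = (1 - t) * np.asarray(top_left) + t * np.asarray(top_right)
    bottom = (1 - t) * np.asarray(bottom_left) + t * np.asarray(bottom_right)
    return np.vstack([top, bottom])                # shape (2K, 2)

P = fiducial_points((10, 10), (110, 10), (10, 40), (110, 40), K=5)
print(P.shape)  # (10, 2)
```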
After encoding the content and style images, we feed both feature maps into the self-attention network, which automatically learns the correspondences between the content feature map Fc and the style feature map Fs. The output feature map is Fcs; Figure 5(a) shows the network structure of the self-attention module.
The content feature Fc and the style feature Fs are first concatenated along their depth axis. Then, we follow a self-attention mechanism similar to [42] to generate the output feature map Fcs.
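A minimal numpy sketch of this concatenate-then-attend step is shown below. The 1×1 convolutions of a real non-local block are replaced here by random projection matrices, and the feature sizes are arbitrary; only the data flow (depth-axis concat, query/key/value attention over all spatial positions) reflects the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fuse(Fc, Fs, d=16, seed=0):
    """Fuse content features Fc and style features Fs (each H x W x C)
    with one non-local self-attention step in the spirit of [42]."""
    rng = np.random.default_rng(seed)
    H, W, C = Fc.shape
    F = np.concatenate([Fc, Fs], axis=-1).reshape(H * W, 2 * C)
    Wq = rng.normal(size=(2 * C, d))   # stand-ins for 1x1 convs
    Wk = rng.normal(size=(2 * C, d))
    Wv = rng.normal(size=(2 * C, C))
    q, k, v = F @ Wq, F @ Wk, F @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))   # (HW, HW) correspondences
    Fcs = (attn @ v).reshape(H, W, C)      # attended output feature map
    return Fcs, attn

Fc = np.random.rand(4, 8, 32)
Fs = np.random.rand(4, 8, 32)
Fcs, attn = self_attention_fuse(Fc, Fs)
print(Fcs.shape)  # (4, 8, 32)
```

Each row of `attn` is a distribution over all spatial positions, which is what lets every output location draw on style statistics from anywhere in the image.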
Beyond this one-level stylization, we also develop a multi-level stylization pipeline, as shown in Figure 5(b). We apply the self-attention network sequentially to multiple feature layers to generate more realistic images.
While the text swapping network focuses mainly on foreground image generation, the background image also plays an important role in the final image generation. To generate more realistic word images, we use a background completion network to reconstruct the background image; its structure is shown in Table 1. Most existing image completion methods fill in the pixels of an image by borrowing or copying textures from the surrounding regions. The overall structure follows an encoder-decoder architecture, and we use dilated convolutional layers after the encoder to compute each output pixel from a larger input area. By using dilated convolutions at lower resolutions, the model can effectively "see" a larger region of the input image.
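The effect of dilation on the "seen" area can be checked with the standard receptive-field arithmetic: with stride 1, each convolution adds (kernel − 1) × dilation pixels. The layer configurations below are examples, not the network in Table 1.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.
    layers: list of (kernel_size, dilation) pairs."""
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation  # growth contributed by this layer
    return rf

plain = receptive_field([(3, 1)] * 4)                       # four ordinary 3x3 convs
dilated = receptive_field([(3, 1), (3, 2), (3, 4), (3, 8)]) # dilations 1, 2, 4, 8
print(plain, dilated)  # 9 31
```

Four plain 3×3 layers see a 9-pixel window; the same depth with exponentially growing dilation sees 31 pixels, which is why dilated layers help the completion network borrow texture from further away at no extra cost in parameters.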
At this stage, the outputs of the text swapping network and the background completion network are fused to generate the complete text image. As shown in Figure 2, the fusion network follows an encoder-decoder architecture. Similar to [34], we connect the decoding feature maps of the background completion network to the corresponding feature maps of the same resolution in the up-sampling phase of the fusion decoder. We denote the generator and discriminator networks by Gfuse and Dfuse, respectively. The loss function of the fusion network can be formulated as follows:
To produce more realistic images, we also follow ideas from style transfer networks [5, 26] and introduce a VGG loss into the fusion module. The VGG loss is divided into two parts, a perceptual loss and a style loss, as follows:
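The two terms are the standard perceptual (feature-matching) loss and Gram-matrix style loss; a minimal numpy sketch over precomputed VGG activations is given below. The layer weights `w_per` and `w_sty` are placeholders, not the paper's values, and L1 distance is one common choice.

```python
import numpy as np

def perceptual_loss(feat_gen, feat_gt):
    """L1 distance between VGG activations of generated and target images."""
    return np.mean(np.abs(feat_gen - feat_gt))

def gram_matrix(feat):
    """Gram matrix G = F^T F / (C * H * W) over flattened spatial positions."""
    H, W, C = feat.shape
    F = feat.reshape(H * W, C)
    return (F.T @ F) / (C * H * W)

def style_loss(feat_gen, feat_gt):
    """Distance between Gram matrices, capturing texture statistics."""
    return np.mean(np.abs(gram_matrix(feat_gen) - gram_matrix(feat_gt)))

def vgg_loss(feats_gen, feats_gt, w_per=1.0, w_sty=1.0):
    """VGG loss = weighted perceptual part + style part, summed over layers.
    feats_*: lists of H x W x C activation maps from a fixed VGG."""
    per = sum(perceptual_loss(g, t) for g, t in zip(feats_gen, feats_gt))
    sty = sum(style_loss(g, t) for g, t in zip(feats_gen, feats_gt))
    return w_per * per + w_sty * sty

f = [np.random.rand(8, 8, 4)]
print(vgg_loss(f, f))  # 0.0 for identical activations
```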
We follow similar ideas to [34] to generate pairs of synthetic images with the same style. We use more than 1,500 fonts and 10,000 background images to generate a total of 1 million training images and 10,000 test images. The input images are resized to 64 × 256, and the batch size is 32. All weights are initialized from a zero-mean normal distribution with a standard deviation of 0.01. The Adam optimizer [16] with β1 = 0.9 and β2 = 0.999 is used to optimize the whole framework. The learning rate is set to 0.0001 during training. We implement our model in the TensorFlow framework [1]. Most modules of our method are GPU-accelerated.
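The training setup above can be sketched as one Adam step with the stated hyperparameters (lr = 0.0001, β1 = 0.9, β2 = 0.999) and the stated weight initialization; the toy quadratic loss is of course only there to exercise the update rule, and eps is Adam's conventional default rather than a value from the paper.

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba [16]) with the paper's hyperparameters."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Weight init as in the paper: zero-mean normal, std 0.01.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 0.01, size=8)
state = (np.zeros(8), np.zeros(8), 0)
loss0 = float(np.sum(theta ** 2))
for _ in range(100):
    grad = 2 * theta                            # gradient of the toy loss ||theta||^2
    theta, state = adam_step(theta, grad, state)
loss1 = float(np.sum(theta ** 2))
print(loss1 < loss0)  # True: Adam shrinks the toy quadratic loss
```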
We evaluate our proposed method on several public benchmark datasets.
We adopt metrics commonly used in image generation to evaluate our method, including:
In this section, we empirically study how different model settings affect the performance of our proposed framework. Our study focuses mainly on the following aspects: the content shape transformation network, the self-attention network, and the dilated convolutions in the background completion network. Some qualitative results are shown in Figure 6.
Self-Attention Network
The self-attention network is used to fully combine the content and style features. According to Table 2, with a one-level self-attention network, the average l2 error decreases by about 0.003, the average PSNR increases by about 0.3, and the average SSIM increases by about 0.012. To exploit more global statistics of the style and content features, we adopt a multi-level self-attention network to fuse global and local patterns. With the multi-level self-attention network, all metrics improve.
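The three metrics used throughout these ablations (l2 error, PSNR, SSIM) can be computed as below. The SSIM here is a single-window simplification over the whole image rather than the usual sliding-window version, which is enough to show the formula; the test images are random.

```python
import numpy as np

def l2_error(a, b):
    """Mean squared error between images scaled to [0, 1]."""
    return float(np.mean((a - b) ** 2))

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB (undefined for identical images)."""
    mse = np.mean((a - b) ** 2)
    return float(10 * np.log10(max_val ** 2 / mse))

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over the whole image."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                 ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))

rng = np.random.default_rng(0)
gt = rng.random((64, 256))
noisy = np.clip(gt + rng.normal(0, 0.1, gt.shape), 0, 1)
print(l2_error(gt, gt), ssim_global(gt, gt))  # 0.0 1.0
```

Lower l2 error and higher PSNR/SSIM all indicate that the generated image is closer to the ground truth, which is the direction of the improvements reported in Table 2.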
Dilated Convolution
Dilated convolutional layers enlarge the pixel area used to reconstruct the background image, making it easier to generate higher-quality results. According to Table 2, the background completion network with dilated convolutional layers performs better on all metrics.
To evaluate our proposed method, we compare it with two text swapping methods: pix2pix, proposed in [10], and SRNet, proposed by Wu et al. [34]. We use our generated datasets to train and test both models, keeping the same configurations as reported in their papers.
Quantitative Results
In Table 2, we present the quantitative results of our method and the two competing methods. Clearly, our proposed method achieves significant improvements on all metrics across different languages compared with the second-best method: the average l2 error decreases by more than 0.009, the average PSNR increases by more than 0.9, and the average SSIM increases by more than 0.04.
Image-based translation is one of the most important applications of arbitrary text style transfer. In this section, we present some examples of image-based translation, as shown in Figure 7. We translate between English and Chinese. As the results show, whether the target language is Chinese or English, the color, geometric deformation, and background texture are well preserved, and the structure of the characters matches the input text.
In Figure 9, we also show some example results of our model evaluated on scene text datasets. According to Figure 9, our model can replace the text in the input images while preserving the original fonts, colors, sizes, and background textures.
Our method has the following limitations. Because of the limited amount of training data, the geometric attribute space and the font space cannot be fully exploited. Our proposed method fails when the text in the style image is wavy, see Figure 8 (top). Figure 8 (bottom) shows a failure case with a WordArt style image.
In this work, we present SwapText, a robust scene text swapping framework, to address the novel task of replacing the text in scene text images with the intended text. We adopt a divide-and-conquer strategy, decomposing the problem into three sub-networks, namely a text swapping network, a background completion network, and a fusion network. In the text swapping network, the features of the content image and the style image are extracted simultaneously and then combined through a self-attention network. To better learn the representation of the content image, we use a content shape transformation network (CSTN) to transform the content image according to the geometric attributes of the style image. Then, a background completion network is used to generate the background image of the style image. Finally, the outputs of the text swapping network and the background completion network are fed into the fusion network to generate more realistic and semantically coherent images. Qualitative and quantitative results on several public scene text datasets demonstrate the superiority of our method. In future work, we will explore generating more controllable text images conditioned on fonts and colors.
CVPR 2020 Papers and Code Collection
CVPR2020-Code
A collection of CVPR 2020 papers with open-source code. Issues are welcome — feel free to share CVPR 2020 open-source projects.
[Recommended reading]
- The ECCV 2020 open-source projects collection is here: https://github.com/amusi/ECCV2020-Code
- For top CV conference papers from previous years (e.g., ECCV 2020, CVPR 2019, ICCV 2019) and other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
[CVPR 2020 open-source papers directory]
- CNN
- Image Classification
- Video Classification
- Object Detection
- 3D Object Detection
- Video Object Detection
- Object Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Video Object Segmentation
- Superpixel Segmentation
- Interactive Image Segmentation
- NAS
- GAN
- Re-ID
- 3D Point Cloud (Classification/Segmentation/Registration/Tracking, etc.)
- Face (Recognition/Detection/Reconstruction, etc.)
- Human Pose Estimation (2D/3D)
- Human Parsing
- Scene Text Detection
- Scene Text Recognition
- Feature (Point) Detection and Description
- Super-Resolution
- Model Compression/Pruning
- Video Understanding/Action Recognition
- Crowd Counting
- Depth Estimation
- 6D Object Pose Estimation
- Hand Pose Estimation
- Saliency Detection
- Denoising
- Deraining
- Deblurring
- Dehazing
- Feature Point Detection and Description
- Visual Question Answering (VQA)
- Video Question Answering (VideoQA)
- Vision-Language Navigation
- Video Compression
- Video Frame Interpolation
- Style Transfer
- Lane Detection
- Human-Object Interaction (HOI) Detection
- Trajectory Prediction
- Motion Prediction
- Optical Flow Estimation
- Image Retrieval
- Virtual Try-On
- HDR
- Adversarial Examples
- 3D Reconstruction
- Depth Completion
- Semantic Scene Completion
- Image/Video Captioning
- Wireframe Parsing
- Datasets
- Others
- Acceptance Unconfirmed
CNN
Exploring Self-attention for Image Recognition
-
论文:https://hszhao.github.io/papers/cvpr20_san.pdf
-
代码:https://github.com/hszhao/SAN
Improving Convolutional Networks with Self-Calibrated Convolutions
-
主页:https://mmcheng.net/scconv/
-
论文:http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf
-
代码:https://github.com/backseason/SCNet
Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets
- 论文:https://arxiv.org/abs/2003.13549
- 代码:https://github.com/zeiss-microscopy/BSConv
Image Classification
Interpretable and Accurate Fine-grained Recognition via Region Grouping
-
论文:https://arxiv.org/abs/2005.10411
-
代码:https://github.com/zxhuang1698/interpretability-by-parts
Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion
-
论文:https://arxiv.org/abs/2003.04490
-
代码:https://github.com/AdamKortylewski/CompositionalNets
Spatially Attentive Output Layer for Image Classification
- 论文:https://arxiv.org/abs/2004.07570
- 代码(好像被原作者删除了):https://github.com/ildoonet/spatially-attentive-output-layer
Video Classification
SmallBigNet: Integrating Core and Contextual Views for Video Classification
- 论文:https://arxiv.org/abs/2006.14582
- 代码:https://github.com/xhl-video/SmallBigNet
Object Detection
Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf
- 代码:https://github.com/FishYuLi/BalancedGroupSoftmax
AugFPN: Improving Multi-scale Feature Learning for Object Detection
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_AugFPN_Improving_Multi-Scale_Feature_Learning_for_Object_Detection_CVPR_2020_paper.pdf
- 代码:https://github.com/Gus-Guo/AugFPN
Noise-Aware Fully Webly Supervised Object Detection
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Shen_Noise-Aware_Fully_Webly_Supervised_Object_Detection_CVPR_2020_paper.html
- 代码:https://github.com/shenyunhang/NA-fWebSOD/
Learning a Unified Sample Weighting Network for Object Detection
- 论文:https://arxiv.org/abs/2006.06568
- 代码:https://github.com/caiqi/sample-weighting-network
D2Det: Towards High Quality Object Detection and Instance Segmentation
-
论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
-
代码:https://github.com/JialeCao001/D2Det
Dynamic Refinement Network for Oriented and Densely Packed Object Detection
-
论文下载链接:https://arxiv.org/abs/2005.09973
-
代码和数据集:https://github.com/Anymake/DRN_CVPR2020
Scale-Equalizing Pyramid Convolution for Object Detection
论文:https://arxiv.org/abs/2005.03101
代码:https://github.com/jshilong/SEPC
Revisiting the Sibling Head in Object Detector
-
论文:https://arxiv.org/abs/2003.07540
-
代码:https://github.com/Sense-X/TSD
Scale-equalizing Pyramid Convolution for Object Detection
- 论文:暂无
- 代码:https://github.com/jshilong/SEPC
Detection in Crowded Scenes: One Proposal, Multiple Predictions
- 论文:https://arxiv.org/abs/2003.09163
- 代码:https://github.com/megvii-model/CrowdDetection
Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection
- 论文:https://arxiv.org/abs/2004.04725
- 代码:https://github.com/NVlabs/wetectron
Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection
- 论文:https://arxiv.org/abs/1912.02424
- 代码:https://github.com/sfzhang15/ATSS
BiDet: An Efficient Binarized Object Detector
- 论文:https://arxiv.org/abs/2003.03961
- 代码:https://github.com/ZiweiWangTHU/BiDet
Harmonizing Transferability and Discriminability for Adapting Object Detectors
- 论文:https://arxiv.org/abs/2003.06297
- 代码:https://github.com/chaoqichen/HTCN
CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection
- 论文:https://arxiv.org/abs/2003.09119
- 代码:https://github.com/KiveeDong/CentripetalNet
Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection
- 论文:https://arxiv.org/abs/2003.11818
- 代码:https://github.com/ggjy/HitDet.pytorch
EfficientDet: Scalable and Efficient Object Detection
- 论文:https://arxiv.org/abs/1911.09070
- 代码:https://github.com/google/automl/tree/master/efficientdet
3D Object Detection
SESS: Self-Ensembling Semi-Supervised 3D Object Detection
-
论文: https://arxiv.org/abs/1912.11803
-
代码:https://github.com/Na-Z/sess
Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection
-
论文: https://arxiv.org/abs/2006.04356
-
代码:https://github.com/dleam/Associate-3Ddet
What You See is What You Get: Exploiting Visibility for 3D Object Detection
-
主页:https://www.cs.cmu.edu/~peiyunh/wysiwyg/
-
论文:https://arxiv.org/abs/1912.04986
-
代码:https://github.com/peiyunh/wysiwyg
Learning Depth-Guided Convolutions for Monocular 3D Object Detection
- 论文:https://arxiv.org/abs/1912.04799
- 代码:https://github.com/dingmyu/D4LCN
Structure Aware Single-stage 3D Object Detection from Point Cloud
-
论文:http://openaccess.thecvf.com/content_CVPR_2020/html/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.html
-
代码:https://github.com/skyhehe123/SA-SSD
IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving
-
论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Peng_IDA-3D_Instance-Depth-Aware_3D_Object_Detection_From_Stereo_Vision_for_Autonomous_CVPR_2020_paper.pdf
-
代码:https://github.com/swords123/IDA-3D
Train in Germany, Test in The USA: Making 3D Object Detectors Generalize
-
论文:https://arxiv.org/abs/2005.08139
-
代码:https://github.com/cxy1997/3D_adapt_auto_driving
MLCVNet: Multi-Level Context VoteNet for 3D Object Detection
- 论文:https://arxiv.org/abs/2004.05679
- 代码:https://github.com/NUAAXQ/MLCVNet
3DSSD: Point-based 3D Single Stage Object Detector
-
CVPR 2020 Oral
-
论文:https://arxiv.org/abs/2002.10187
-
代码:https://github.com/tomztyang/3DSSD
Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation
-
论文:https://arxiv.org/abs/2004.03572
-
代码:https://github.com/zju3dv/disprcn
End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection
-
论文:https://arxiv.org/abs/2004.03080
-
代码:https://github.com/mileyan/pseudo-LiDAR_e2e
DSGN: Deep Stereo Geometry Network for 3D Object Detection
- 论文:https://arxiv.org/abs/2001.03398
- 代码:https://github.com/chenyilun95/DSGN
LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention
- 论文:https://arxiv.org/abs/2004.01389
- 代码:https://github.com/yinjunbo/3DVID
PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
-
论文:https://arxiv.org/abs/1912.13192
-
代码:https://github.com/sshaoshuai/PV-RCNN
Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud
- 论文:https://arxiv.org/abs/2003.01251
- 代码:https://github.com/WeijingShi/Point-GNN
Video Object Detection
Memory Enhanced Global-Local Aggregation for Video Object Detection
论文:https://arxiv.org/abs/2003.12063
代码:https://github.com/Scalsol/mega.pytorch
Object Tracking
SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking
- 论文:https://arxiv.org/abs/1911.07241
- 代码:https://github.com/ohhhyeahhh/SiamCAR
D3S – A Discriminative Single Shot Segmentation Tracker
- 论文:https://arxiv.org/abs/1911.08862
- 代码:https://github.com/alanlukezic/d3s
ROAM: Recurrently Optimizing Tracking Model
-
论文:https://arxiv.org/abs/1907.12006
-
代码:https://github.com/skyoung/ROAM
Siam R-CNN: Visual Tracking by Re-Detection
- 主页:https://www.vision.rwth-aachen.de/page/siamrcnn
- 论文:https://arxiv.org/abs/1911.12836
- 论文2:https://www.vision.rwth-aachen.de/media/papers/192/siamrcnn.pdf
- 代码:https://github.com/VisualComputingInstitute/SiamR-CNN
Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises
- 论文:https://arxiv.org/abs/2003.09595
- 代码:https://github.com/MasterBin-IIAU/CSA
High-Performance Long-Term Tracking with Meta-Updater
-
论文:https://arxiv.org/abs/2004.00305
-
代码:https://github.com/Daikenan/LTMU
AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization
-
论文:https://arxiv.org/abs/2003.12949
-
代码:https://github.com/vision4robotics/AutoTrack
Probabilistic Regression for Visual Tracking
- 论文:https://arxiv.org/abs/2003.12565
- 代码:https://github.com/visionml/pytracking
MAST: A Memory-Augmented Self-supervised Tracker
- 论文:https://arxiv.org/abs/2002.07793
- 代码:https://github.com/zlai0/MAST
Siamese Box Adaptive Network for Visual Tracking
- 论文:https://arxiv.org/abs/2003.06761
- 代码:https://github.com/hqucv/siamban
Multi-Object Tracking
3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset
- 主页:https://vap.aau.dk/3d-zef/
- 论文:https://arxiv.org/abs/2006.08466
- 代码:https://bitbucket.org/aauvap/3d-zef/src/master/
- 数据集:https://motchallenge.net/data/3D-ZeF20
Semantic Segmentation
FDA: Fourier Domain Adaptation for Semantic Segmentation
-
论文:https://arxiv.org/abs/2004.05498
-
代码:https://github.com/YanchaoYang/FDA
Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation
-
论文:暂无
-
代码:https://github.com/JianqiangWan/Super-BPD
Single-Stage Semantic Segmentation from Image Labels
-
论文:https://arxiv.org/abs/2005.08104
-
代码:https://github.com/visinf/1-stage-wseg
Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation
- 论文:https://arxiv.org/abs/2003.00867
- 代码:https://github.com/MyeongJin-Kim/Learning-Texture-Invariant-Representation
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation
- 论文:http://vladlen.info/papers/MSeg.pdf
- 代码:https://github.com/mseg-dataset/mseg-api
CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement
- 论文:https://arxiv.org/abs/2005.02551
- 代码:https://github.com/hkchengrex/CascadePSP
Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision
- Oral
- 论文:https://arxiv.org/abs/2004.07703
- 代码:https://github.com/feipan664/IntraDA
Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation
- 论文:https://arxiv.org/abs/2004.04581
- 代码:https://github.com/YudeWang/SEAM
Temporally Distributed Networks for Fast Video Segmentation
-
论文:https://arxiv.org/abs/2004.01800
-
代码:https://github.com/feinanshan/TDNet
Context Prior for Scene Segmentation
-
论文:https://arxiv.org/abs/2004.01547
-
代码:https://git.io/ContextPrior
Strip Pooling: Rethinking Spatial Pooling for Scene Parsing
-
论文:https://arxiv.org/abs/2003.13328
-
代码:https://github.com/Andrew-Qibin/SPNet
Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks
- 论文:https://arxiv.org/abs/2003.05128
- 代码:https://github.com/shachoi/HANet
Learning Dynamic Routing for Semantic Segmentation
-
论文:https://arxiv.org/abs/2003.10401
-
代码:https://github.com/yanwei-li/DynamicRouting
Instance Segmentation
D2Det: Towards High Quality Object Detection and Instance Segmentation
-
论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
-
代码:https://github.com/JialeCao001/D2Det
PolarMask: Single Shot Instance Segmentation with Polar Representation
- 论文:https://arxiv.org/abs/1909.13226
- 代码:https://github.com/xieenze/PolarMask
- 解读:https://zhuanlan.zhihu.com/p/84890413
CenterMask : Real-Time Anchor-Free Instance Segmentation
- 论文:https://arxiv.org/abs/1911.06667
- 代码:https://github.com/youngwanLEE/CenterMask
BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation
- 论文:https://arxiv.org/abs/2001.00309
- 代码:https://github.com/aim-uofa/AdelaiDet
Deep Snake for Real-Time Instance Segmentation
- 论文:https://arxiv.org/abs/2001.01629
- 代码:https://github.com/zju3dv/snake
Mask Encoding for Single Shot Instance Segmentation
-
论文:https://arxiv.org/abs/2003.11712
-
代码:https://github.com/aim-uofa/AdelaiDet
Panoptic Segmentation
Video Panoptic Segmentation
- 论文:https://arxiv.org/abs/2006.11339
- 代码:https://github.com/mcahny/vps
- 数据集:https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0
Pixel Consensus Voting for Panoptic Segmentation
- 论文:https://arxiv.org/abs/2004.01849
- 代码:还未公布
BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation
论文:https://arxiv.org/abs/2003.14031
代码:https://github.com/Mooonside/BANet
Video Object Segmentation
A Transductive Approach for Video Object Segmentation
-
论文:https://arxiv.org/abs/2004.07193
-
代码:https://github.com/microsoft/transductive-vos.pytorch
State-Aware Tracker for Real-Time Video Object Segmentation
-
论文:https://arxiv.org/abs/2003.00482
-
代码:https://github.com/MegviiDetection/video_analyst
Learning Fast and Robust Target Models for Video Object Segmentation
- 论文:https://arxiv.org/abs/2003.00908
- 代码:https://github.com/andr345/frtm-vos
Learning Video Object Segmentation from Unlabeled Videos
- 论文:https://arxiv.org/abs/2003.05020
- 代码:https://github.com/carrierlxk/MuG
Superpixel Segmentation
Superpixel Segmentation with Fully Convolutional Networks
- 论文:https://arxiv.org/abs/2003.12929
- 代码:https://github.com/fuy34/superpixel_fcn
Interactive Image Segmentation
Interactive Object Segmentation with Inside-Outside Guidance
- 论文下载链接:http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
- 代码:https://github.com/shiyinzhang/Inside-Outside-Guidance
- 数据集:https://github.com/shiyinzhang/Pixel-ImageNet
NAS
AOWS: Adaptive and optimal network width search with latency constraints
- 论文:https://arxiv.org/abs/2005.10481
- 代码:https://github.com/bermanmaxim/AOWS
Densely Connected Search Space for More Flexible Neural Architecture Search
-
论文:https://arxiv.org/abs/1906.09607
-
代码:https://github.com/JaminFong/DenseNAS
MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning
-
论文:https://arxiv.org/abs/2003.14058
-
代码:https://github.com/bhpfelix/MTLNAS
FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions
-
论文下载链接:https://arxiv.org/abs/2004.05565
-
代码:https://github.com/facebookresearch/mobile-vision
Neural Architecture Search for Lightweight Non-Local Networks
- 论文:https://arxiv.org/abs/2004.01961
- 代码:https://github.com/LiYingwei/AutoNL
Rethinking Performance Estimation in Neural Architecture Search
- 论文:https://arxiv.org/abs/2005.09917
- 代码:https://github.com/zhengxiawu/rethinking_performance_estimation_in_NAS
- 解读1:https://www.zhihu.com/question/372070853/answer/1035234510
- 解读2:https://zhuanlan.zhihu.com/p/111167409
CARS: Continuous Evolution for Efficient Neural Architecture Search
- 论文:https://arxiv.org/abs/1909.04977
- 代码(即将开源):https://github.com/huawei-noah/CARS
GAN
SEAN: Image Synthesis with Semantic Region-Adaptive Normalization
- 论文:https://arxiv.org/abs/1911.12861
- 代码:https://github.com/ZPdesu/SEAN
Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation
- 论文地址:http://openaccess.thecvf.com/content_CVPR_2020/html/Chen_Reusing_Discriminators_for_Encoding_Towards_Unsupervised_Image-to-Image_Translation_CVPR_2020_paper.html
- 代码地址:https://github.com/alpc91/NICE-GAN-pytorch
Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning
- 论文:https://arxiv.org/abs/1912.01899
- 代码:https://github.com/SsGood/DBGAN
PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer
- 论文:https://arxiv.org/abs/1909.06956
- 代码:https://github.com/wtjiang98/PSGAN
Semantically Mutil-modal Image Synthesis
- 主页:http://seanseattle.github.io/SMIS
- 论文:https://arxiv.org/abs/2003.12697
- 代码:https://github.com/Seanseattle/SMIS
Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping
- 论文:https://yiranran.github.io/files/CVPR2020_Unpaired%20Portrait%20Drawing%20Generation%20via%20Asymmetric%20Cycle%20Mapping.pdf
- 代码:https://github.com/yiranran/Unpaired-Portrait-Drawing
Learning to Cartoonize Using White-box Cartoon Representations
-
论文:https://github.com/SystemErrorWang/White-box-Cartoonization/blob/master/paper/06791.pdf
-
主页:https://systemerrorwang.github.io/White-box-Cartoonization/
-
代码:https://github.com/SystemErrorWang/White-box-Cartoonization
-
解读:https://zhuanlan.zhihu.com/p/117422157
-
Demo视频:https://www.bilibili.com/video/av56708333
GAN Compression: Efficient Architectures for Interactive Conditional GANs
-
论文:https://arxiv.org/abs/2003.08936
-
代码:https://github.com/mit-han-lab/gan-compression
Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions
- 论文:https://arxiv.org/abs/2003.01826
- 代码:https://github.com/cc-hpc-itwm/UpConv
Re-ID
High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Wang_High-Order_Information_Matters_Learning_Relation_and_Topology_for_Occluded_Person_CVPR_2020_paper.html
- 代码:https://github.com/wangguanan/HOReID
COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification
-
论文:https://arxiv.org/abs/2005.07862
-
数据集:暂无
Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking
-
论文:https://arxiv.org/abs/2004.04199
-
代码:https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking
Pose-guided Visible Part Matching for Occluded Person ReID
- 论文:https://arxiv.org/abs/2004.00230
- 代码:https://github.com/hh23333/PVPM
Weakly supervised discriminative feature learning with state information for person identification
- 论文:https://arxiv.org/abs/2002.11939
- 代码:https://github.com/KovenYu/state-information
3D Point Cloud (Classification/Segmentation/Registration, etc.)
3D Point Cloud Convolution
PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling
- 论文:https://arxiv.org/abs/2003.00492
- 代码:https://github.com/yanx27/PointASNL
Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds
-
论文下载链接:https://arxiv.org/abs/2003.12971
-
代码:https://github.com/raoyongming/PointGLR
Grid-GCN for Fast and Scalable Point Cloud Learning
-
论文:https://arxiv.org/abs/1912.02984
-
代码:https://github.com/Xharlie/Grid-GCN
FPConv: Learning Local Flattening for Point Convolution
- 论文:https://arxiv.org/abs/2002.10701
- 代码:https://github.com/lyqun/FPConv
3D Point Cloud Classification
PointAugment: an Auto-Augmentation Framework for Point Cloud Classification
- 论文:https://arxiv.org/abs/2002.10876
- 代码(即将开源): https://github.com/liruihui/PointAugment/
3D Point Cloud Semantic Segmentation
RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds
-
论文:https://arxiv.org/abs/1911.11236
-
代码:https://github.com/QingyongHu/RandLA-Net
-
解读:https://zhuanlan.zhihu.com/p/105433460
Weakly Supervised Semantic Point Cloud Segmentation:Towards 10X Fewer Labels
-
论文:https://arxiv.org/abs/2004.04091
-
代码:https://github.com/alex-xun-xu/WeakSupPointCloudSeg
PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation
- 论文:https://arxiv.org/abs/2003.14032
- 代码:https://github.com/edwardzhou130/PolarSeg
Learning to Segment 3D Point Clouds in 2D Image Space
-
论文:https://arxiv.org/abs/2003.05593
-
代码:https://github.com/WPI-VISLab/Learning-to-Segment-3D-Point-Clouds-in-2D-Image-Space
3D Point Cloud Instance Segmentation
PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
- 论文:https://arxiv.org/abs/2004.01658
- 代码:https://github.com/Jia-Research-Lab/PointGroup
3D Point Cloud Registration
Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences
- 论文:https://arxiv.org/abs/2005.01014
- 代码:https://github.com/XiaoshuiHuang/fmr
D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features
- 论文:https://arxiv.org/abs/2003.03164
- 代码:https://github.com/XuyangBai/D3Feat
RPM-Net: Robust Point Matching using Learned Features
- 论文:https://arxiv.org/abs/2003.13479
- 代码:https://github.com/yewzijian/RPMNet
3D Point Cloud Completion
Cascaded Refinement Network for Point Cloud Completion
- 论文:https://arxiv.org/abs/2004.03327
- 代码:https://github.com/xiaogangw/cascaded-point-completion
3D Point Cloud Object Tracking
P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds
- 论文:https://arxiv.org/abs/2005.13888
- 代码:https://github.com/HaozheQi/P2B
Others
An Efficient PointLSTM for Point Clouds Based Gesture Recognition
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
- 代码:https://github.com/Blueprintf/pointlstm-gesture-recognition-pytorch
Face
Face Recognition
CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition
-
论文:https://arxiv.org/abs/2004.00288
-
代码:https://github.com/HuangYG123/CurricularFace
Learning Meta Face Recognition in Unseen Domains
- 论文:https://arxiv.org/abs/2003.07733
- 代码:https://github.com/cleardusk/MFR
- 解读:https://mp.weixin.qq.com/s/YZoEnjpnlvb90qSI3xdJqQ
Face Detection
Face Anti-Spoofing
Searching Central Difference Convolutional Networks for Face Anti-Spoofing
-
论文:https://arxiv.org/abs/2003.04092
-
代码:https://github.com/ZitongYu/CDCN
Facial Expression Recognition
Suppressing Uncertainties for Large-Scale Facial Expression Recognition
-
论文:https://arxiv.org/abs/2002.10392
-
代码(即将开源):https://github.com/kaiwang960112/Self-Cure-Network
Face Frontalization
Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images
- 论文:https://arxiv.org/abs/2003.08124
- 代码:https://github.com/Hangz-nju-cuhk/Rotate-and-Render
3D Face Reconstruction
AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"
- 论文:https://arxiv.org/abs/2003.13845
- 数据集:https://github.com/lattas/AvatarMe
FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction
- 论文:https://arxiv.org/abs/2003.13989
- 代码:https://github.com/zhuhao-nju/facescape
Human Pose Estimation (2D/3D)
2D Human Pose Estimation
TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
-
主页:https://yzhq97.github.io/transmomo/
-
论文:https://arxiv.org/abs/2003.14401
-
代码:https://github.com/yzhq97/transmomo.pytorch
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
- 论文:https://arxiv.org/abs/1908.10357
- 代码:https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation
The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation
- 论文:https://arxiv.org/abs/1911.07524
- 代码:https://github.com/HuangJunJie2017/UDP-Pose
- 解读:https://zhuanlan.zhihu.com/p/92525039
Distribution-Aware Coordinate Representation for Human Pose Estimation
-
主页:https://ilovepose.github.io/coco/
-
论文:https://arxiv.org/abs/1910.06278
-
代码:https://github.com/ilovepose/DarkPose
3D Human Pose Estimation
Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data
- 论文:https://arxiv.org/abs/2006.07778
- 代码:https://github.com/Nicholasli1995/EvoSkeleton
Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach
-
主页:https://www.zhe-zhang.com/cvpr2020
-
论文:https://arxiv.org/abs/2003.11163
-
代码:https://github.com/CHUNYUWANG/imu-human-pose-pytorch
Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data
-
论文下载链接:https://arxiv.org/abs/2004.01166
-
代码:https://github.com/Healthcare-Robotics/bodies-at-rest
-
数据集:https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML
Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis
- 主页:http://val.cds.iisc.ac.in/pgp-human/
- 论文:https://arxiv.org/abs/2004.04400
Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
- 论文:https://arxiv.org/abs/2004.00329
- 代码:https://github.com/fabbrimatteo/LoCO
VIBE: Video Inference for Human Body Pose and Shape Estimation
- 论文:https://arxiv.org/abs/1912.05656
- 代码:https://github.com/mkocabas/VIBE
Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation
- 论文:https://arxiv.org/abs/2002.11251
- 代码:https://github.com/vnmr/JointVideoPose3D
Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS
- 论文:https://arxiv.org/abs/2003.03972
- 数据集:暂无
Human Parsing
Correlating Edge, Pose with Parsing
-
论文:https://arxiv.org/abs/2005.01431
-
代码:https://github.com/ziwei-zh/CorrPM
Scene Text Detection
STEFANN: Scene Text Editor using Font Adaptive Neural Network
-
主页:https://prasunroy.github.io/stefann/
-
论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
-
代码:https://github.com/prasunroy/stefann
-
数据集:https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k
ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.pdf
- 代码:https://github.com/wangyuxin87/ContourNet
UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
- 论文:https://arxiv.org/abs/2003.10608
- 代码和数据集:https://github.com/Jyouhou/UnrealText/
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
- 论文:https://arxiv.org/abs/2002.10200
- 代码(即将开源):https://github.com/Yuliang-Liu/bezier_curve_text_spotting
- 代码(即将开源):https://github.com/aim-uofa/adet
Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection
-
论文:https://arxiv.org/abs/2003.07493
-
代码:https://github.com/GXYM/DRRG
Scene Text Recognition
SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition
- 论文:https://arxiv.org/abs/2005.10977
- 代码:https://github.com/Pay20Y/SEED
UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
- 论文:https://arxiv.org/abs/2003.10608
- 代码和数据集:https://github.com/Jyouhou/UnrealText/
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
- 论文:https://arxiv.org/abs/2002.10200
- 代码(即将开源):https://github.com/aim-uofa/adet
Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition
-
论文:https://arxiv.org/abs/2003.06606
-
代码:https://github.com/Canjie-Luo/Text-Image-Augmentation
Feature (Point) Detection and Description
SuperGlue: Learning Feature Matching with Graph Neural Networks
- 论文:https://arxiv.org/abs/1911.11763
- 代码:https://github.com/magicleap/SuperGluePretrainedNetwork
Super-Resolution
Image Super-Resolution
Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Guo_Closed-Loop_Matters_Dual_Regression_Networks_for_Single_Image_Super-Resolution_CVPR_2020_paper.html
- 代码:https://github.com/guoyongcs/DRN
Learning Texture Transformer Network for Image Super-Resolution
- 论文:https://arxiv.org/abs/2006.04139
- 代码:https://github.com/FuzhiYang/TTSR
Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining
- 论文:https://arxiv.org/abs/2006.01424
- 代码:https://github.com/SHI-Labs/Cross-Scale-Non-Local-Attention
Structure-Preserving Super Resolution with Gradient Guidance
- 论文:https://arxiv.org/abs/2003.13081
- 代码:https://github.com/Maclory/SPSR
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy
- 论文:https://arxiv.org/abs/2004.00448
- 代码:https://github.com/clovaai/cutblur
视频超分辨率
TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution
- 论文:https://arxiv.org/abs/1812.02898
- 代码:https://github.com/YapengTian/TDAN-VSR-CVPR-2020
Space-Time-Aware Multi-Resolution Video Enhancement
- 主页:https://alterzero.github.io/projects/STAR.html
- 论文:http://arxiv.org/abs/2003.13170
- 代码:https://github.com/alterzero/STARnet
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution
- 论文:https://arxiv.org/abs/2002.11616
- 代码:https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020
模型压缩/剪枝
DMCP: Differentiable Markov Channel Pruning for Neural Networks
- 论文:https://arxiv.org/abs/2005.03354
- 代码:https://github.com/zx55/dmcp
Forward and Backward Information Retention for Accurate Binary Neural Networks
- 论文:https://arxiv.org/abs/1909.10788
- 代码:https://github.com/htqin/IR-Net
Towards Efficient Model Compression via Learned Global Ranking
- 论文:https://arxiv.org/abs/1904.12368
- 代码:https://github.com/cmu-enyac/LeGR
HRank: Filter Pruning using High-Rank Feature Map
- 论文:http://arxiv.org/abs/2002.10179
- 代码:https://github.com/lmbxmu/HRank
GAN Compression: Efficient Architectures for Interactive Conditional GANs
- 论文:https://arxiv.org/abs/2003.08936
- 代码:https://github.com/mit-han-lab/gan-compression
Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression
- 论文:https://arxiv.org/abs/2003.08935
- 代码:https://github.com/ofsoundof/group_sparsity
视频理解/行为识别
Oops! Predicting Unintentional Action in Video
- 主页:https://oops.cs.columbia.edu/
- 论文:https://arxiv.org/abs/1911.11206
- 代码:https://github.com/cvlab-columbia/oops
- 数据集:https://oops.cs.columbia.edu/data
PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition
- 论文:https://arxiv.org/abs/1911.12409
- 代码:https://github.com/shlizee/Predict-Cluster
Intra- and Inter-Action Understanding via Temporal Action Parsing
- 论文:https://arxiv.org/abs/2005.10229
- 主页和数据集:https://sdolivia.github.io/TAPOS/
3DV: 3D Dynamic Voxel for Action Recognition in Depth Video
- 论文:https://arxiv.org/abs/2005.05501
- 代码:https://github.com/3huo/3DV-Action
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
- 主页:https://sdolivia.github.io/FineGym/
- 论文:https://arxiv.org/abs/2004.06704
TEA: Temporal Excitation and Aggregation for Action Recognition
- 论文:https://arxiv.org/abs/2004.01398
- 代码:https://github.com/Phoenix1327/tea-action-recognition
X3D: Expanding Architectures for Efficient Video Recognition
- 论文:https://arxiv.org/abs/2004.04730
- 代码:https://github.com/facebookresearch/SlowFast
Temporal Pyramid Network for Action Recognition
- 主页:https://decisionforce.github.io/TPN
- 论文:https://arxiv.org/abs/2004.03548
- 代码:https://github.com/decisionforce/TPN
基于骨架的动作识别
Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition
- 论文:https://arxiv.org/abs/2003.14111
- 代码:https://github.com/kenziyuliu/ms-g3d
人群计数
深度估计
BiFuse: Monocular 360° Depth Estimation via Bi-Projection Fusion
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_BiFuse_Monocular_360_Depth_Estimation_via_Bi-Projection_Fusion_CVPR_2020_paper.pdf
- 代码:https://github.com/Yeh-yu-hsuan/BiFuse
Focus on defocus: bridging the synthetic to real domain gap for depth estimation
- 论文:https://arxiv.org/abs/2005.09623
- 代码:https://github.com/dvl-tum/defocus-net
Bi3D: Stereo Depth Estimation via Binary Classifications
- 论文:https://arxiv.org/abs/2005.07274
- 代码:https://github.com/NVlabs/Bi3D
AANet: Adaptive Aggregation Network for Efficient Stereo Matching
- 论文:https://arxiv.org/abs/2004.09548
- 代码:https://github.com/haofeixu/aanet
Towards Better Generalization: Joint Depth-Pose Learning without PoseNet
- 论文:https://github.com/B1ueber2y/TrianFlow
- 代码:https://github.com/B1ueber2y/TrianFlow
单目深度估计
On the uncertainty of self-supervised monocular depth estimation
- 论文:https://arxiv.org/abs/2005.06209
- 代码:https://github.com/mattpoggi/mono-uncertainty
3D Packing for Self-Supervised Monocular Depth Estimation
- 论文:https://arxiv.org/abs/1905.02693
- 代码:https://github.com/TRI-ML/packnet-sfm
- Demo视频:https://www.bilibili.com/video/av70562892/
Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation
- 论文:https://arxiv.org/abs/2002.12114
- 代码:https://github.com/yzhao520/ARC
6D目标姿态估计
PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/He_PVN3D_A_Deep_Point-Wise_3D_Keypoints_Voting_Network_for_6DoF_CVPR_2020_paper.pdf
- 代码:https://github.com/ethnhe/PVN3D
MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion
- 论文:https://arxiv.org/abs/2004.04336
- 代码:https://github.com/wkentaro/morefusion
EPOS: Estimating 6D Pose of Objects with Symmetries
- 主页:http://cmp.felk.cvut.cz/epos
- 论文:https://arxiv.org/abs/2004.00605
G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features
- 论文:https://arxiv.org/abs/2003.11089
- 代码:https://github.com/DC1991/G2L_Net
手势估计
HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation
- 论文:https://arxiv.org/abs/2004.00060
- 主页:http://vision.sice.indiana.edu/projects/hopenet
Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data
- 论文:https://arxiv.org/abs/2003.09572
- 代码:https://github.com/CalciferZh/minimal-hand
显著性检测
JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection
- 论文:https://arxiv.org/abs/2004.08515
- 代码:https://github.com/kerenfu/JLDCF/
UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders
- 主页:http://dpfan.net/d3netbenchmark/
- 论文:https://arxiv.org/abs/2004.05763
- 代码:https://github.com/JingZhang617/UCNet
去噪
A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising
- 论文:https://arxiv.org/abs/2003.12751
- 代码:https://github.com/Vandermode/NoiseModel
CycleISP: Real Image Restoration via Improved Data Synthesis
- 论文:https://arxiv.org/abs/2003.07761
- 代码:https://github.com/swz30/CycleISP
去雨
Multi-Scale Progressive Fusion Network for Single Image Deraining
- 论文:https://arxiv.org/abs/2003.10985
- 代码:https://github.com/kuihua/MSPFN
Detail-recovery Image Deraining via Context Aggregation Networks
- 论文:https://openaccess.thecvf.com/content_CVPR_2020/html/Deng_Detail-recovery_Image_Deraining_via_Context_Aggregation_Networks_CVPR_2020_paper.html
- 代码:https://github.com/Dengsgithub/DRD-Net
去模糊
视频去模糊
Cascaded Deep Video Deblurring Using Temporal Sharpness Prior
- 主页:https://csbhr.github.io/projects/cdvd-tsp/index.html
- 论文:https://arxiv.org/abs/2004.02501
- 代码:https://github.com/csbhr/CDVD-TSP
去雾
Domain Adaptation for Image Dehazing
- 论文:https://arxiv.org/abs/2005.04668
- 代码:https://github.com/HUSTSYJ/DA_dahazing
Multi-Scale Boosted Dehazing Network with Dense Feature Fusion
- 论文:https://arxiv.org/abs/2004.13388
- 代码:https://github.com/BookerDeWitt/MSBDN-DFF
特征点检测与描述
ASLFeat: Learning Local Features of Accurate Shape and Localization
- 论文:https://arxiv.org/abs/2003.10071
- 代码:https://github.com/lzx551402/aslfeat
视觉问答(VQA)
VC R-CNN: Visual Commonsense R-CNN
- 论文:https://arxiv.org/abs/2002.12204
- 代码:https://github.com/Wangt-CN/VC-R-CNN
视频问答(VideoQA)
Hierarchical Conditional Relation Networks for Video Question Answering
- 论文:https://arxiv.org/abs/2002.10698
- 代码:https://github.com/thaolmk54/hcrn-videoqa
视觉语言导航
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
- 论文:https://arxiv.org/abs/2002.10638
- 代码(即将开源):https://github.com/weituo12321/PREVALENT
视频压缩
Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement
- 论文:https://arxiv.org/abs/2003.01966
- 代码:https://github.com/RenYang-home/HLVC
视频插帧
AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation
- 论文:https://arxiv.org/abs/1907.10244
- 代码:https://github.com/HyeongminLEE/AdaCoF-pytorch
FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Gui_FeatureFlow_Robust_Video_Interpolation_via_Structure-to-Texture_Generation_CVPR_2020_paper.html
- 代码:https://github.com/CM-BF/FeatureFlow
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution
- 论文:https://arxiv.org/abs/2002.11616
- 代码:https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020
Space-Time-Aware Multi-Resolution Video Enhancement
- 主页:https://alterzero.github.io/projects/STAR.html
- 论文:http://arxiv.org/abs/2003.13170
- 代码:https://github.com/alterzero/STARnet
Scene-Adaptive Video Frame Interpolation via Meta-Learning
- 论文:https://arxiv.org/abs/2004.00779
- 代码:https://github.com/myungsub/meta-interpolation
Softmax Splatting for Video Frame Interpolation
- 主页:http://sniklaus.com/papers/softsplat
- 论文:https://arxiv.org/abs/2003.05534
- 代码:https://github.com/sniklaus/softmax-splatting
风格迁移
Diversified Arbitrary Style Transfer via Deep Feature Perturbation
- 论文:https://arxiv.org/abs/1909.08223
- 代码:https://github.com/EndyWon/Deep-Feature-Perturbation
Collaborative Distillation for Ultra-Resolution Universal Style Transfer
- 论文:https://arxiv.org/abs/2003.08436
- 代码:https://github.com/mingsun-tse/collaborative-distillation
车道线检测
Inter-Region Affinity Distillation for Road Marking Segmentation
- 论文:https://arxiv.org/abs/2004.05304
- 代码:https://github.com/cardwing/Codes-for-IntRA-KD
"人-物"交互(HOI)检测
PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection
- 论文:https://arxiv.org/abs/1912.12898
- 代码:https://github.com/YueLiao/PPDM
Detailed 2D-3D Joint Representation for Human-Object Interaction
- 论文:https://arxiv.org/abs/2004.08154
- 代码:https://github.com/DirtyHarryLYL/DJ-RN
Cascaded Human-Object Interaction Recognition
- 论文:https://arxiv.org/abs/2003.04262
- 代码:https://github.com/tfzhou/C-HOI
VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions
- 论文:https://arxiv.org/abs/2003.05541
- 代码:https://github.com/ASMIftekhar/VSGNet
轨迹预测
The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction
- 论文:https://arxiv.org/abs/1912.06445
- 代码:https://github.com/JunweiLiang/Multiverse
- 数据集:https://next.cs.cmu.edu/multiverse/
Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction
- 论文:https://arxiv.org/abs/2002.11927
- 代码:https://github.com/abduallahmohamed/Social-STGCNN
运动预测
Collaborative Motion Prediction via Neural Motion Message Passing
- 论文:https://arxiv.org/abs/2003.06594
- 代码:https://github.com/PhyllisH/NMMP
MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps
- 论文:https://arxiv.org/abs/2003.06754
- 代码:https://github.com/pxiangwu/MotionNet
光流估计
Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation
- 论文:https://arxiv.org/abs/2003.13045
- 代码:https://github.com/lliuz/ARFlow
图像检索
Evade Deep Image Retrieval by Stashing Private Images in the Hash Space
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Xiao_Evade_Deep_Image_Retrieval_by_Stashing_Private_Images_in_the_CVPR_2020_paper.html
- 代码:https://github.com/sugarruy/hashstash
虚拟试衣
Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content
- 论文:https://arxiv.org/abs/2003.05863
- 代码:https://github.com/switchablenorms/DeepFashion_Try_On
HDR
Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline
- 主页:https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR
- 论文下载链接:https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR_/00942.pdf
- 代码:https://github.com/alex04072000/SingleHDR
对抗样本
Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction
- 论文:https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Enhancing_Cross-Task_Black-Box_Transferability_of_Adversarial_Examples_With_Dispersion_Reduction_CVPR_2020_paper.pdf
- 代码:https://github.com/erbloo/dr_cvpr20
Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance
- 论文:https://arxiv.org/abs/1911.02466
- 代码:https://github.com/ZhengyuZhao/PerC-Adversarial
三维重建
Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild
- CVPR 2020 Best Paper
- 主页:https://elliottwu.com/projects/unsup3d/
- 论文:https://arxiv.org/abs/1911.11130
- 代码:https://github.com/elliottwu/unsup3d
Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
- 主页:https://shunsukesaito.github.io/PIFuHD/
- 论文:https://arxiv.org/abs/2004.00452
- 代码:https://github.com/facebookresearch/pifuhd
TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
- 代码:https://github.com/chaitanya100100/TailorNet
- 数据集:https://github.com/zycliao/TailorNet_dataset
Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion
- 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Chibane_Implicit_Functions_in_Feature_Space_for_3D_Shape_Reconstruction_and_CVPR_2020_paper.pdf
- 代码:https://github.com/jchibane/if-net