Paper downloads:
ICCV 2021 paper download collection:
Link: https://pan.baidu.com/s/1vmOQzLG1QaBCgQD1ijtYuw
Extraction code: bp9j (for the unzip password, contact WeChat nvshenj125)
CVPR 2021 collection: https://github.com/DWCTOD/CVPR2021-Papers-with-Code-Demo
Paper download: https://pan.baidu.com/share/init?surl=gjfUQlPf73MCk4vM8VbzoA
Password: aicv
ICCV 2021: continuously updated with the latest papers and the corresponding open-source code!
ICCV 2021 accepted paper list
ICCV 2021 talk and demo video collection: https://space.bilibili.com/288489574
Official website: http://iccv2021.thecvf.com/home
Date: paper acceptance results were announced on July 23, 2021
Note: everyone is welcome to submit issues and share ICCV 2021 papers and open-source projects to help improve this project together.
For easier downloading, papers are stored in the repository folder; ✔️ marks a paper that has already been downloaded.
An ICCV 2021 paper discussion group has been set up. If your paper was accepted, add WeChat nvshenj125 with the note "ICCV + name + school/company name". Requests must follow this format in order to be added to the group.
- Backbone
- Dataset
- Loss
- NAS
- Image Classification
- Vision Transformer
- Object Detection
- Salient Object Detection
- 3D Object Detection
- Object Tracking
- Image Semantic Segmentation
- Semantic Scene Segmentation
- 3D Semantic Segmentation
- 3D Instance Segmentation
- Instance Segmentation
- Video Semantic Segmentation
- Medical Image Segmentation
- Medical Image Analysis
- GAN
- Style Transfer
- Fine-Grained Visual Categorization
- Multi-Label Recognition
- Long-Tailed Recognition
- Geometric deep learning
- Zero/Few Shot
- Unsupervised
- Self-supervised
- Semi Supervised
- Weakly Supervised
- Active Learning
- Action Detection
- Action Recognition
- Temporal Action Localization
- Sign Language Recognition
- Hand Pose Estimation
- Pose Estimation
- 6D Object Pose Estimation
- Human Reconstruction
- 3D Scene Understanding
- Face Recognition
- Face Alignment
- Facial Editing
- Face Reconstruction
- Facial Expression Recognition
- Person Re-Identification
- Vehicle Re-identification
- Pedestrian Detection
- Crowd Counting
- Motion Forecasting
- Pedestrian Trajectory Prediction
- Face-Anti-spoofing
- deepfake
- Adversarial Attacks
- Cross-Modal Retrieval
- NeRF
- Shadow Removal
- Image Retrieval
- Super-Resolution
- Image Reconstruction
- Image Deblurring
- Image Denoising
- Image Desnowing
- Image Enhancement
- Image Matching
- Image Quality
- Image Compression
- Image Restoration
- Image Inpainting
- Video Inpainting
- Video Frame Interpolation
- Video Reasoning
- Video Recognition
- Visual Question Answering
- Matching
- Hand-object Interaction
- Gaze Estimation
- Depth Estimation
- Contrastive-Learning
- Graph Convolution Networks
- Model Compression
- Quantization
- Knowledge Distillation
- Point Cloud
- 3D reconstruction
- Font Generation
- Text Detection
- Text Recognition
- Scene Text Recognizer
- Autonomous-Driving
- Visdrone_detection
- Anomaly Detection
- Others
✔️ Conformer: Local Features Coupling Global Representations for Visual Recognition
- Paper: https://arxiv.org/abs/2105.03889
- Code: https://github.com/pengzhiliang/Conformer
Contextual Convolutional Neural Networks
- Paper: https://arxiv.org/abs/2108.07387
- Code: https://github.com/iduta/coconv
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Paper: https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
Reg-IBP: Efficient and Scalable Neural Network Robustness Training via Interval Bound Propagation
- Paper: None
- Code: https://github.com/harrywuhust2022/Reg_IBP_ICCV2021
Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
- Paper: https://arxiv.org/abs/2105.02498
- Code: https://github.com/KingJamesSong/DifferentiableSVD
Beyond Road Extraction: A Dataset for Map Update using Aerial Images
- Paper: https://arxiv.org/abs/2110.04690
- Code: None
✔️ FineAction: A Fined Video Dataset for Temporal Action Localization
- Paper: https://arxiv.org/abs/2105.11107 | Homepage
- Code: None
KoDF: A Large-scale Korean DeepFake Detection Dataset
- Paper: https://arxiv.org/abs/2103.10094
- Code: https://moneybrain-research.github.io/kodf
LLVIP: A Visible-infrared Paired Dataset for Low-light Vision
- Paper: https://arxiv.org/abs/2108.10831 | Homepage
- Code: None
Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes
- Paper: https://arxiv.org/abs/2109.03585
- Code: None
Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark
- Paper: https://arxiv.org/abs/2108.10840 | Homepage
- Code: https://github.com/bupt-ai-cz/Meta-SelfLearning
✔️ MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
- Paper: https://arxiv.org/abs/2105.07404 | Homepage
- Code: https://github.com/MCG-NJU/MultiSports/
Semantically Coherent Out-of-Distribution Detection
- Paper: https://arxiv.org/abs/2108.11941 | Homepage
- Code: https://github.com/jingkang50/ICCV21_SCOOD
StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
- Paper: https://arxiv.org/abs/2109.10115
- Code: None
STRIVE: Scene Text Replacement In Videos
- Paper: https://arxiv.org/abs/2109.02762 | Homepage
- Code: None
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
- Paper: https://arxiv.org/abs/2006.16241
- Code: https://github.com/hendrycks/imagenet-r
Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
- Paper: https://arxiv.org/abs/2108.02399
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
Who's Waldo? Linking People Across Text and Images (Oral)
- Paper: https://arxiv.org/abs/2108.07253
- Code: None
Asymmetric Loss For Multi-Label Classification
- Paper: https://arxiv.org/abs/2009.14119
- Code: https://github.com/Alibaba-MIIL/ASL
Bias Loss for Mobile Neural Networks
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
Focal Frequency Loss for Image Reconstruction and Synthesis
- Paper: https://arxiv.org/abs/2012.12821
- Code: https://github.com/EndlessSora/focal-frequency-loss
Orthogonal Projection Loss
- Paper: https://arxiv.org/abs/2103.14021
- Code: https://github.com/kahnchana/opl
Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)
- Paper: https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
BN-NAS: Neural Architecture Search with Batch Normalization
- Paper: https://arxiv.org/abs/2108.07375
- Code: None
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
- Paper: https://arxiv.org/pdf/2103.12424.pdf
- Code: https://github.com/changlin31/BossNAS
CONet: Channel Optimization for Convolutional Neural Networks
- Paper: https://arxiv.org/abs/2108.06822
- Code: None
FOX-NAS: Fast, On-device and Explainable Neural Architecture Search
- Paper: https://arxiv.org/abs/2108.08189
- Code: https://github.com/great8nctu/FOX-NAS
Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift
- Paper: https://arxiv.org/abs/2108.09671v1
- Code: https://github.com/Ernie1/Pi-NAS
RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving
- Paper: https://arxiv.org/abs/2108.08019
- Code: https://github.com/ruocwang
Single-DARTS: Towards Stable Architecture Search
- Paper: https://arxiv.org/abs/2108.08128
- Code: https://github.com/PencilAndBike/Single-DARTS.git
Influence-Balanced Loss for Imbalanced Visual Classification
- Paper: https://arxiv.org/abs/2110.02444
- Code: None
Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
- Paper: https://arxiv.org/abs/2109.05720
- Code: None
Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification
- Paper: https://arxiv.org/abs/2108.13122
- Code: None
An End-to-End Transformer Model for 3D Object Detection
- Paper: https://arxiv.org/abs/2109.08141
- Code: None
AutoFormer: Searching Transformers for Visual Recognition
- Paper: https://arxiv.org/abs/2107.00651
- Code: https://github.com/microsoft/AutoML
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
- Paper: https://arxiv.org/pdf/2103.12424.pdf
- Code: https://github.com/changlin31/BossNAS
Conditional DETR for Fast Training Convergence
- Paper: https://arxiv.org/abs/2108.06152
- Code: https://git.io/ConditionalDETR
Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions
- Paper: https://arxiv.org/abs/2109.09487
- Code: None
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
- Paper: https://arxiv.org/abs/2109.08044
- Code: None
Fast Convergence of DETR with Spatially Modulated Co-Attention
- Commentary: https://zhuanlan.zhihu.com/p/397083124
- Paper: https://arxiv.org/abs/2108.02404
- Code: https://github.com/gaopengcuhk/SMCA-DETR
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
- Paper: https://arxiv.org/abs/2108.01912
- Code: None
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (Oral)
- Paper: https://arxiv.org/pdf/2103.15679.pdf
- Code: https://github.com/hila-chefer/Transformer-MM-Explainability
GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer
- Paper: https://arxiv.org/abs/2108.12630
- Code: https://github.com/xueyee/GroupFormer
HiFT: Hierarchical Feature Transformer for Aerial Tracking
- Paper: https://arxiv.org/abs/2108.00202
- Code: https://github.com/vision4robotics/HiFT
High-Fidelity Pluralistic Image Completion with Transformers
- Paper: https://arxiv.org/pdf/2103.14031.pdf | Homepage
- Code: https://github.com/raywzy/ICT
Improving 3D Object Detection with Channel-wise Transformer
- Paper: https://arxiv.org/abs/2108.10723
- Code: None
Is it Time to Replace CNNs with Transformers for Medical Images?
- Paper: https://arxiv.org/abs/2108.09038
- Code: None
Learning Spatio-Temporal Transformer for Visual Tracking
- Paper: https://arxiv.org/abs/2103.17154
- Code: https://github.com/researchmm/Stark
MUSIQ: Multi-scale Image Quality Transformer
- Paper: https://arxiv.org/abs/2108.05997
- Code: None
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction (Oral)
- Paper: https://arxiv.org/abs/2108.03798
- Code: https://github.com/Huage001/PaintTransformer
PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
- Paper: https://arxiv.org/abs/2107.13108
- Code: https://github.com/IceTTTb/PlaneTR3D
PnP-DETR: Towards Efficient Visual Analysis with Transformers
- Paper: https://arxiv.org/abs/2109.07036
- Code: https://github.com/twangnh/pnp-detr
Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers
- Paper: https://arxiv.org/abs/2109.07531
- Code: None
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers (Oral)
- Paper: https://arxiv.org/abs/2108.08839
- Code: https://github.com/yuxumin/PoinTr
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Paper: https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
Rethinking and Improving Relative Position Encoding for Vision Transformer
- Paper: https://houwenpeng.com/publications/iRPE.pdf
- Code: https://github.com/wkcn/iRPE-model-zoo
Rethinking Spatial Dimensions of Vision Transformers
- Paper: https://arxiv.org/abs/2103.16302
- Code: https://github.com/naver-ai/pit
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
- Paper: https://arxiv.org/abs/2108.03032
- Code: https://github.com/zhiheLu/CWTfor-FSS
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
- Paper: https://arxiv.org/abs/2108.04444
- Code: https://github.com/AllenXiangX/SnowflakeNet
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
- Commentary: Spatial-Temporal Transformer for video scene graph generation
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
SOTR: Segmenting Objects with Transformers
- Paper: https://arxiv.org/abs/2108.06747
- Code: https://github.com/easton-cau/SOTR
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- Paper: https://arxiv.org/abs/2103.14030
- Code: https://github.com/microsoft/Swin-Transformer
Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers
- Paper: https://arxiv.org/abs/2011.02910
- Code: https://github.com/mli0603/stereo-transformer
The Animation Transformer: Visual Correspondence via Segment Matching
- Paper: https://arxiv.org/abs/2109.02614
- Code: None
The Right to Talk: An Audio-Visual Transformer Approach
- Paper: https://arxiv.org/abs/2108.03256
- Code: None
TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios
- Paper: https://arxiv.org/abs/2108.11539
- Code: None
TransFER: Learning Relation-aware Facial Expression Representations with Transformers
- Paper: https://arxiv.org/abs/2108.11116
- Code: None
TransPose: Keypoint Localization via Transformer
- Paper: https://arxiv.org/abs/2012.14214
- Code: https://github.com/yangsenius/TransPose
✔️ Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
- Paper: https://arxiv.org/abs/2101.11986
- Code: https://github.com/yitu-opensource/T2T-ViT
✔️ Visual Transformer with Statistical Test for COVID-19 Classification
- Paper: https://arxiv.org/abs/2107.05334
- Code: None
Vision Transformer with Progressive Sampling
- Paper: https://arxiv.org/abs/2108.01684
- Code: https://github.com/yuexy/PS-ViT
Visual Saliency Transformer
- Commentary: https://blog.csdn.net/qq_39936426/article/details/117199411
- Paper: https://arxiv.org/abs/2104.12099
- Code: https://github.com/nnizhang/VST
Vision-Language Transformer and Query Generation for Referring Segmentation
- Paper: https://arxiv.org/abs/2108.05565
- Code: https://github.com/henghuiding/Vision-Language-Transformer
Voxel Transformer for 3D Object Detection
- Paper: https://arxiv.org/abs/2109.02497
- Code: None
Active Learning for Deep Object Detection via Probabilistic Modeling
- Paper: https://arxiv.org/abs/2103.16130
- Code: None
Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters
- Paper: https://arxiv.org/abs/2108.01499
- Code: https://github.com/DongSky/lbba_boosted_wsod
Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery
- Paper: https://arxiv.org/abs/2108.07002
- Code: https://github.com/Z-Zheng/ChangeStar
Conditional Variational Capsule Network for Open Set Recognition
- Paper: https://arxiv.org/abs/2104.09159
- Code: https://github.com/guglielmocamporese/cvaecaposr
DetCo: Unsupervised Contrastive Learning for Object Detection
- Paper: https://arxiv.org/abs/2102.04803
- Code: https://github.com/xieenze/DetCo
DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection
- Paper: https://arxiv.org/abs/2108.09017
- Code: None
Deployment of Deep Neural Networks for Object Detection on Edge AI Devices with Runtime Optimization
- Paper: https://arxiv.org/abs/2108.08166
- Code: None
Detecting Invisible People
- Paper: https://arxiv.org/abs/2012.08419 | Homepage
- Code: None
FMODetect: Robust Detection and Trajectory Estimation of Fast Moving Objects
- Paper: None
- Code: https://github.com/rozumden/FMODetect
GraphFPN: Graph Feature Pyramid Network for Object Detection
- Paper: https://arxiv.org/abs/2108.00580
- Code: None
Human Detection and Segmentation via Multi-view Consensus
- Paper: None
- Code: https://github.com/isinsukatircioglu/mvc
MDETR : Modulated Detection for End-to-End Multi-Modal Understanding
- Paper: https://arxiv.org/abs/2104.12763 | Homepage
- Code: https://github.com/ashkamath/mdetr
Mutual Supervision for Dense Object Detection
- Paper: https://arxiv.org/abs/2109.05986
- Code: None
Morphable Detector for Object Detection on Demand
- Paper: https://arxiv.org/abs/2110.04917
- Code: https://github.com/Zhaoxiangyun/Morphable-Detector
Moving Object Detection for Event-based vision using Graph Spectral Clustering
- Paper: https://arxiv.org/abs/2109.14979
- Code: None
Oriented R-CNN for Object Detection
- Paper: https://arxiv.org/abs/2108.05699
- Code: https://github.com/jbwang1997/OBBDetection
Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)
- Paper: https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
Reconcile Prediction Consistency for Balanced Object Detection
- Paper: https://arxiv.org/abs/2108.10809
- Code: None
Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection
- Paper: https://arxiv.org/abs/2110.01428
- Code: None
Towards Rotation Invariance in Object Detection
- Paper: https://arxiv.org/abs/2109.13488
- Code: None
TOOD: Task-aligned One-stage Object Detection (Oral)
- Paper: https://arxiv.org/abs/2108.07755
- Code: https://github.com/fcjian/TOOD
Vector-Decomposed Disentanglement for Domain-Invariant Object Detection
- Paper: https://arxiv.org/abs/2108.06685
- Code: None
Disentangled High Quality Salient Object Detection
- Paper: https://arxiv.org/abs/2108.03551
- Code: None
Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance
- Paper: https://arxiv.org/abs/2110.00698
- Code: None
RGB-D Saliency Detection via Cascaded Mutual Information Minimization
- Paper: https://arxiv.org/abs/2109.07246
- Code: https://github.com/JingZhang617/cascaded_rgbd_sod
Specificity-preserving RGB-D Saliency Detection
- Paper: https://arxiv.org/abs/2108.08162
- Code: https://github.com/taozh2017/SPNet
Summarize and Search: Learning Consensus-aware Dynamic Convolution for Co-Saliency Detection
- Paper: https://arxiv.org/abs/2109.07246
- Code: https://github.com/nnizhang/CADC
An End-to-End Transformer Model for 3D Object Detection
- Paper: https://arxiv.org/abs/2109.08141
- Code: None
Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather
- Paper: https://arxiv.org/abs/2108.05249
- Code: https://github.com/MartinHahner/LiDAR_fog_sim
LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
- Paper: https://arxiv.org/abs/2108.08258
- Code: None
MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation
- Paper: https://arxiv.org/abs/2110.00464
- Code: None
Improving 3D Object Detection with Channel-wise Transformer
- Paper: https://arxiv.org/abs/2108.10723
- Code: None
Is Pseudo-Lidar needed for Monocular 3D Object detection?
- Paper: https://arxiv.org/abs/2108.06417
- Code: None
ODAM: Object Detection, Association, and Mapping using Posed RGB Video (Oral)
- Paper: https://arxiv.org/abs/2108.10165v1
- Code: None
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
- Paper: https://arxiv.org/abs/2109.02499
- Code: None
RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
- Paper: https://arxiv.org/abs/2108.07794
- Code: None
Voxel Transformer for 3D Object Detection
- Paper: https://arxiv.org/abs/2109.02497
- Code: None
Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency
- Paper: https://arxiv.org/pdf/2107.11355.pdf
- Code: None
DepthTrack : Unveiling the Power of RGBD Tracking
- Paper: https://arxiv.org/abs/2108.13962
- Code: None
Exploring Simple 3D Multi-Object Tracking for Autonomous Driving
- Paper: https://arxiv.org/abs/2108.10312v1
- Code: None
Is First Person Vision Challenging for Object Tracking?
- Paper: https://arxiv.org/abs/2108.13665
- Code: None
Learning to Track Objects from Unlabeled Videos
- Paper: https://arxiv.org/abs/2108.12711
- Code: https://github.com/VISION-SJTU/USOT
Learn to Match: Automatic Matching Network Design for Visual Tracking
- Paper: https://arxiv.org/abs/2108.00803
- Code: https://github.com/JudasDie/SOTS
Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths
- Paper: https://arxiv.org/abs/2108.10606
- Code: https://github.com/TimoK93/ApLift
Saliency-Associated Object Tracking
- Paper: https://arxiv.org/abs/2108.03637
- Code: None
Video Annotation for Visual Tracking via Selection and Refinement
- Paper: https://arxiv.org/abs/2108.03821
- Code: https://github.com/Daikenan/VASR
Complementary Patch for Weakly Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.03852
- Code: None
Calibrated Adversarial Refinement for Stochastic Semantic Segmentation
- Paper: https://arxiv.org/abs/2006.13144
- Code: https://github.com/EliasKassapis/CARSSS
Deep Metric Learning for Open World Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.04562
- Code: None
Dual Path Learning for Domain Adaptation of Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.06337
- Code: https://github.com/royee182/DPL
EdgeFlow: Achieving Practical Interactive Segmentation with Edge-Guided Flow
- Paper: https://arxiv.org/abs/2109.09406
- Code: https://github.com/PaddlePaddle/PaddleSeg
Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing
- Paper: https://arxiv.org/abs/2109.02281
- Code: None
Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.06536
- Code: None
Exploring Cross-Image Pixel Contrast for Semantic Segmentation (Oral)
- Paper: https://arxiv.org/abs/2101.11939
- Code: https://github.com/tfzhou/ContrastiveSeg
Enhanced Boundary Learning for Glass-like Object Segmentation
- Paper: https://arxiv.org/abs/2103.15734
- Code: https://github.com/hehao13/EBLNet
From Contexts to Locality: Ultra-high Resolution Image Segmentation via Locality-aware Contextual Correlation
- Paper: https://arxiv.org/abs/2109.02580
- Code: https://github.com/liqiokkk/FCtL
ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.12382v1
- Code: None
Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.11249
- Code: https://sites.google.com/view/sfdaseg
Labels4Free: Unsupervised Segmentation using StyleGAN
- Paper: https://arxiv.org/abs/2103.14968 | Homepage
- Code: None
LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.05570
- Code: None
Learning Meta-class Memory for Few-Shot Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.02958
- Code: None
Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.11787
- Code: https://github.com/xulianuwa/AuxSegNet
Mining Contextual Information Beyond Image for Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.11819
- Code: None
Mining Latent Classes for Few-shot Segmentation (Oral)
- Paper: https://arxiv.org/abs/2103.15402
- Code: https://github.com/LiheYoung/MiningFSS
Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.06962
- Code: None
Multi-Anchor Active Domain Adaptation for Semantic Segmentation (Oral)
- Paper: https://arxiv.org/abs/2108.08012
- Code: None
Personalized Image Semantic Segmentation
- Paper: None
- Code: https://github.com/zhangyuygss/PIS
Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.09025
- Code: None
Pseudo-mask Matters in Weakly-supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.12995
- Code: https://github.com/Eli-YiLi/PMM
RECALL: Replay-based Continual Learning in Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.03673
- Code: None
Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation (Oral)
- Paper: https://arxiv.org/abs/2107.11279
- Code: https://github.com/CVMI-Lab/DARS
Semantic Segmentation on VSPW Dataset through Aggregation of Transformer Models
- Paper: https://arxiv.org/abs/2109.01316
- Code: None
Self-Regulation for Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.09702v1
- Code: None
Semantic Concentration for Domain Adaptation
- Paper: https://arxiv.org/abs/2108.05720
- Code: None
ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.10528
- Code: None
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
- Paper: https://arxiv.org/abs/2108.03032
- Code: https://github.com/zhiheLu/CWTfor-FSS
SOTR: Segmenting Objects with Transformers
- Paper: https://arxiv.org/abs/2108.06747
- Code: https://github.com/easton-cau/SOTR
Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation
- Paper: https://arxiv.org/abs/2107.11264v1
- Code: None
The Marine Debris Dataset for Forward-Looking Sonar Semantic Segmentation
- Paper: https://arxiv.org/abs/2108.06800
- Code: https://github.com/mvaldenegro/marine-debris-fls-datasets/
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals
- Paper: https://arxiv.org/pdf/2102.06191.pdf
- Code: https://github.com/wvangansbeke/Unsupervised-Semantic-Segmentation
Weakly Supervised Temporal Anomaly Segmentation with Dynamic Time Warping
- Paper: https://arxiv.org/abs/2108.06816
- Code: None
BiMaL: Bijective Maximum Likelihood Approach to Domain Adaptation in Semantic Scene Segmentation
- Paper: https://arxiv.org/abs/2108.03267
- Code: None
VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation
- Paper: None
- Code: https://github.com/hzykent/VMNet
Hierarchical Aggregation for 3D Instance Segmentation
- Paper: https://arxiv.org/abs/2108.02350
- Code: https://github.com/hustvl/HAIS
Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks
- Paper: https://arxiv.org/abs/2108.07478
- Code: https://github.com/Gorilla-Lab-SCUT/SSTNet
CDNet: Centripetal Direction Network for Nuclear Instance Segmentation
- Paper: None
- Code: https://github.com/2021-ICCV/CDNet
✔️ Crossover Learning for Fast Online Video Instance Segmentation
- Paper: https://arxiv.org/abs/2104.05970
- Code: https://github.com/hustvl/CrossVIS
✔️ Instances as Queries
- Paper: https://arxiv.org/abs/2105.01928
- Code: https://github.com/hustvl/QueryInst
Instance Segmentation Challenge Track Technical Report, VIPriors Workshop at ICCV 2021: Task-Specific Copy-Paste Data Augmentation Method for Instance Segmentation
- Paper: https://arxiv.org/abs/2110.00470
- Code: https://github.com/jahongir7174/VIP2021
Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)
- Paper: https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
Scaling up instance annotation via label propagation
- Paper: https://arxiv.org/abs/2110.02277
- Code: http://scaling-anno.csail.mit.edu/
Domain Adaptive Video Segmentation via Temporal Consistency Regularization
- Paper: https://arxiv.org/abs/2107.11004
- Code: https://github.com/Dayan-Guan/DA-VSN
Full-Duplex Strategy for Video Object Segmentation
- Paper: https://arxiv.org/abs/2108.03151 | Homepage
- Code: https://github.com/GewelsJI/FSNet
Hierarchical Memory Matching Network for Video Object Segmentation
- Paper: https://arxiv.org/abs/2109.11404 | Homepage
- Code: Hierarchical Memory Matching Network for Video Object Segmentation
Joint Inductive and Transductive Learning for Video Object Segmentation
- Paper: https://arxiv.org/abs/2108.03679
- Code: https://github.com/maoyunyao/JOINT
Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
- Paper: https://arxiv.org/abs/2108.00622
- Code: None
Uncertainty-aware GAN with Adaptive Loss for Robust MRI Image Enhancement
- Paper: https://arxiv.org/abs/2110.03343
- Code: None
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
- Paper: https://arxiv.org/abs/2109.08044
- Code: None
Improving Tuberculosis (TB) Prediction using Synthetically Generated Computed Tomography (CT) Images
- Paper: https://arxiv.org/abs/2109.11480
- Code: None
Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts
- Paper: https://arxiv.org/abs/2109.04379
- Code: https://github.com/Luchixiang/PCRL
Studying the Effects of Self-Attention for Medical Image Analysis
- Paper: https://arxiv.org/abs/2109.01486
- Code: None
3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations (Oral)
- č®ŗę/paperļ¼https://arxiv.org/abs/2108.12958
- 代ē /codeļ¼https://nv-tlabs.github.io/3DStyleNet/
AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer
- č®ŗę/paperļ¼https://arxiv.org/abs/2108.03647
- 代ē /codeļ¼https://github.com/Huage001/AdaAttN
Click to Move: Controlling Video Generation with Sparse Motion
- č®ŗę/paperļ¼https://arxiv.org/abs/2108.08815
- 代ē /codeļ¼https://github.com/PierfrancescoArdino/C2M
Collaging Class-specific GANs for Semantic Image Synthesis
- č®ŗę/paperļ¼https://arxiv.org/abs/2110.04281
- 代ē /codeļ¼None
Disentangled Lifespan Face Synthesis
- č®ŗę/paperļ¼https://arxiv.org/abs/2108.02874 | äø»é”µ/Homepage
- 代ē /codeļ¼https://github.com/SenHe/DLFS
Dual Projection Generative Adversarial Networks for Conditional Image Generation
- č®ŗę/paperļ¼https://arxiv.org/abs/2108.09016
- 代ē /codeļ¼None
EigenGAN: Layer-Wise Eigen-Learning for GANs
- č®ŗę/paperļ¼https://arxiv.org/pdf/2104.12476.pdf
- 代ē /codeļ¼https://github.com/LynnHo/EigenGAN-Tensorflow
GAN Inversion for Out-of-Range Images with Geometric Transformations
- č®ŗę/paperļ¼https://arxiv.org/abs/2108.08998
- 代ē /codeļ¼https://kkang831.github.io/publication/ICCV_2021_BDInvert/
Generative Models for Multi-Illumination Color Constancy
- č®ŗę/paperļ¼https://arxiv.org/abs/2109.00863
- 代ē /codeļ¼None
Gradient Normalization for Generative Adversarial Networks
- č®ŗę/paperļ¼https://arxiv.org/abs/2109.02235
- 代ē /codeļ¼None
Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs
- č®ŗę/paperļ¼https://arxiv.org/abs/2108.08841
- 代ē /codeļ¼None
Image Synthesis via Semantic Composition
- č®ŗę/paperļ¼https://arxiv.org/abs/2109.07053 | äø»é”µ/Homepage
- 代ē /codeļ¼https://github.com/dvlab-research/SCGAN
InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images
- č®ŗę/paperļ¼https://arxiv.org/abs/2108.13865
- 代ē /codeļ¼None
Learning to Diversify for Single Domain Generalization
- paper: https://arxiv.org/abs/2108.11726
- code: None
Manifold Matching via Deep Metric Learning for Generative Modeling
- paper: https://arxiv.org/abs/2106.10777
- code: https://github.com/dzld00/pytorch-manifold-matching
Meta Gradient Adversarial Attack
- paper: https://arxiv.org/abs/2108.04204
- code: None
Online Multi-Granularity Distillation for GAN Compression
- paper: https://arxiv.org/abs/2108.06908
- code: https://github.com/bytedance/OMGD
Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation
- paper: https://arxiv.org/abs/2108.07668
- code: https://github.com/csyxwei/OroJaR
PixelSynth: Generating a 3D-Consistent Experience from a Single Image
- paper: https://arxiv.org/abs/2108.05892 | Homepage
- code: https://github.com/crockwell/pixelsynth/
Robustness and Generalization via Generative Adversarial Training
- paper: https://arxiv.org/abs/2109.02765
- code: None
SemIE: Semantically-Aware Image Extrapolation
- paper: https://arxiv.org/abs/2108.13702
- code: https://semie-iccv.github.io/
SketchLattice: Latticed Representation for Sketch Manipulation
- paper: https://arxiv.org/abs/2108.11636
- code: None
Sketch Your Own GAN
- paper: https://arxiv.org/abs/2108.02774
- code: https://github.com/PeterWang512/GANSketching
Target Adaptive Context Aggregation for Video Scene Graph Generation
- paper: https://arxiv.org/abs/2108.08121
- code: https://github.com/MCG-NJU/TRACE
Toward a Visual Concept Vocabulary for GAN Latent Space
- paper: https://arxiv.org/abs/2110.04292
- code: None
Toward Spatially Unbiased Generative Models
- paper: https://arxiv.org/abs/2108.01285
- code: None
Towards Vivid and Diverse Image Colorization with Generative Color Prior
- paper: https://arxiv.org/abs/2108.08826
- code: None
Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation
- paper: https://arxiv.org/abs/2110.05055
- code: None
Unaligned Image-to-Image Translation by Learning to Reweight
- paper: https://arxiv.org/abs/2109.11736
- code: None
Unconditional Scene Graph Generation
- paper: https://arxiv.org/abs/2108.05884
- code: None
Unsupervised Geodesic-preserved Generative Adversarial Networks for Unconstrained 3D Pose Transfer
- paper: https://arxiv.org/abs/2108.07520
- code: https://github.com/mikecheninoulu/Unsupervised_IEPGAN
Domain-Aware Universal Style Transfer
- paper: https://arxiv.org/abs/2108.04441
- code: None
Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance
- paper: None
- code: https://github.com/XiaohanYu-GU/Ultra-FGVC
Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
- paper: https://arxiv.org/abs/2108.02399
- code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
Residual Attention: A Simple but Effective Method for Multi-Label Recognition
- paper: https://arxiv.org/abs/2108.02456
- code: None
ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot (Oral)
- paper: https://arxiv.org/abs/2108.02385
- code: https://github.com/jrcai/ACE
Manifold Matching via Deep Metric Learning for Generative Modeling
- paper: https://arxiv.org/abs/2106.10777
- code: https://github.com/dzld00/pytorch-manifold-matching
Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation
- paper: https://arxiv.org/abs/2108.07668
- code: https://github.com/csyxwei/OroJaR
Binocular Mutual Learning for Improving Few-shot Classification
- paper: https://arxiv.org/abs/2108.12104v1
- code: None
Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder
- paper: https://arxiv.org/abs/2108.05028
- code: None
Discriminative Region-based Multi-Label Zero-Shot Learning
- paper: https://arxiv.org/abs/2108.05028
- code: None
Domain Generalization via Gradient Surgery
- paper: https://arxiv.org/abs/2108.01621
- code: None
Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
- paper: https://arxiv.org/abs/2108.06536
- code: None
Few-Shot Batch Incremental Road Object Detection via Detector Fusion
- paper: https://arxiv.org/abs/2108.08048
- code: None
Field-Guide-Inspired Zero-Shot Learning
- paper: https://arxiv.org/abs/2108.10967
- code: None
Few-shot Visual Relationship Co-localization
- paper: https://arxiv.org/abs/2108.11618
- code: None
Generalized Source-free Domain Adaptation
- paper: https://arxiv.org/abs/2108.01614
- code: https://github.com/Albert0147/G-SFDA
Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting
- paper: https://arxiv.org/abs/2108.08165
- code: None
Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning
- paper: https://arxiv.org/abs/2110.03909
- code: https://github.com/baiksung/MeTAL
Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning
- paper: https://arxiv.org/abs/2109.05749
- code: None
On the Importance of Distractors for Few-Shot Classification
- paper: https://arxiv.org/abs/2109.09883
- code: None
Relational Embedding for Few-Shot Classification
- paper: https://arxiv.org/abs/2108.09666v1
- code: None
SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation
- paper: https://arxiv.org/abs/2108.12517
- code: None
Transductive Few-Shot Classification on the Oblique Manifold
- paper: https://arxiv.org/abs/2108.04009
- code: None
Visual Domain Adaptation for Monocular Depth Estimation on Resource-Constrained Hardware
- paper: https://arxiv.org/abs/2108.02671
- code: None
Adversarial Robustness for Unsupervised Domain Adaptation
- paper: https://arxiv.org/abs/2109.00946
- code: None
Collaborative Unsupervised Visual Representation Learning from Decentralized Data
- paper: https://arxiv.org/abs/2108.06492
- code: None
Instance Similarity Learning for Unsupervised Feature Representation
- paper: https://arxiv.org/abs/2108.02721
- code: https://github.com/ZiweiWangTHU/ISL
Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning
- paper: https://arxiv.org/abs/2108.01959
- code: None
Unsupervised Dense Deformation Embedding Network for Template-Free Shape Correspondence
- paper: https://arxiv.org/abs/2108.11609
- code: None
Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density
- paper: https://arxiv.org/abs/2108.10860
- code: https://github.com/VisionLearningGroup/SND
Digging into Uncertainty in Self-supervised Multi-view Stereo
- paper: https://arxiv.org/abs/2108.12966
- code: None
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
- paper: https://arxiv.org/abs/2108.02183
- code: None
Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring
- paper: https://arxiv.org/abs/2108.06435
- code: None
Improving Self-supervised Learning with Hardness-aware Dynamic Curriculum Learning: An Application to Digital Pathology
- paper: https://arxiv.org/abs/2108.07183
- code: https://github.com/srinidhiPY/ICCVCDPATH2021-ID-8
Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark
- paper: https://arxiv.org/abs/2108.10840 | Homepage
- code: https://github.com/bupt-ai-cz/Meta-SelfLearning
Reducing Label Effort: Self-Supervised meets Active Learning
- paper: https://arxiv.org/abs/2108.11458
- code: None
Self-supervised Neural Networks for Spectral Snapshot Compressive Imaging
- paper: https://arxiv.org/abs/2108.12654
- code: https://github.com/mengziyi64/CASSI-Self-Supervised
Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
- paper: https://arxiv.org/abs/2108.07954
- code: None
Self-Supervised Video Representation Learning with Meta-Contrastive Network
- paper: https://arxiv.org/abs/2108.08426
- code: None
SSH: A Self-Supervised Framework for Image Harmonization
- paper: https://arxiv.org/abs/2108.06805
- code: https://github.com/VITA-Group/SSHarmonization
Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning
- paper: https://arxiv.org/abs/2108.05617
- code: None
Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency
- paper: https://arxiv.org/abs/2109.13432
- code: None
A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation
- paper: https://arxiv.org/abs/2108.09897v1
- code: None
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
- paper: https://arxiv.org/abs/2108.06524
- code: https://github.com/LeonHLJ/FAC-Net
Online Refinement of Low-level Feature Based Activation Map for Weakly Supervised Object Localization
- paper: https://arxiv.org/abs/2110.05741
- code: None
Influence Selection for Active Learning
- paper: https://arxiv.org/abs/2108.09331v1
- code: None
Class Semantics-based Attention for Action Detection
- paper: https://arxiv.org/abs/2109.02613
- code: None
"Knights": First Place Submission for VIPriors21 Action Recognition Challenge at ICCV 2021
- paper: https://arxiv.org/abs/2110.07758
- code: None (challenge page: https://vipriors.github.io/challenges/#action-recognition)
A Baseline Framework for Part-level Action Parsing and Action Recognition
- paper: https://arxiv.org/abs/2110.03368
- code: None
Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
- paper: https://arxiv.org/abs/2107.12213
- code: https://github.com/Uason-Chen/CTR-GCN
Elaborative Rehearsal for Zero-shot Action Recognition
- paper: https://arxiv.org/abs/2108.02833
- code: https://github.com/DeLightCMU/ElaborativeRehearsal
✔️ FineAction: A Fined Video Dataset for Temporal Action Localization
- paper: https://arxiv.org/abs/2105.11107 | Homepage
- code: None
✔️ MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
- paper: https://arxiv.org/abs/2105.07404 | Homepage
- code: https://github.com/MCG-NJU/MultiSports/
Spatio-Temporal Dynamic Inference Network for Group Activity Recognition
- paper: https://arxiv.org/abs/2108.11743
- code: None
Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation (Oral)
- paper: https://arxiv.org/abs/2109.15317
- code: None
Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition
- paper: https://arxiv.org/abs/2109.01305
- code: None
Enriching Local and Global Contexts for Temporal Action Localization
- paper: https://arxiv.org/abs/2104.02330
- code: None
Boundary-sensitive Pre-training for Temporal Localization in Videos
- paper: https://arxiv.org/abs/2011.10830
- code: None
SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
- paper: https://arxiv.org/abs/2110.05382
- code: None
Visual Alignment Constraint for Continuous Sign Language Recognition
- paper: https://arxiv.org/abs/2104.02330
- code: https://github.com/Blueprintf/VAC_CSLR
HandFoldingNet: A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton
- paper: https://arxiv.org/abs/2108.05545
- code: https://github.com/cwc1260/HandFold
Hand-Object Contact Consistency Reasoning for Human Grasps Generation
- paper: https://arxiv.org/pdf/2104.03304.pdf | Homepage
- code: None
Human Pose Regression with Residual Log-likelihood Estimation (Oral)
- paper: https://arxiv.org/abs/2107.11291 | Homepage
- code: https://github.com/Jeff-sjtu/res-loglikelihood-regression
Online Knowledge Distillation for Efficient Pose Estimation
- paper: https://arxiv.org/abs/2108.02092
- code: None
The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation
- paper: https://arxiv.org/abs/2110.05132
- code: https://github.com/dvl-tum/center-group
TransPose: Keypoint Localization via Transformer
- paper: https://arxiv.org/abs/2012.14214
- code: https://github.com/yangsenius/TransPose
EventHPE: Event-based 3D Human Pose and Shape Estimation
- paper: https://arxiv.org/abs/2108.06819
- code: None
DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders (Oral)
- paper: https://arxiv.org/abs/2108.08557
- code: https://github.com/mmlab-cv/DECA
FrankMocap: A Monocular 3D Whole-Body Pose Estimation System via Regression and Integration
- paper: https://arxiv.org/abs/2108.06428
- code: None
Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild
- paper: https://arxiv.org/abs/2110.00990
- code: https://github.com/akashsengupta1997/HierarchicalProbabilistic3DHuman
Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation
- paper: https://arxiv.org/abs/2108.07181
- code: https://github.com/ailingzengzzz/Skeletal-GNN
Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
- paper: https://arxiv.org/abs/2107.13788
- code: https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows
PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop
- paper: https://arxiv.org/abs/2103.16507 | Homepage
- code: https://github.com/HongwenZhang/PyMAF
Shape-aware Multi-Person Pose Estimation from Multi-View Images
- paper: https://arxiv.org/abs/2110.02330 | Homepage
- code: None
Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition
- paper: https://arxiv.org/abs/2109.09166
- code: None
RePOSE: Real-Time Iterative Rendering and Refinement for 6D Object Pose Estimation
- paper: https://arxiv.org/abs/2104.00633
- code: https://github.com/sh8/RePOSE
SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
- paper: https://arxiv.org/abs/2108.08367
- code: None
StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
- paper: https://arxiv.org/abs/2109.10115
- code: None
ARCH++: Animation-Ready Clothed Human Reconstruction Revisited
- paper: https://arxiv.org/abs/2108.07845
- code: None
imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose
- paper: https://arxiv.org/abs/2108.10842
- code: None
Learning to Regress Bodies from Images using Differentiable Semantic Rendering
- paper: https://arxiv.org/abs/2110.03480 | Homepage
- code: None
Learning Motion Priors for 4D Human Body Capture in 3D Scenes (Oral)
- paper: https://arxiv.org/abs/2108.10399 | Homepage
- code: https://github.com/sanweiliti/LEMO
Physics-based Human Motion Estimation and Synthesis from Videos
- paper: https://arxiv.org/abs/2109.09913
- code: None
Probabilistic Modeling for Human Mesh Recovery
- paper: https://arxiv.org/abs/2108.11944
- code: https://www.seas.upenn.edu/~nkolot/projects/prohmr/
DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (Oral)
- paper: https://arxiv.org/abs/2108.10743 | Homepage
- code: None
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation (Oral)
- paper: https://arxiv.org/abs/2109.09881
- code: https://github.com/baegwangbin/surface_normal_uncertainty
Masked Face Recognition Challenge: The InsightFace Track Report
- paper: https://arxiv.org/abs/2108.08191
- code: https://github.com/deepinsight/insightface/tree/master/challenges/iccv21-mfr
Masked Face Recognition Challenge: The WebFace260M Track Report
- paper: https://arxiv.org/abs/2108.07189
- code: None
PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition
- paper: https://arxiv.org/abs/2108.03764
- code: None
Rethinking Common Assumptions to Mitigate Racial Bias in Face Recognition Datasets
- paper: https://arxiv.org/abs/2109.03229
- code: https://github.com/j-alex-hanson/rethinking-race-face-datasets
SynFace: Face Recognition with Synthetic Data
- paper: https://arxiv.org/abs/2108.07960
- code: None
Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models
- paper: https://arxiv.org/abs/2108.06581
- code: None
ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment
- paper: https://arxiv.org/abs/2109.05721
- code: None
Talk-to-Edit: Fine-Grained Facial Editing via Dialog
- paper: https://arxiv.org/abs/2109.04425 | Homepage
- code: https://github.com/yumingj/Talk-to-Edit
Self-Supervised 3D Face Reconstruction via Conditional Estimation
- paper: https://arxiv.org/abs/2110.04800
- code: None
Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing
- paper: https://arxiv.org/abs/2103.15432
- code: None
TransFER: Learning Relation-aware Facial Expression Representations with Transformers
- paper: https://arxiv.org/abs/2108.11116
- code: None
Understanding and Mitigating Annotation Bias in Facial Expression Recognition
- paper: https://arxiv.org/abs/2108.08504
- code: None
A Technical Report for ICCV 2021 VIPriors Re-identification Challenge
- paper: https://arxiv.org/abs/2109.15164
- code: None
ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
- paper: https://arxiv.org/abs/2108.04533
- code: None
Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification
- paper: https://arxiv.org/abs/2108.08728
- code: https://github.com/raoyongming/CAL
IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID (Oral)
- paper: https://arxiv.org/abs/2108.02413
- code: https://github.com/SikaStar/IDM
Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences
- paper: https://arxiv.org/abs/2108.07422
- code: None
Learning Instance-level Spatial-Temporal Patterns for Person Re-identification
- paper: https://arxiv.org/abs/2108.00171
- code: https://github.com/RenMin1991/cleaned-DukeMTMC-reID/
Learning Compatible Embeddings
- paper: None
- code: https://github.com/IrvingMeng/LCE
Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency
- paper: https://arxiv.org/abs/2108.09891v1
- code: None
Towards Discriminative Representation Learning for Unsupervised Person Re-identification
- paper: https://arxiv.org/abs/2108.03439
- code: None
TransReID: Transformer-based Object Re-Identification
- paper: https://arxiv.org/abs/2102.04378
- code: https://github.com/heshuting555/TransReID
Video-based Person Re-identification with Spatial and Temporal Memory Networks
- paper: https://arxiv.org/abs/2108.09039
- code: None
Weakly Supervised Person Search with Region Siamese Networks
- paper: https://arxiv.org/abs/2109.06109
- code: None
Heterogeneous Relational Complement for Vehicle Re-identification
- paper: https://arxiv.org/abs/2109.07894
- code: None
MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?
- paper: https://arxiv.org/abs/2108.09518v1
- code: None
Spatial and Semantic Consistency Regularizations for Pedestrian Attribute Recognition
- paper: https://arxiv.org/abs/2109.05686
- code: None
Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework (Oral)
- paper: https://arxiv.org/abs/2107.12746
- code: https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet
Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting
- paper: https://arxiv.org/abs/2107.12619
- code: https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet
Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting
- paper: https://arxiv.org/abs/2108.08023
- code: None
Generating Smooth Pose Sequences for Diverse Human Motion Prediction
- paper: https://arxiv.org/abs/2108.08422
- code: https://github.com/wei-mao-2019/gsps
MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction
- paper: https://arxiv.org/abs/2108.07152
- code: https://github.com/Droliven/MSRGCN
RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting
- paper: https://arxiv.org/abs/2108.01316 | Homepage
- code: None
Skeleton-Graph: Long-Term 3D Motion Prediction From 2D Observations Using Deep Spatio-Temporal Graph CNNs
- paper: https://arxiv.org/abs/2109.10257
- code: https://github.com/abduallahmohamed/Skeleton-Graph
DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets
- paper: https://arxiv.org/abs/2108.09640v1
- code: None
MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction
- paper: https://arxiv.org/abs/2108.09274
- code: https://github.com/selflein/MG-GAN
CL-Face-Anti-spoofing
- paper: None
- code: https://github.com/xxheyu/CL-Face-Anti-spoofing
3D High-Fidelity Mask Face Presentation Attack Detection Challenge
- paper: https://arxiv.org/abs/2108.06968
- code: None
Exploring Temporal Coherence for More General Video Face Forgery Detection
- paper: https://arxiv.org/abs/2108.06693
- code: None
OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild
- paper: https://arxiv.org/abs/2107.14480 | Dataset
- code: None
Fake It Till You Make It: Face analysis in the wild using synthetic data alone
- paper: https://arxiv.org/abs/2109.15102
- code: None
A Hierarchical Assessment of Adversarial Severity
- paper: https://arxiv.org/abs/2108.11785
- code: None
AdvDrop: Adversarial Attack to DNNs by Dropping Information
- paper: https://arxiv.org/abs/2108.09034
- code: None
AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning
- paper: https://arxiv.org/abs/2108.06017
- code: https://github.com/hongw579/AGKD-BML
Optical Adversarial Attack
- paper: https://arxiv.org/abs/2108.06247
- code: None
Sample Efficient Detection and Classification of Adversarial Attacks via Self-Supervised Embeddings
- paper: https://arxiv.org/abs/2108.13797
- code: None
TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning
- paper: https://arxiv.org/abs/2108.00146
- code: None
Wasserstein Coupled Graph Learning for Cross-Modal Retrieval
- paper: None
- code: None
AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network
- paper: https://arxiv.org/abs/2108.03824
- code: https://github.com/QT-Zhu/AA-RMVSNet
Augmenting Depth Estimation with Geospatial Context
- paper: https://arxiv.org/abs/2109.09879
- code: None
Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation
- paper: https://arxiv.org/abs/2109.12484
- code: None
Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation (Oral)
- paper: https://arxiv.org/abs/2108.08829
- code: https://github.com/hyBlue/FSRE-Depth
Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection
- paper: None
- code: https://github.com/NianjinYe/Motion-Basis-Homography
Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark
- paper: https://arxiv.org/abs/2108.03830
- code: None
Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers
- paper: https://arxiv.org/abs/2011.02910
- code: https://github.com/mli0603/stereo-transformer
Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation
- paper: https://arxiv.org/abs/2108.07628
- code: None
SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting (Oral)
- paper: https://arxiv.org/abs/2109.01068 | Homepage
- code: None
StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation
- paper: https://arxiv.org/abs/2108.08574
- code: https://github.com/SJTU-ViSYS/StructDepth
Asymmetric Bilateral Motion Estimation for Video Frame Interpolation
- paper: https://arxiv.org/abs/2108.06815
- code: https://github.com/JunHeum/ABME
✔️ XVFI: eXtreme Video Frame Interpolation (Oral)
- paper: https://arxiv.org/abs/2103.16206
- code: https://github.com/JihyongOh/XVFI
The Multi-Modal Video Reasoning and Analyzing Competition
- paper: https://arxiv.org/abs/2108.08344
- code: None
CodeNeRF: Disentangled Neural Radiance Fields for Object Categories
- paper: https://arxiv.org/abs/2109.01750 | Homepage
- code: https://github.com/wayne1123/code-nerf
GNeRF: GAN-based Neural Radiance Field without Posed Camera
- paper: https://arxiv.org/abs/2103.15606 | Homepage
- code: https://github.com/MQ66/gnerf
In-Place Scene Labelling and Understanding with Implicit Scene Representation (Oral)
- paper: https://arxiv.org/abs/2103.15875 | Homepage
- code: None
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
- paper: https://arxiv.org/abs/2103.13744 | Homepage
- code: https://github.com/creiser/kilonerf
Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering
- paper: https://arxiv.org/abs/2109.01847 | Homepage
- code: https://github.com/zju3dv/object_nerf
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo (Oral)
- paper: https://arxiv.org/abs/2109.01129 | Homepage
- code: https://github.com/weiyithu/NerfingMVS
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
- paper: https://arxiv.org/abs/2104.00677 | Homepage
- code: None
Self-Calibrating Neural Radiance Fields
- paper: https://arxiv.org/abs/2108.13826
- code: https://github.com/POSTECH-CVLab/SCNeRF
UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction (Oral)
- paper: https://arxiv.org/abs/2104.10078 | Homepage
- code: None
CANet: A Context-Aware Network for Shadow Removal
- paper: https://arxiv.org/abs/2108.09894v1
- code: None
DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
- paper: https://arxiv.org/abs/2108.02927
- code: None
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
- paper: https://arxiv.org/abs/2108.04024
- code: https://github.com/Cuberick-Orion/CIRR
Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
- paper: https://arxiv.org/abs/2109.02244
- code: None
Designing a Practical Degradation Model for Deep Blind Image Super-Resolution
- paper: https://arxiv.org/pdf/2103.14006.pdf
- code: https://github.com/cszn/BSRGAN
Dual-Camera Super-Resolution with Aligned Attention Modules
- paper: https://arxiv.org/abs/2109.01349
- code: None
Generalized Real-World Super-Resolution through Adversarial Robustness
- paper: https://arxiv.org/abs/2108.11505
- code: None
Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
- paper: https://arxiv.org/abs/2004.03791
- code: https://github.com/LongguangWang/ArbSR
Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
- paper: None
- code: https://github.com/Anonymous-iccv2021-paper3163/CaFM-Pytorch
Equivariant Imaging: Learning Beyond the Range Space (Oral)
- paper: https://arxiv.org/abs/2103.14756
- code: https://github.com/edongdongchen/EI
Spatially-Adaptive Image Restoration using Distortion-Guided Networks
- paper: https://arxiv.org/abs/2108.08617
- code: https://github.com/human-analysis/spatially-adaptive-image-restoration
Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image (Oral)
- paper: https://arxiv.org/abs/2110.05655
- code: None
SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring
- paper: https://arxiv.org/abs/2110.05803
- code: https://github.com/FlyEgle/SDWNet
Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions
- paper: https://arxiv.org/abs/2108.09108
- code: None
Deep Reparametrization of Multi-Frame Super-Resolution and Denoising (Oral)
- paper: https://arxiv.org/abs/2108.08286
- code: None
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
- paper: https://arxiv.org/abs/2109.08044
- code: None
ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (Oral)
- paper: https://arxiv.org/abs/2108.02938
- code: None
Rethinking Deep Image Prior for Denoising
- paper: https://arxiv.org/abs/2108.12841
- code: None
Rethinking Noise Synthesis and Modeling in Raw Denoising
- paper: https://arxiv.org/abs/2110.04756
- code: None
ALL Snow Removed: Single Image Desnowing Algorithm Using Hierarchical Dual-tree Complex Wavelet Representation and Contradict Channel Loss
- paper: None
- code: https://github.com/weitingchen83/ICCV2021-Single-Image-Desnowing-HDCWNet
Gap-closing Matters: Perceptual Quality Assessment and Optimization of Low-Light Image Enhancement
- paper: None
- code: https://github.com/Baoliang93/Gap-closing-Matters
Real-time Image Enhancer via Learnable Spatial-aware 3D Lookup Tables
- paper: https://arxiv.org/abs/2108.08697
- code: None
Effect of Parameter Optimization on Classical and Learning-based Image Matching Methods
- paper: https://arxiv.org/abs/2108.08179
- code: None
Viewpoint Invariant Dense Matching for Visual Geolocalization
- paper: https://arxiv.org/abs/2109.09827
- code: https://github.com/gmberton/geo_warp
MUSIQ: Multi-scale Image Quality Transformer
- paper: https://arxiv.org/abs/2108.05997
- code: None
Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging
- paper: https://arxiv.org/abs/2109.06548
- code: https://github.com/jianzhangcs/SCI3D
Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform
- paper: https://arxiv.org/abs/2108.09551v1
- code: https://github.com/micmic123/QmapCompression
Dynamic Attentive Graph Learning for Image Restoration
- paper: https://arxiv.org/abs/2109.06620
- code: https://github.com/jianzhangcs/DAGL
Towards Flexible Blind JPEG Artifacts Removal
- paper: https://arxiv.org/abs/2109.14573
- code: https://github.com/jiaxi-jiang/FBCNN
Image Inpainting via Conditional Texture and Structure Dual Generation
- paper: https://arxiv.org/abs/2108.09760v1
- code: https://github.com/Xiefan-Guo/CTSDG
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
- paper: https://arxiv.org/abs/2108.01912
- code: None
Internal Video Inpainting by Implicit Long-range Propagation
- paper: https://arxiv.org/abs/2108.01912
- code: None
Occlusion-Aware Video Object Inpainting
- paper: https://arxiv.org/abs/2108.06765
- code: None
Searching for Two-Stream Models in Multivariate Space for Video Recognition
- paper: https://arxiv.org/abs/2108.12957
- code: None
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
- paper: https://arxiv.org/abs/2109.01934
- code: https://github.com/pratyay-banerjee/weak_sup_vqa
Multi-scale Matching Networks for Semantic Correspondence
- paper: https://arxiv.org/abs/2108.00211
- code: None
✔️ CPF: Learning a Contact Potential Field to Model the Hand-object Interaction
- paper: https://arxiv.org/abs/2012.00924
- code: https://github.com/lixiny/CPF
Exploiting Scene Graphs for Human-Object Interaction Detection
- č®ŗę/paperļ¼https://arxiv.org/abs/2108.08584
- 代ē /codeļ¼https://github.com/ht014/SG2HOI
Spatially Conditioned Graphs for Detecting HumanāObject Interactions
- č®ŗę/paperļ¼https://arxiv.org/pdf/2012.06060.pdf
- 代ē /codeļ¼https://github.com/fredzzhang/spatially-conditioned-graphs
Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction
- č®ŗę/paperļ¼https://arxiv.org/abs/2110.03278
- 代ē /codeļ¼None
Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation
-
č®ŗę/paperļ¼https://arxiv.org/abs/2107.13780 | äø»é”µ/Homepage
-
代ē /codeļ¼https://github.com/DreamtaleCore/PnP-GA
Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation
- Paper: https://arxiv.org/abs/2110.06853
- Code: None
Improving Contrastive Learning by Visualizing Feature Transformation
- Paper: https://arxiv.org/abs/2108.02982
- Code: https://github.com/DTennant/CL-Visualizing-Feature-Transformation
Social NCE: Contrastive Learning of Socially-aware Motion Representations
- Paper: https://arxiv.org/abs/2012.11717
- Code: https://github.com/vita-epfl/social-nce-crowdnav
Parametric Contrastive Learning
- Paper: https://arxiv.org/abs/2107.12028
- Code: https://github.com/jiequancui/Parametric-Contrastive-Learning
MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction
- Paper: None
- Code: https://github.com/Droliven/MSRGCN
GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization
- Paper: https://arxiv.org/abs/2109.02220
- Code: None
Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks
- Paper: https://arxiv.org/abs/2110.09195
- Code: https://github.com/yikaiw/SNN
Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss
- Paper: https://arxiv.org/abs/2109.02100
- Code: None
Distance-aware Quantization
- Paper: https://arxiv.org/abs/2108.06983
- Code: None
Dynamic Network Quantization for Efficient Video Inference
- Paper: https://arxiv.org/abs/2108.10394
- Code: None
Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
- Paper: https://arxiv.org/abs/2108.02720
- Code: https://github.com/ZiweiWangTHU/GMPQ
Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
- Paper: https://arxiv.org/abs/2110.06554
- Code: None
Deep Structured Instance Graph for Distilling Object Detectors
- Paper: https://arxiv.org/abs/2109.12862
- Code: https://github.com/dvlab-research/Dsig
Distilling Holistic Knowledge with Graph Neural Networks
- Paper: https://arxiv.org/abs/2108.05507
- Code: https://github.com/wyc-ruiker/HKD
Lipschitz Continuity Guided Knowledge Distillation
- Paper: https://arxiv.org/abs/2108.12905
- Code: None
G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
- Paper: https://arxiv.org/abs/2108.07482
- Code: None
Self Supervision to Distillation for Long-Tailed Visual Recognition
- Paper: https://arxiv.org/abs/2109.04075
- Code: https://github.com/MCG-NJU/SSD-LT
A Robust Loss for Point Cloud Registration
- Paper: https://arxiv.org/abs/2108.11682
- Code: None
A Technical Survey and Evaluation of Traditional Point Cloud Clustering Methods for LiDAR Panoptic Segmentation
- Paper: https://arxiv.org/abs/2108.09522v1
- Code: None
(Just) A Spoonful of Refinements Helps the Registration Error Go Down (Oral)
- Paper: https://arxiv.org/abs/2108.03257
- Code: None
ABD-Net: Attention Based Decomposition Network for 3D Point Cloud Decomposition
- Paper: https://arxiv.org/abs/2108.04221
- Code: None
AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds
- Paper: https://arxiv.org/abs/2108.05836
- Code: None
Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds
- Paper: https://arxiv.org/abs/2108.04728
- Code: None
CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds
- Paper: https://arxiv.org/abs/2109.00113
- Code: None
Deep Models with Fusion Strategies for MVP Point Cloud Registration
- Paper: https://arxiv.org/abs/2110.09129
- Code: None
DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
- Paper: https://arxiv.org/abs/2108.04023
- Code: None
Guided Point Contrastive Learning for Semi-supervised Point Cloud Semantic Segmentation
- Paper: https://arxiv.org/abs/2110.08188
- Code: None
Learning Inner-Group Relations on Point Clouds
- Paper: https://arxiv.org/abs/2108.12468
- Code: None
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
- Paper: https://arxiv.org/pdf/2103.01128.pdf
- Code: https://github.com/CurryYuan/InstanceRefer
ME-PCN: Point Completion Conditioned on Mask Emptiness
- Paper: https://arxiv.org/abs/2108.08187
- Code: None
MVP Benchmark: Multi-View Partial Point Clouds for Completion and Registration
- Paper: None | Homepage
- Code: https://github.com/paul007pl/MVP_Benchmark
Out-of-Core Surface Reconstruction via Global TGV Minimization
- Paper: https://arxiv.org/abs/2107.14790
- Code: None
PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds
- Paper: https://arxiv.org/abs/2110.01269
- Code: https://github.com/valeoai/PCAM
PICCOLO: Point Cloud-Centric Omnidirectional Localization
- Paper: https://arxiv.org/abs/2108.06545
- Code: None
Point Cloud Augmentation with Weighted Local Transformations
- Paper: https://arxiv.org/abs/2110.05379
- Code: None
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers (Oral)
- Paper: https://arxiv.org/abs/2108.08839
- Code: https://github.com/yuxumin/PoinTr
ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.11769
- Code: None
Sampling Network Guided Cross-Entropy Method for Unsupervised Point Cloud Registration
- Paper: https://arxiv.org/abs/2109.06619
- Code: None
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
- Paper: https://arxiv.org/abs/2108.04444
- Code: https://github.com/AllenXiangX/SnowflakeNet
Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
- Paper: https://arxiv.org/abs/2109.00179
- Code: None
Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification
- Paper: https://arxiv.org/abs/2108.06317
- Code: None
Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching
- Paper: https://arxiv.org/abs/2108.03746
- Code: https://github.com/chenchao15/2D
Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion
- Paper: https://arxiv.org/abs/2010.01089 | Homepage
- Code: https://github.com/hansen7/OcCo
Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility
- Paper: https://arxiv.org/abs/2108.08378
- Code: https://github.com/GDAOSU/vis2mesh
Voxel-based Network for Shape Completion by Leveraging Edge Generation
- Paper: https://arxiv.org/abs/2108.09936v1
- Code: https://github.com/xiaogangw/VE-PCN
Walk in the Cloud: Learning Curves for Point Clouds Shape Analysis
- Paper: https://arxiv.org/abs/2105.01288v1 | Homepage
- Code: https://github.com/tiangexiang/CurveNet
3D Shapes Local Geometry Codes Learning with SDF
- Paper: https://arxiv.org/abs/2108.08593
- Code: None
3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces
- Paper: https://arxiv.org/abs/2108.08653
- Code: https://myavartanoo.github.io/3dias/
DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension
- Paper: https://arxiv.org/abs/2109.00033
- Code: None
Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction
- Paper: https://arxiv.org/abs/2108.08478
- Code: None
Pixel-Perfect Structure-from-Motion with Featuremetric Refinement (Oral)
- Paper: https://arxiv.org/abs/2108.08291
- Code: https://github.com/cvg/pixel-perfect-sfm
VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction
- Paper: https://arxiv.org/abs/2108.08623
- Code: None
✔️ Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts
- Paper: https://arxiv.org/abs/2104.00887
- Code: https://github.com/clovaai/mxfont
Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
- Paper: https://arxiv.org/abs/2107.12664
- Code: https://github.com/GXYM/TextBPN
From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
- Paper: https://arxiv.org/abs/2108.09661v1
- Code: https://github.com/wangyuxin87/VisionLAN
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
- Paper: https://arxiv.org/abs/2107.12090
- Code: None
Data Augmentation for Scene Text Recognition
- Paper: https://arxiv.org/abs/2108.06949
- Code: https://github.com/roatienza/straug
End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
- Paper: https://arxiv.org/abs/2108.08265
- Code: None
FOVEA: Foveated Image Magnification for Autonomous Navigation
- Paper: https://arxiv.org/abs/2108.12102v1
- Code: https://www.cs.cmu.edu/~mengtial/proj/fovea/
Learning to drive from a world on rails
- Paper: https://arxiv.org/abs/2105.00636
- Code: https://arxiv.org/abs/2105.00636
MAAD: A Model and Dataset for "Attended Awareness" in Driving
- Paper: https://arxiv.org/abs/2110.08610
- Code: https://github.com/ToyotaResearchInstitute/att-aware/
MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving
- Paper: https://arxiv.org/abs/2108.12178v1
- Code: https://github.com/KaiChen1998/MultiSiam
NEAT: Neural Attention Fields for End-to-End Autonomous Driving
- Paper: https://arxiv.org/abs/2109.04456
- Code: None
Road-Challenge-Event-Detection-for-Situation-Awareness-in-Autonomous-Driving
- Paper: None
- Code: https://github.com/Trevorchenmsu/Road-Challenge-Event-Detection-for-Situation-Awareness-in-Autonomous-Driving
Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving
- Paper: https://arxiv.org/abs/2109.01510
- Code: https://github.com/xrenaa/Safety-Aware-Motion-Prediction
ICCV2021_Visdrone_detection
- Paper: None
- Code: https://github.com/Gumpest/ICCV2021_Visdrone_detection
DRÆM -- A discriminatively trained reconstruction embedding for surface anomaly detection
- Paper: https://arxiv.org/abs/2108.07610
- Code: None
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
- Paper: https://arxiv.org/pdf/2101.10030.pdf
- Code: https://github.com/tianyu0207/RTFM
Cross-Camera Convolutional Color Constancy
- Paper: https://arxiv.org/abs/2011.11164
- Code: https://github.com/mahmoudnafifi/C5
Learnable Boundary Guided Adversarial Training
- Paper: https://arxiv.org/abs/2011.11164
- Code: https://github.com/FPNAS/LBGAT
Prior-Enhanced network with Meta-Prototypes (PEMP)
- Paper: None
- Code: https://github.com/PaperSubmitAAAA/ICCV2021-2337
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
- Paper: https://arxiv.org/abs/2104.12763 | Homepage
- Code: https://github.com/ashkamath/mdetr
Generalized-Shuffled-Linear-Regression (Oral)
- Paper: https://drive.google.com/file/d/1Qu21VK5qhCW8WVjiRnnBjehrYVmQrDNh/view
- Code: https://github.com/SILI1994/Generalized-Shuffled-Linear-Regression
VLGrammar: Grounded Grammar Induction of Vision and Language
- Paper: https://arxiv.org/abs/2103.12975
- Code: https://github.com/evelinehong/VLGrammar
Structure-Preserving Deraining with Residue Channel Prior Guidance
- Paper: None
- Code: https://github.com/Joyies/SPDNet
Learning with Noisy Labels via Sparse Regularization
- Paper: https://arxiv.org/abs/2108.00192
- Code: https://github.com/hitcszx/lnl_sr
Neural Strokes: Stylized Line Drawing of 3D Shapes
- Paper: None
- Code: https://github.com/DifanLiu/NeuralStrokes
COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation
- Paper: None
- Code: https://github.com/kywen1119/COOKIE
RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth
- Paper: https://arxiv.org/abs/2108.00616
- Code: None
ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
- Paper: https://arxiv.org/abs/2108.00355
- Code: None
Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction
- Paper: https://arxiv.org/abs/2108.00238
- Code: None
CanvasVAE: Learning to Generate Vector Graphic Documents
- Paper: https://arxiv.org/abs/2108.01249
- Code: None
Refining activation downsampling with SoftPool
- Paper: https://arxiv.org/abs/2101.00440
- Code: https://github.com/alexandrosstergiou/SoftPool
Aligning Latent and Image Spaces to Connect the Unconnectable
- Paper: https://arxiv.org/abs/2104.06954 | Homepage
- Code: https://github.com/universome/alis
Unifying Nonlocal Blocks for Neural Networks
- Paper: https://arxiv.org/abs/2108.02451
- Code: None
SLAMP: Stochastic Latent Appearance and Motion Prediction
- Paper: https://arxiv.org/abs/2108.02760
- Code: None
TransForensics: Image Forgery Localization with Dense Self-Attention
- Paper: https://arxiv.org/abs/2108.03871
- Code: None
Learning Facial Representations from the Cycle-consistency of Face
- Paper: https://arxiv.org/abs/2108.03427
- Code: https://github.com/JiaRenChang/FaceCycle
NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models
- Paper: https://arxiv.org/abs/2108.03434
- Code: None
Impact of Aliasing on Generalization in Deep Convolutional Networks
- Paper: https://arxiv.org/abs/2108.03489
- Code: None
Learning Canonical 3D Object Representation for Fine-Grained Recognition
- Paper: https://arxiv.org/abs/2108.04628
- Code: None
UniNet: A Unified Scene Understanding Network and Exploring Multi-Task Relationships through the Lens of Adversarial Attacks
- Paper: https://arxiv.org/abs/2108.04584
- Code: None
SUNet: Symmetric Undistortion Network for Rolling Shutter Correction
- Paper: https://arxiv.org/abs/2108.04775
- Code: None
Learning to Cut by Watching Movies
- Paper: https://arxiv.org/abs/2108.04294
- Code: https://github.com/PardoAlejo/LearningToCut
Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
- Paper: https://arxiv.org/abs/2108.05851
- Code: None
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision
- Paper: https://arxiv.org/abs/2108.05863 | Homepage
- Code: https://github.com/tgxs002/wikiscenes
Towards Interpretable Deep Metric Learning with Structural Matching
- Paper: https://arxiv.org/abs/2108.05889
- Code: https://github.com/wl-zhao/DIML
m-RevNet: Deep Reversible Neural Networks with Momentum
- Paper: https://arxiv.org/abs/2108.05862
- Code: None
DiagViB-6: A Diagnostic Benchmark Suite for Vision Models in the Presence of Shortcut and Generalization Opportunities
- Paper: https://arxiv.org/abs/2108.05779
- Code: None
perf4sight: A toolflow to model CNN training performance on Edge GPUs
- Paper: https://arxiv.org/abs/2108.05580
- Code: None
MT-ORL: Multi-Task Occlusion Relationship Learning
- Paper: https://arxiv.org/abs/2108.05722
- Code: https://github.com/fengpanhe/MT-ORL
ProAI: An Efficient Embedded AI Hardware for Automotive Applications - a Benchmark Study
- Paper: https://arxiv.org/abs/2108.05170
- Code: None
SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments
- Paper: https://arxiv.org/abs/2108.06180
- Code: https://github.com/jiafei1224/SPACE
CODEs: Chamfer Out-of-Distribution Examples against Overconfidence Issue
- Paper: https://arxiv.org/abs/2108.06024
- Code: None
Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark
- Paper: https://arxiv.org/abs/2108.07020
- Code: None
Pixel Difference Networks for Efficient Edge Detection
- Paper: https://arxiv.org/abs/2108.07009
- Code: https://github.com/zhuoinoulu/pidinet
Online Continual Learning For Visual Food Classification
- Paper: https://arxiv.org/abs/2108.06781
- Code: None
DICOM Imaging Router: An Open Deep Learning Framework for Classification of Body Parts from DICOM X-ray Scans
- Paper: https://arxiv.org/abs/2108.06490 | Homepage
- Code: None
PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation
- Paper: https://arxiv.org/abs/2108.07142
- Code: https://github.com/sheepooo/PIT-Position-Invariant-Transform
Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural Networks
- Paper: https://arxiv.org/abs/2108.06486
- Code: None
FaPN: Feature-aligned Pyramid Network for Dense Image Prediction
- Paper: https://arxiv.org/abs/2108.07058
- Code: https://github.com/EMI-Group/FaPN
Finding Representative Interpretations on Convolutional Neural Networks
- Paper: https://arxiv.org/abs/2108.06384
- Code: None
Investigating transformers in the decomposition of polygonal shapes as point collections
- Paper: https://arxiv.org/abs/2108.07533
- Code: None
Self-Supervised Pretraining and Controlled Augmentation Improve Rare Wildlife Recognition in UAV Images
- Paper: https://arxiv.org/abs/2108.07582
- Code: None
Group-aware Contrastive Regression for Action Quality Assessment
- Paper: https://arxiv.org/abs/2108.07797
- Code: None
End-to-End Dense Video Captioning with Parallel Decoding
- Paper: https://arxiv.org/abs/2108.07781
- Code: https://github.com/ttengwang/PDVC
PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion
- Paper: https://arxiv.org/abs/2108.07506
- Code: None
Scene Designer: a Unified Model for Scene Search and Synthesis from Sketch
- Paper: https://arxiv.org/abs/2108.07353
- Code: None
Structured Outdoor Architecture Reconstruction by Exploration and Classification
- Paper: https://arxiv.org/abs/2108.07990
- Code: None
Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision
- Paper: https://arxiv.org/abs/2108.08119
- Code: https://github.com/cszhilu1998/RAW-to-sRGB
Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
- Paper: https://arxiv.org/abs/2108.08202
- Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
Deep Hybrid Self-Prior for Full 3D Mesh Generation
- Paper: https://arxiv.org/abs/2108.08017
- Code: None
FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning
- Paper: https://arxiv.org/abs/2108.07938
- Code: None
Thermal Image Processing via Physics-Inspired Deep Networks
- Paper: https://arxiv.org/abs/2108.07973
- Code: None
A New Journey from SDRTV to HDRTV
- Paper: https://arxiv.org/abs/2108.07978
- Code: https://github.com/chxy95/HDRTVNet
Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs
- Paper: https://arxiv.org/abs/2108.07884
- Code: None
Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates
- Paper: https://arxiv.org/abs/2108.08020
- Code: None
LOKI: Long Term and Key Intentions for Trajectory Prediction
- Paper: https://arxiv.org/abs/2108.08236
- Code: None
Stochastic Scene-Aware Motion Prediction
- Paper: https://arxiv.org/abs/2108.08284
- Code: https://samp.is.tue.mpg.de/
Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes
- Paper: https://arxiv.org/abs/2108.08421
- Code: None
Social Fabric: Tubelet Compositions for Video Relation Detection
- Paper: https://arxiv.org/abs/2108.08363
- Code: https://github.com/shanshuo/Social-Fabric
Causal Attention for Unbiased Visual Recognition
- Paper: https://arxiv.org/abs/2108.08782
- Code: https://github.com/Wangt-CN/CaaM
Universal Cross-Domain Retrieval: Generalizing Across Classes and Domains
- Paper: https://arxiv.org/abs/2108.08356
- Code: None
Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
- Paper: https://arxiv.org/abs/2108.08487
- Code: None
Learning to Match Features with Seeded Graph Matching Network
- Paper: https://arxiv.org/abs/2108.08771
- Code: https://github.com/vdvchen/SGMNet
A Unified Objective for Novel Class Discovery
- Paper: https://arxiv.org/abs/2108.08536
- Code: https://github.com/DonkeyShot21/UNO
How to cheat with metrics in single-image HDR reconstruction
- Paper: https://arxiv.org/abs/2108.08713
- Code: None
Towards Understanding the Generative Capability of Adversarially Robust Classifiers (Oral)
- Paper: https://arxiv.org/abs/2108.09093
- Code: None
Airbert: In-domain Pretraining for Vision-and-Language Navigation
- Paper: https://arxiv.org/abs/2108.09105
- Code: None
Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization
- Paper: https://arxiv.org/abs/2108.09041
- Code: https://github.com/Annbless/OVS_Stabilization
PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility
- Paper: https://arxiv.org/abs/2108.08943
- Code: None
Continual Learning for Image-Based Camera Localization
- Paper: https://arxiv.org/abs/2108.09112
- Code: None
Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data
- Paper: https://arxiv.org/abs/2108.09020
- Code: https://github.com/IntelLabs/continuallearning
Detecting and Segmenting Adversarial Graphics Patterns from Images
- Paper: https://arxiv.org/abs/2108.09383v1
- Code: None
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
- Paper: https://arxiv.org/abs/2108.09980v1
- Code: None
BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies
- Paper: https://arxiv.org/abs/2108.09376v1
- Code: None
Learning Signed Distance Field for Multi-view Surface Reconstruction (Oral)
- Paper: https://arxiv.org/abs/2108.09964v1
- Code: None
Deep Relational Metric Learning
- Paper: https://arxiv.org/abs/2108.10026v1
- Code: https://github.com/zbr17/DRML
Ranking Models in Unlabeled New Environments
- Paper: https://arxiv.org/abs/2108.10310v1
- Code: https://github.com/sxzrt/Proxy-Set
Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image
- Paper: https://arxiv.org/abs/2108.09368v1
- Code: None
LSD-StructureNet: Modeling Levels of Structural Detail in 3D Part Hierarchies
- Paper: https://arxiv.org/abs/2108.13459
- Code: None
BiaSwap: Removing dataset bias with bias-tailored swapping augmentation
- Paper: https://arxiv.org/abs/2108.10008v1
- Code: None
LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning
- Paper: https://arxiv.org/abs/2108.09335v1
- Code: None
Learning of Visual Relations: The Devil is in the Tails
- Paper: https://arxiv.org/abs/2108.09668v1
- Code: None
Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision
- Paper: https://arxiv.org/abs/2108.10843
- Code: https://github.com/albert100121/AiFDepthNet
Support-Set Based Cross-Supervision for Video Grounding
- Paper: https://arxiv.org/abs/2108.10576
- Code: None
Fast Robust Tensor Principal Component Analysis via Fiber CUR Decomposition
- Paper: https://arxiv.org/abs/2108.10448
- Code: None
Improving Generalization of Batch Whitening by Convolutional Unit Optimization
- Paper: https://arxiv.org/abs/2108.10629
- Code: None
CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing
- Paper: https://arxiv.org/abs/2108.11305 | Homepage
- Code: https://github.com/kimren227/CSGStumpNet
NGC: A Unified Framework for Learning with Open-World Noisy Data
- Paper: https://arxiv.org/abs/2108.11035
- Code: None
LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
- Paper: https://arxiv.org/abs/2108.11950
- Code: https://loctex.mit.edu/
The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation
- Paper: https://arxiv.org/abs/2108.11550
- Code: None
Learning Cross-modal Contrastive Features for Video Domain Adaptation
- Paper: https://arxiv.org/abs/2108.11974v1
- Code: None
Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process
- Paper: https://arxiv.org/abs/2108.12278v1
- Code: https://github.com/dtuzi123/Lifelong-infinite-mixture-model
A Dual Adversarial Calibration Framework for Automatic Fetal Brain Biometry
- Paper: https://arxiv.org/abs/2108.12719
- Code: None
LUAI Challenge 2021 on Learning to Understand Aerial Images
- Paper: https://arxiv.org/abs/2108.13246
- Code: None
Embedding Novel Views in a Single JPEG Image
- Paper: https://arxiv.org/abs/2108.13003
- Code: None
Learning to Discover Reflection Symmetry via Polar Matching Convolution
- Paper: https://arxiv.org/abs/2108.12952
- Code: None
Deep 3D Mask Volume for View Synthesis of Dynamic Scenes
- Paper: https://arxiv.org/abs/2108.13408
- Code: https://cseweb.ucsd.edu//~viscomp/projects/ICCV21Deep/
Cross-category Video Highlight Detection via Set-based Learning
- Paper: https://arxiv.org/abs/2108.11770
- Code: https://github.com/ChrisAllenMing/Cross_Category_Video_Highlight
Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
- Paper: https://arxiv.org/abs/2108.08202
- Code: https://github.com/Anonymous-iccv2021-paper3163/CaFM-Pytorch
Sparse to Dense Motion Transfer for Face Image Animation
- Paper: https://arxiv.org/abs/2109.00471
- Code: None
SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos
- Paper: https://arxiv.org/abs/2109.00829
- Code: None
4D-Net for Learned Multi-Modal Alignment
- Paper: https://arxiv.org/abs/2109.01066
- Code: None
The Power of Points for Modeling Humans in Clothing
- Paper: https://arxiv.org/abs/2109.01137
- Code: None
The Functional Correspondence Problem
- Paper: https://arxiv.org/abs/2109.01097
- Code: None
On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation
- Paper: https://arxiv.org/abs/2109.00524
- Code: None
Towards Learning Spatially Discriminative Feature Representations
- Paper: https://arxiv.org/abs/2109.01359
- Code: None
Learning Fast Sample Re-weighting Without Reward Data
- Paper: https://arxiv.org/abs/2109.03216
- Code: https://github.com/google-research/google-research/tree/master/ieg
CTRL-C: Camera calibration TRansformer with Line-Classification
- Paper: https://arxiv.org/abs/2109.02259
- Code: None
PR-Net: Preference Reasoning for Personalized Video Highlight Detection
- Paper: https://arxiv.org/abs/2109.01799
- Code: None
Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation
- Paper: https://arxiv.org/abs/2109.01801
- Code: None
Learning to Generate Scene Graph from Natural Language Supervision
- Paper: https://arxiv.org/abs/2109.02227
- Code: https://github.com/YiwuZhong/SGG_from_NLS
Parsing Table Structures in the Wild
- Paper: https://arxiv.org/abs/2109.02199
- Code: None
Hierarchical Object-to-Zone Graph for Object Navigation
- Paper: https://arxiv.org/abs/2109.02066
- Code: None
Square Root Marginalization for Sliding-Window Bundle Adjustment
- Paper: https://arxiv.org/abs/2109.02182
- Code: None
YouRefIt: Embodied Reference Understanding with Language and Gesture
- Paper: https://arxiv.org/abs/2109.03413
- Code: None
Deep Hough Voting for Robust Global Registration
- Paper: https://arxiv.org/abs/2109.04310
- Code: None
IICNet: A Generic Framework for Reversible Image Conversion
- Paper: https://arxiv.org/abs/2109.04242
- Code: https://github.com/felixcheng97/IICNet
Estimating Leaf Water Content using Remotely Sensed Hyperspectral Data
- Paper: https://arxiv.org/abs/2109.02250
- Code: None
What Matters for Ad-hoc Video Search? A Large-scale Evaluation on TRECVID
- Paper: https://arxiv.org/abs/2109.01774
- Code: None
Shape-Biased Domain Generalization via Shock Graph Embeddings
- Paper: https://arxiv.org/abs/2109.05671
- Code: None
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
- Paper: https://arxiv.org/abs/2109.05743
- Code: None
Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting (Oral)
- Paper: https://arxiv.org/abs/2109.06061
- Code: None
Multiresolution Deep Implicit Functions for 3D Shape Representation
- č®ŗę/paperļ¼https://arxiv.org/abs/2109.05591
- 代ē /codeļ¼None
Image Shape Manipulation from a Single Augmented Training Sample ļ¼Oralļ¼
- č®ŗę/paperļ¼https://arxiv.org/abs/2109.06151
- 代ē /codeļ¼None
ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors
- č®ŗę/paperļ¼https://arxiv.org/abs/2109.07001
- 代ē /codeļ¼None
Contact-Aware Retargeting of Skinned Motion
- č®ŗę/paperļ¼https://arxiv.org/abs/2109.07431
- 代ē /codeļ¼None
DisUnknown: Distilling Unknown Factors for Disentanglement Learning
- č®ŗę/paperļ¼https://arxiv.org/abs/2109.08090
- 代ē /codeļ¼https://github.com/stormraiser/disunknown
FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition
- č®ŗę/paperļ¼https://arxiv.org/abs/2109.07916
- 代ē /codeļ¼None
A Pathology Deep Learning System Capable of Triage of Melanoma Specimens Utilizing Dermatopathologist Consensus as Ground Truth
- paper: https://arxiv.org/abs/2109.07554
- code: None
PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering
- paper: https://arxiv.org/abs/2109.08379
- code: https://github.com/RenYurui/PIRender
The First Vision For Vitals (V4V) Challenge for Non-Contact Video-Based Physiological Estimation
- paper: https://arxiv.org/abs/2109.10471 | Dataset and Challenge
- code: None
FaceEraser: Removing Facial Parts for Augmented Reality
- paper: https://arxiv.org/abs/2109.10760
- code: None
S3VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation
- paper: https://arxiv.org/abs/2109.08901
- code: None
JEM++: Improved Techniques for Training JEM
- paper: https://arxiv.org/abs/2109.09032
- code: https://github.com/sndnyang/JEMPP
Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching
- paper: https://arxiv.org/abs/2109.11121
- code: https://github.com/WHU-GPCV/SatMVS
Long Short View Feature Decomposition via Contrastive Video Representation Learning
- paper: https://arxiv.org/abs/2109.11593
- code: None
Visual Scene Graphs for Audio Source Separation
- paper: https://arxiv.org/abs/2109.11955
- code: None
Meta-Aggregator: Learning to Aggregate for 1-bit Graph Neural Networks
- paper: https://arxiv.org/abs/2109.12872
- code: None
Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
- paper: https://arxiv.org/abs/2109.13499
- code: None
Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness
- paper: https://arxiv.org/abs/2109.14120
- code: None
Sensor-Guided Optical Flow
- paper: https://arxiv.org/abs/2109.15321
- code: None
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
- paper: https://arxiv.org/abs/2109.14910
- code: None
Video Autoencoder: self-supervised disentanglement of static 3D structure and motion
- paper: https://arxiv.org/abs/2110.02951
- code: None
Topologically Consistent Multi-View Face Inference Using Volumetric Sampling
- paper: https://arxiv.org/abs/2110.02948
- code: https://tianyeli.github.io/tofu
Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice (Oral)
- paper: https://arxiv.org/abs/2110.02750
- code: None
HighlightMe: Detecting Highlights from Human-Centric Videos
- paper: https://arxiv.org/abs/2110.01774
- code: None
How You Move Your Head Tells What You Do: Self-supervised Video Representation Learning with Egocentric Cameras and IMU Sensors
- paper: https://arxiv.org/abs/2110.01680
- code: None
Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images
- paper: https://arxiv.org/abs/2110.01997
- code: https://github.com/ybarancan/STSU
Waypoint Models for Instruction-guided Navigation in Continuous Environments
- paper: https://arxiv.org/abs/2110.02207
- code: None
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning (Oral)
- paper: https://arxiv.org/abs/2110.01770
- code: None
De-rendering Stylized Texts
- paper: https://arxiv.org/abs/2110.01890
- code: https://github.com/CyberAgentAILab/derendering-text
Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction
- paper: https://arxiv.org/abs/2110.01015
- code: None
Keypoint Communities
- paper: https://arxiv.org/abs/2110.00988
- code: None
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
- paper: https://arxiv.org/abs/2110.00519
- code: https://github.com/Lizw14/CaliCO
A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction (Oral)
- paper: https://arxiv.org/abs/2110.03446
- code: None
2nd Place Solution to Google Landmark Retrieval 2021
- paper: https://arxiv.org/abs/2110.04294
- code: https://github.com/WesleyZhang1991/Google_Landmark_Retrieval_2021_2nd_Place_Solution
Neural Strokes: Stylized Line Drawing of 3D Shapes
- paper: https://arxiv.org/abs/2110.03900
- code: https://github.com/DifanLiu/NeuralStrokes
Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency
- paper: https://arxiv.org/abs/2110.05458
- code: None
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
- paper: https://arxiv.org/abs/2110.05122
- code: None
Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans
- paper: https://arxiv.org/abs/2110.04994
- code: None
BuildingNet: Learning to Label 3D Buildings (Oral)
- paper: https://arxiv.org/abs/2110.04955
- code: None
SOMA: Solving Optical Marker-Based MoCap Automatically
- paper: https://arxiv.org/abs/2110.04431
- code: None
Topic Scene Graph Generation by Attention Distillation from Caption
- paper: https://arxiv.org/abs/2110.05731
- code: None
Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual Concepts
- paper: https://arxiv.org/abs/2110.06476
- code: None
Understanding of Emotion Perception from Art
- paper: https://arxiv.org/abs/2110.06486
- code: None
Nuisance-Label Supervision: Robustness Improvement by Free Labels
- paper: https://arxiv.org/abs/2110.07118
- code: None
Simple Baseline for Single Human Motion Forecasting
- paper: https://arxiv.org/abs/2110.07495
- code: None
PixelPyramids: Exact Inference Models from Lossless Image Pyramids
- paper: https://arxiv.org/abs/2110.08787
- code: None