/CVML

A resource repository for finding everything about computer vision and machine learning.

Computer Vision and Machine Learning: All you need to know

Welcome to the Computer Vision and Machine Learning Research Repository! This repository aims to provide a comprehensive list of important topics and the most recent content related to computer vision and deep neural network research. Whether you are a researcher, student, or enthusiast, this repository will serve as a valuable resource for staying up-to-date with the latest advancements in the field.

Table of Contents

Image Classification

⭐⭐⭐ In this section, you will find resources related to image classification, including datasets, models, benchmarking techniques, and recent research papers.

VGG

Very Deep Convolutional Networks for Large-Scale Image Recognition. Karen Simonyan, Andrew Zisserman

GoogLeNet

Going Deeper with Convolutions Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

PReLU-nets

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

ResNet

Deep Residual Learning for Image Recognition Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

PreActResNet

Identity Mappings in Deep Residual Networks Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Inceptionv3

Rethinking the Inception Architecture for Computer Vision Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna

Inception-v4 and Inception-ResNet-v2

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi

RiR

Resnet in Resnet: Generalizing Residual Architectures Sasha Targ, Diogo Almeida, Kevin Lyman

Stochastic Depth ResNet

Deep Networks with Stochastic Depth Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, Kilian Weinberger

WRN

Wide Residual Networks Sergey Zagoruyko, Nikos Komodakis

SqueezeNet

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer

GeNet

Genetic CNN Lingxi Xie, Alan Yuille

MetaQNN

Designing Neural Network Architectures using Reinforcement Learning Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar

PyramidNet

Deep Pyramidal Residual Networks Dongyoon Han, Jiwhan Kim, Junmo Kim

DenseNet

Densely Connected Convolutional Networks Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger

FractalNet

FractalNet: Ultra-Deep Neural Networks without Residuals Gustav Larsson, Michael Maire, Gregory Shakhnarovich

ResNeXt

Aggregated Residual Transformations for Deep Neural Networks Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He

IGCV1

Interleaved Group Convolutions for Deep Neural Networks Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang

Residual Attention Network

Residual Attention Network for Image Classification Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, Xiaoou Tang

Xception

Xception: Deep Learning with Depthwise Separable Convolutions François Chollet

MobileNet

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam

PolyNet

PolyNet: A Pursuit of Structural Diversity in Very Deep Networks Xingcheng Zhang, Zhizhong Li, Chen Change Loy, Dahua Lin

DPN

Dual Path Networks Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan, Jiashi Feng

Block-QNN

Practical Block-wise Neural Network Architecture Generation Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu

CRU-Net

Sharing Residual Units Through Collective Tensor Factorization in Deep Neural Networks Chen Yunpeng, Jin Xiaojie, Kang Bingyi, Feng Jiashi, Yan Shuicheng

DLA

Deep Layer Aggregation Fisher Yu, Dequan Wang, Evan Shelhamer, Trevor Darrell

ShuffleNet

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun

CondenseNet

CondenseNet: An Efficient DenseNet using Learned Group Convolutions Gao Huang, Shichen Liu, Laurens van der Maaten, Kilian Q. Weinberger

NASNet

Learning Transferable Architectures for Scalable Image Recognition Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le

MobileNetV2

MobileNetV2: Inverted Residuals and Linear Bottlenecks Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen

IGCV2

IGCV2: Interleaved Structured Sparse Convolutional Neural Networks Guotian Xie, Jingdong Wang, Ting Zhang, Jianhuang Lai, Richang Hong, Guo-Jun Qi

hier

Hierarchical Representations for Efficient Architecture Search Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, Koray Kavukcuoglu

PNASNet

Progressive Neural Architecture Search Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy

AmoebaNet

Regularized Evolution for Image Classifier Architecture Search Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V Le

SENet

Squeeze-and-Excitation Networks Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu

ShuffleNetV2

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun

CBAM

CBAM: Convolutional Block Attention Module Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon

IGCV3

IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks Ke Sun, Mingjie Li, Dong Liu, Jingdong Wang

BAM

BAM: Bottleneck Attention Module Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon

MnasNet

MnasNet: Platform-Aware Neural Architecture Search for Mobile Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Quoc V. Le

SKNet

Selective Kernel Networks Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang

DARTS

DARTS: Differentiable Architecture Search Hanxiao Liu, Karen Simonyan, Yiming Yang

ProxylessNAS

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware Han Cai, Ligeng Zhu, Song Han

MobileNetV3

Searching for MobileNetV3 Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam

Res2Net

Res2Net: A New Multi-scale Backbone Architecture Shang-Hua Gao, Ming-Ming Cheng, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang, Philip Torr

LIP-ResNet

LIP: Local Importance-based Pooling Ziteng Gao, Limin Wang, Gangshan Wu

EfficientNet

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Mingxing Tan, Quoc V. Le

FixResNeXt

Fixing the train-test resolution discrepancy Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

BiT

Big Transfer (BiT): General Visual Representation Learning Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby

PSConv + ResNeXt-101

PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer Duo Li, Anbang Yao, Qifeng Chen

NoisyStudent

Self-training with Noisy Student improves ImageNet classification Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le

RegNet

Designing Network Design Spaces Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár

GhostNet

GhostNet: More Features from Cheap Operations Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu

ViT

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

DeiT

Training data-efficient image transformers & distillation through attention Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou

PVT

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

T2T

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan

DeepViT

DeepViT: Towards Deeper Vision Transformer Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng

ViL

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao

TNT

Transformer in Transformer Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang

CvT

CvT: Introducing Convolutions to Vision Transformers Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang

CrossViT

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification Chun-Fu (Richard) Chen, Quanfu Fan, Rameswar Panda

Focal-T

Focal Attention for Long-Range Interactions in Vision Transformers Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

Twins

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

PVTv2

PVT v2: Improved Baselines with Pyramid Vision Transformer Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Object Detection

⭐⭐⭐ This section focuses on object detection, covering topics such as algorithms, frameworks, datasets, and state-of-the-art models developed for accurate and efficient object detection.

Imbalance Problems in Object Detection: A Review

Recent Advances in Deep Learning for Object Detection

A Survey of Deep Learning-based Object Detection

Object Detection in 20 Years: A Survey

Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks

Deep Learning for Generic Object Detection: A Survey

R-CNN

Rich feature hierarchies for accurate object detection and semantic segmentation

Fast R-CNN

Fast R-CNN

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

Faster R-CNN

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

R-CNN minus R

Faster R-CNN in MXNet with distributed implementation and data parallelization

Contextual Priming and Feedback for Faster R-CNN

An Implementation of Faster RCNN with Study for Region Sampling

Interpretable R-CNN

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Mask R-CNN

Light-Head R-CNN

Light-Head R-CNN: In Defense of Two-Stage Object Detector

Cascade R-CNN

Cascade R-CNN: Delving into High Quality Object Detection

SPP-Net

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection

Object Detectors Emerge in Deep Scene CNNs

segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection

Object Detection Networks on Convolutional Feature Maps

Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction

DeepBox: Learning Objectness with Convolutional Networks

YOLO

You Only Look Once: Unified, Real-Time Object Detection

darkflow - translate darknet to tensorflow. Load trained weights, retrain/fine-tune them using tensorflow, export constant graph def to C++

Start Training YOLO with Our Own Data

YOLO: Core ML versus MPSNNGraph

TensorFlow YOLO object detection on Android

Computer Vision in iOS – Object Detection

YOLOv2

YOLO9000: Better, Faster, Stronger

darknet_scripts

Yolo_mark: GUI for marking bounded boxes of objects in images for training Yolo v2

LightNet: Bringing pjreddie's DarkNet out of the shadows

https://github.com/explosion/lightnet

YOLO v2 Bounding Box Tool

Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors

  • intro: According to the authors, LRM is the first hard example mining strategy that fits YOLOv2 well, making it better suited to real-world scenarios that demand both real-time speed and accurate detection.
  • arxiv: https://arxiv.org/abs/1804.04606
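
The entry above does not describe how LRM works, so the following is only a rough illustration of the general idea behind loss-based hard example mining: rank training examples by loss and keep the hardest ones for the update. It is not the paper's procedure. The function name `select_hard_examples` and the `keep_ratio` parameter are hypothetical, chosen for this sketch.

```python
def select_hard_examples(losses, keep_ratio=0.25):
    """Return indices of the highest-loss examples, hardest first.

    Generic loss-ranking sketch; `keep_ratio` is an assumed knob,
    not a parameter taken from the LRM paper.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not losses:
        return []
    # Keep at least one example so a small batch still contributes.
    k = max(1, int(len(losses) * keep_ratio))
    # Rank example indices by loss, largest first.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return ranked[:k]


# Usage: keep the hardest half of a four-example batch.
hard = select_hard_examples([0.1, 2.0, 0.5, 1.5], keep_ratio=0.5)
```

Only the selected indices would then contribute to the loss that is backpropagated; how the ranking is applied inside a detector varies by method.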

Object detection at 200 Frames Per Second

Event-based Convolutional Networks for Object Detection in Neuromorphic Cameras

OmniDetector: With Neural Networks to Bounding Boxes

YOLOv3

YOLOv3: An Incremental Improvement

YOLT

You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery

SSD

SSD: Single Shot MultiBox Detector

What's the diffience in performance between this new code you pushed and the previous code? #327

weiliu89/caffe#327

DSSD

DSSD : Deconvolutional Single Shot Detector

Enhancement of SSD by concatenating feature maps for object detection

Context-aware Single-Shot Detector

Feature-Fused SSD: Fast Detection for Small Objects

https://arxiv.org/abs/1709.05054

FSSD

FSSD: Feature Fusion Single Shot Multibox Detector

https://arxiv.org/abs/1712.00960

Weaving Multi-scale Context for Single Shot Detector

ESSD

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

https://arxiv.org/abs/1801.05918

Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection

https://arxiv.org/abs/1802.06488

MDSSD

MDSSD: Multi-scale Deconvolutional Single Shot Detector for small objects

Pelee

Pelee: A Real-Time Object Detection System on Mobile Devices

https://github.com/Robert-JunWang/Pelee

Fire SSD

Fire SSD: Wide Fire Modules based Single Shot Detector on Edge Device

R-FCN

R-FCN: Object Detection via Region-based Fully Convolutional Networks

R-FCN-3000 at 30fps: Decoupling Detection and Classification

https://arxiv.org/abs/1712.01802

Recycle deep features for better object detection

FPN

Feature Pyramid Networks for Object Detection

Action-Driven Object Detection with Top-Down Visual Attentions

Beyond Skip Connections: Top-Down Modulation for Object Detection

Wide-Residual-Inception Networks for Real-time Object Detection

Attentional Network for Visual Object Detection

Learning Chained Deep Features and Classifiers for Cascade in Object Detection

DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Spatial Memory for Context Reasoning in Object Detection

Accurate Single Stage Detector Using Recurrent Rolling Convolution

Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection

https://arxiv.org/abs/1704.05775

LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems

Point Linking Network for Object Detection

Perceptual Generative Adversarial Networks for Small Object Detection

https://arxiv.org/abs/1706.05274

Few-shot Object Detection

https://arxiv.org/abs/1706.08249

Yes-Net: An effective Detector Based on Global Information

https://arxiv.org/abs/1706.09180

SMC Faster R-CNN: Toward a scene-specialized multi-object detector

https://arxiv.org/abs/1706.10217

Towards lightweight convolutional neural networks for object detection

https://arxiv.org/abs/1707.01395

RON: Reverse Connection with Objectness Prior Networks for Object Detection

Mimicking Very Efficient Network for Object Detection

Residual Features and Unified Prediction Network for Single Stage Detection

https://arxiv.org/abs/1707.05031

Deformable Part-based Fully Convolutional Network for Object Detection

Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors

Recurrent Scale Approximation for Object Detection in CNN

DSOD

DSOD: Learning Deeply Supervised Object Detectors from Scratch

Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids

Tiny-DSOD: Lightweight Object Detection for Resource-Restricted Usages

Object Detection from Scratch with Deep Supervision

RetinaNet

Focal Loss for Dense Object Detection

CoupleNet: Coupling Global Structure with Local Parts for Object Detection

Incremental Learning of Object Detectors without Catastrophic Forgetting

Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection

https://arxiv.org/abs/1709.04347

StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection

https://arxiv.org/abs/1709.05788

Dynamic Zoom-in Network for Fast Object Detection in Large Images

https://arxiv.org/abs/1711.05187

Zero-Annotation Object Detection with Web Knowledge Transfer

MegDet

MegDet: A Large Mini-Batch Object Detector

Receptive Field Block Net for Accurate and Fast Object Detection

An Analysis of Scale Invariance in Object Detection - SNIP

Feature Selective Networks for Object Detection

https://arxiv.org/abs/1711.08879

Learning a Rotation Invariant Detector with Rotatable Bounding Box

Scalable Object Detection for Stylized Objects

Deep Regionlets for Object Detection

Training and Testing Object Detectors with Virtual Images

Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video

  • keywords: object mining, object tracking, unsupervised object discovery by appearance-based clustering, self-supervised detector adaptation
  • arxiv: https://arxiv.org/abs/1712.08832

Spot the Difference by Object Detection

Localization-Aware Active Learning for Object Detection

Object Detection with Mask-based Feature Encoding

LSTD: A Low-Shot Transfer Detector for Object Detection

Pseudo Mask Augmented Object Detection

https://arxiv.org/abs/1803.05858

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

https://arxiv.org/abs/1803.06799

Learning Region Features for Object Detection

Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection

Object Detection for Comics using Manga109 Annotations

Task-Driven Super Resolution: Object Detection in Low-resolution Images

Transferring Common-Sense Knowledge for Object Detection

Multi-scale Location-aware Kernel Representation for Object Detection

Robust Physical Adversarial Attack on Faster R-CNN Object Detector

RefineDet

Single-Shot Refinement Neural Network for Object Detection

DetNet

DetNet: A Backbone network for Object Detection

SSOD

Self-supervisory Signals for Object Discovery and Detection

CornerNet

CornerNet: Detecting Objects as Paired Keypoints

M2Det

M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network

3D Object Detection

3D Backbone Network for 3D Object Detection

LMNet: Real-time Multiclass Object Detection on CPU using 3D LiDARs

ZSD (Zero-Shot Object Detection)

Zero-Shot Detection

Zero-Shot Object Detection

Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts

Zero-Shot Object Detection by Hybrid Region Embedding

OSD (One-Shot Object Detection)

Comparison Network for One-Shot Conditional Object Detection

One-Shot Object Detection

RepMet: Representative-based metric learning for classification and one-shot object detection

Weakly Supervised Object Detection

Weakly Supervised Object Detection in Artworks

Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation

Softer-NMS

Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection

Feature Selective Anchor-Free Module for Single-Shot Object Detection

Object Detection based on Region Decomposition and Assembly

Bottom-up Object Detection by Grouping Extreme and Center Points

ORSIm Detector: A Novel Object Detection Framework in Optical Remote Sensing Imagery Using Spatial-Frequency Channel Features

Consistent Optimization for Single-Shot Object Detection

Learning Pairwise Relationship for Multi-object Detection in Crowded Scenes

RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free

Region Proposal by Guided Anchoring

Scale-Aware Trident Networks for Object Detection

Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions

Strong-Weak Distribution Alignment for Adaptive Object Detection

AutoFocus: Efficient Multi-Scale Inference

  • intro: AutoFocus obtains an mAP of 47.9% (68.3% at 50% overlap) on the COCO test-dev set while processing 6.4 images per second on a Titan X (Pascal) GPU
  • arXiv: https://arxiv.org/abs/1812.01600
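
The "50% overlap" in the entry above refers to an intersection-over-union (IoU) threshold of 0.5 between a predicted box and a ground-truth box, while the headline COCO mAP averages over several IoU thresholds. A minimal sketch of the IoU check follows, assuming axis-aligned boxes given as `(x1, y1, x2, y2)`. The helper names `iou` and `is_match` are illustrative and not taken from any particular toolbox.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """True if the prediction overlaps the ground truth enough to count."""
    return iou(pred_box, gt_box) >= threshold


# Usage: a prediction covering twice the ground-truth area sits exactly at 0.5.
matched = is_match((0, 0, 10, 10), (0, 0, 10, 5))
```

A full mAP computation also sorts detections by confidence and integrates the precision-recall curve per class, which this sketch leaves out.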

NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection

SPLAT: Semantic Pixel-Level Adaptation Transforms for Detection

Grid R-CNN

Deformable ConvNets v2: More Deformable, Better Results

Anchor Box Optimization for Object Detection

Efficient Coarse-to-Fine Non-Local Module for the Detection of Small Objects

Learning RoI Transformer for Detecting Oriented Objects in Aerial Images

Integrated Object Detection and Tracking with Tracklet-Conditioned Detection

Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection

Gradient Harmonized Single-stage Detector

CFENet: Object Detection with Comprehensive Feature Enhancement Module

DeRPN: Taking a further step toward more general object detection

Hybrid Knowledge Routed Modules for Large-scale Object Detection

Deep Feature Pyramid Reconfiguration for Object Detection

Unsupervised Hard Example Mining from Videos for Improved Object Detection

Acquisition of Localization Confidence for Accurate Object Detection

Toward Scale-Invariance and Position-Sensitive Region Proposal Networks

MetaAnchor: Learning to Detect Objects with Customized Anchors

Relation Network for Object Detection

Quantization Mimic: Towards Very Tiny CNN for Object Detection

Learning Rich Features for Image Manipulation Detection

SNIPER: Efficient Multi-Scale Training

Soft Sampling for Robust Object Detection

Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria

Other

R3-Net: A Deep Network for Multi-oriented Vehicle Detection in Aerial Images and Videos

Detection Toolbox

  • Detectron (FAIR): Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including Mask R-CNN. It is written in Python and powered by the Caffe2 deep learning framework.
  • Detectron2: Detectron2 is FAIR's next-generation research platform for object detection and segmentation.
  • maskrcnn-benchmark (FAIR): Fast, modular reference implementation of instance segmentation and object detection algorithms in PyTorch.
  • mmdetection (SenseTime & CUHK): mmdetection is an open-source object detection toolbox based on PyTorch. It is part of the open-mmlab project developed by the Multimedia Laboratory, CUHK.

Image Segmentation

⭐⭐⭐ Explore the field of image segmentation, including popular segmentation algorithms, datasets, and cutting-edge approaches for segmenting objects within images.

Action Classification

⭐⭐⭐ Discover resources related to action classification, which involves recognizing and classifying human actions in videos. This section includes datasets, models, and techniques specific to action classification tasks.

Video Recognition

⭐⭐⭐ In this section, you will find information on video recognition, which involves understanding and analyzing videos to recognize and interpret various visual elements. Explore datasets, models, and techniques for video understanding tasks.

Deep CNN Models

⭐⭐⭐ This section is dedicated to deep convolutional neural network (CNN) models, which have revolutionized computer vision research. Discover well-known CNN architectures, pre-trained models, and research papers showcasing the advancements in deep learning for computer vision.

Transformers

⭐⭐⭐ For a comprehensive and up-to-date collection of resources on Transformers, check out the Ultimate-Awesome-Transformer-Attention repository.

3D Vision Models

⭐⭐⭐ Explore the fascinating world of 3D vision models. This section covers topics such as 3D object recognition, depth estimation, point cloud processing, and related research papers.

CVML Courses

⭐⭐⭐ Here is a list of good computer vision and machine learning courses with materials accessible online.

Computer Vision

Machine Learning and Statistical Learning

Optimization

Additional Resources

⭐⭐⭐ For an extensive list of computer vision resources, you can also check out the awesome-computer-vision repository by Jia-Bin Huang. It contains a curated collection of various computer vision topics, including datasets, software, tutorials, and research papers.

Contributing

⭐⭐⭐ We welcome contributions from the community to make this repository even more comprehensive and up-to-date. If you have any suggestions, please feel free to open an issue or submit a pull request.

License

This repository is licensed under the MIT License.