adamtwig's Stars
huggingface/pytorch-image-models
The largest collection of PyTorch image encoders / backbones, including train, eval, inference, and export scripts, and pretrained weights: ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
open-mmlab/mmdetection
OpenMMLab Detection Toolbox and Benchmark
ultralytics/ultralytics
NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
lucidrains/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
fchollet/deep-learning-with-python-notebooks
Jupyter notebooks for the code samples of the book "Deep Learning with Python"
pytorch/vision
Datasets, Transforms and Models specific to Computer Vision
kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
openalpr/openalpr
Automatic License Plate Recognition library
NVIDIA/tacotron2
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
UX-Decoder/Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
facebookresearch/deit
Official DeiT repository
tusen-ai/simpledet
A Simple and Versatile Framework for Object Detection and Instance Recognition
clovaai/CRAFT-pytorch
Official implementation of Character Region Awareness for Text Detection (CRAFT)
jik876/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
chrisdonahue/wavegan
WaveGAN: Learn to synthesize raw audio with generative adversarial networks
qiuqiangkong/audioset_tagging_cnn
YuanGongND/ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
kahst/BirdNET-Analyzer
BirdNET analyzer for scientific audio data processing.
hasanirtiza/Pedestron
[Pedestron] Generalizable Pedestrian Detection: The Elephant In The Room. @ CVPR2021
krantiparida/awesome-audio-visual
A curated list of different papers and datasets in various areas of audio-visual processing
baudm/parseq
Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)
qiuqiangkong/torchlibrosa
WenjiaWang0312/TextZoom
[ECCV2020] A super-resolution dataset of paired LR-HR scene text images
FangShancheng/ABINet
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
YuanGongND/ssast
Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
mitmul/caltech-pedestrian-dataset-converter
Download the Caltech Pedestrian Dataset and convert it for Python users without using MATLAB
rishikksh20/Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
jdfxzzy/DPMN
Improving Scene Text Image Super-Resolution via Dual Prior Modulation Network (AAAI 2023)
Donghwa-KIM/audiotext-transformer
Cross-modal model between audio (MFCC) and text (KoBERT)
stefan-sf-wu/FlagDetSeg
2021 AVSS