aimerou's Stars
google-research/google-research
Google Research
ultralytics/ultralytics
Ultralytics YOLO11 🚀
roboflow/supervision
We write your reusable computer vision tools. 💜
graphdeco-inria/gaussian-splatting
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
Megvii-BaseDetection/YOLOX
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
jaywalnut310/vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
mikel-brostrom/boxmot
BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
aiwaves-cn/agents
An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents
ifzhang/ByteTrack
[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
dreamgaussian/dreamgaussian
[ICLR 2024 Oral] Generative Gaussian Splatting for Efficient 3D Content Creation
huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
timsainb/noisereduce
Noise reduction in Python using spectral gating (speech, bioacoustics, audio, time-domain signals)
ArjanCodes/betterpython
Code examples for my Write Better Python Code series on YouTube.
chongzhou96/EdgeSAM
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
NirAharon/BoT-SORT
BoT-SORT: Robust Associations Multi-Pedestrian Tracking
Tomiinek/Multilingual_Text_to_Speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
dyhBUPT/StrongSORT
[TMM 2023] StrongSORT: Make DeepSORT Great Again
google-deepmind/tracr
ga642381/Speech-Prompts-Adapters
This repository surveys papers on prompting and adapters for speech processing.
getalp/ALFFA_PUBLIC
lwang114/UnsupTTS
khanld/Wav2vec2-Pretraining
Wav2vec 2.0 Self-Supervised Pretraining
alirezamshi/small100
Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to EMNLP 2022.
masakhane-io/masakhane-pos
POS for African languages
uds-lsv/afro-maft
neulab/AfricanVoices
Hosts text-to-speech corpus and speech synthesizers for African languages.
Waxal-Multilingual/speech-data
This repository contains multi-modal speech data for African languages that can be used to train ASR and NLP models.
Waxal-Multilingual/audio-files
Audio files from waxal data collections