okankop's Stars
karpathy/LLM101n
LLM101n: Let's build a Storyteller
myshell-ai/OpenVoice
Instant voice cloning by MIT and MyShell. Audio foundation model.
karpathy/llm.c
LLM training in simple, raw C/CUDA
WongKinYiu/yolov9
Implementation of the paper "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information"
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
OpenTalker/video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
facebookresearch/audio2photoreal
Code and dataset for photorealistic Codec Avatars driven from audio
PRIS-CV/DemoFusion
Let us democratise high-resolution generation! (CVPR 2024)
Pointcept/Pointcept
Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)
haoheliu/versatile_audio_super_resolution
Versatile audio super resolution (any -> 48kHz) with AudioSR.
facebookresearch/hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
NUS-HPC-AI-Lab/Neural-Network-Parameter-Diffusion
We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters
Text-to-Audio/Make-An-Audio
PyTorch implementation of Make-An-Audio (ICML'23), a text-to-audio generative model
wyf0912/SinSR
[CVPR 2024] SinSR: Diffusion-Based Image Super-Resolution in a Single Step
brentspell/hifi-gan-bwe
Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su et al.
slp-rl/aero
This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)
xiongyihui/tdoa
TDOA based on GCC-PHAT
rishikksh20/HiFiplusplus-pytorch
HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement
JusperLee/SPMamba
idiap/acoustic-simulator
Implementation of audio degradation processes
NXTProduct/TUNet
tum-traffic-dataset/tum-traffic-dataset-dev-kit
TUM Traffic Dataset Development Kit
tteepe/EarlyBird
Official Code for "EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View"
idiap/nnsslm
Neural Network based Sound Source Localization Models
tteepe/TrackTacular
Official Code for "Lifting Multi-View Detection and Tracking to the Bird’s Eye View"
Martlgap/livefaceidapp
Simple Live Face Recognition Streamlit App
Blueblue4/IoU-AwareCalibration
Code to reproduce the experiments described in "Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling with IoU-Aware Calibration" (https://arxiv.org/pdf/2309.03110.pdf)
teo-sl/Audio-Super-Resolution-ViT
This repository contains the source code implementing two deep learning models for the audio super-resolution task.