hayk-corpusant's Stars
Lightning-AI/pytorch-lightning
Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
apple/coremltools
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
luosiallen/latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
espeak-ng/espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
PixArt-alpha/PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Stability-AI/stable-audio-tools
Generative models for conditional audio generation
nerfstudio-project/gsplat
CUDA accelerated rasterization of gaussian splatting
pytorch/executorch
On-device AI across mobile, embedded and edge for PyTorch
symforce-org/symforce
Fast symbolic computation, code generation, and nonlinear optimization for robotics
gnobitab/InstaFlow
⚡ InstaFlow! One-Step Stable Diffusion with Rectified Flow (ICLR 2024)
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
horseee/DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
DoMusic/Hybrid-Net
Real-time audio to chords, lyrics, beat, and melody.
Audio-AGI/WavJourney
WavJourney: Compositional Audio Creation with LLMs
shansongliu/M2UGen
This is the official repository for M2UGen
mir-aidj/all-in-one
All-In-One Music Structure Analyzer
maxrmorrison/torchcrepe
PyTorch implementation of the CREPE pitch tracker
diffusion-classifier/diffusion-classifier
Diffusion Classifier leverages pretrained diffusion models to perform zero-shot classification without additional training
bytedance/uss
This is the PyTorch implementation of Universal Source Separation with Weakly Labelled Data.
YuanGongND/whisper-at
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
VinAIResearch/XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
spotify-research/llark
Code for the paper "LLark: A Multimodal Instruction-Following Language Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.
google-ai-edge/ai-edge-torch
Supporting PyTorch models with the Google AI Edge TFLite runtime.
seungheondoh/lp-music-caps
LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]
descriptinc/audiotools
Object-oriented handling of audio data, with GPU-powered augmentations, and more.
locuslab/ect
Consistency Models Made Easy
tchambon/IADB
Official implementation of IADB (Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model), published at SIGGRAPH 2023.
microsoft/fadtk
A simple library for Fréchet Audio Distance (FAD) calculation
ryeoat3/gomin
GOMIN; Gaudio Open Mel-spectrogram Inversion Network