AdityaKulshrestha's Stars
hacksider/Deep-Live-Cam
real time face swap and one-click video deepfake with only a single image
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
SakanaAI/AI-Scientist
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
kyutai-labs/moshi
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
InternLM/MindSearch
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
huggingface/parler-tts
Inference and training library for high-quality TTS models.
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
gpu-mode/lectures
Material for gpu-mode lectures
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
jik876/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
DAGWorks-Inc/burr
Build applications that make decisions (chatbots, agents, simulations, etc.). Monitor, trace, persist, and execute on your own infrastructure.
facebookresearch/MobileLLM
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
musikalkemist/AudioSignalProcessingForML
Code and slides of my YouTube series called "Audio Signal Processing for Machine Learning"
sooftware/conformer
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
merveenoyan/smol-vision
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
microsoft/MInference
[NeurIPS'24 Spotlight] To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
yiranran/APDrawingGAN
Code for APDrawingGAN: Generating Artistic Portrait Drawings from Face Photos with Hierarchical GANs (CVPR 2019 Oral)
X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
NousResearch/Open-Reasoning-Tasks
A comprehensive repository of reasoning tasks for LLMs (and beyond)
IcarusWizard/MAE
PyTorch implementation of Masked Autoencoder
marianne-m/brouhaha-vad
Predicts the level of noise and reverberation in your audio files
kyegomez/Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
stephenleo/llm-structured-output-benchmarks
Benchmark various LLM Structured Output frameworks (Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc.) on tasks like multi-label classification, named entity recognition, and synthetic data generation.
lucasdelimanogueira/PyNorch
Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
Takaaki-Saeki/DiscreteSpeechMetrics
Reference-aware automatic speech evaluation toolkit
thevasudevgupta/gpt-triton
Triton implementation of GPT/LLAMA
lingo-iitgn/ACM-SS-2024-GenAI
Repository for ACM India Summer School on Generative AI for Text