tripathiarpan20
On a venture to explore the boundaries of human creativity & efficiency reachable by AI
tripathiarpan20's Stars
black-forest-labs/flux
Official inference repo for FLUX.1 models
exo-explore/exo
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
fishaudio/fish-speech
Brand new TTS solution
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Picovoice/porcupine
On-device wake word detection powered by deep learning
Tencent/HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Dhravya/cloudflare-saas-stack
Quickly make and deploy full-stack apps with database, auth, styling, storage etc. figured out for you. Add all primitives you want.
KoljaB/RealtimeSTT
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
bghira/SimpleTuner
A general fine-tuning kit geared toward diffusion models.
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
wordware-ai/twitter
AI Agent for Twitter Personality Analysis
ali-vilab/MimicBrush
Official implementation of the paper "Zero-shot Image Editing with Reference Imitation"
muzishen/IMAGDressing
👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing. It enables customizable human image generation with flexible garment, pose, and scene control, ensuring high fidelity and garment consistency for virtual try-on.
Zheng-Chong/CatVTON
CatVTON is a simple and efficient virtual try-on diffusion model with 1) a lightweight network (899.06M parameters in total), 2) parameter-efficient training (49.57M trainable parameters), and 3) simplified inference (< 8 GB VRAM at 1024x768 resolution).
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
gojasper/flash-diffusion
Official implementation of ⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
apple/ml-mdm
Train high-quality text-to-image diffusion models in a data & compute efficient manner
donahowe/AutoStudio
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
maxin-cn/Cinemo
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
IDEA-Research/TAPTR
[ECCV 2024 & NeurIPS 2024] Official implementation of the paper TAPTR & TAPTRv2 & TAPTRv3
kijai/ComfyUI-LuminaWrapper
czg1225/AsyncDiff
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Yuanshi9815/Video-Infinity
Video-Infinity generates long videos quickly using multiple GPUs without extra training.
hustvl/GaussianDreamerPro
GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality
hwjiang1510/Real3D
Code for "Real3D: Scaling Up Large Reconstruction Models with Real-World Images"
snap-research/weights2weights
Official Implementation of weights2weights
PINTO0309/whisper-onnx-cpu
ONNX implementation of Whisper. PyTorch free.
yandex-research/invertible-cd
[NeurIPS 2024] Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps
GPT-Talker/GPT-Talker
24MM