seruva19's Stars
omidsakhi/lorakit
A simple SDXL fine-tuning toolkit based on the DreamBooth branch of AutoTrain Advanced from 🤗, inspired by the way ai-toolkit approaches configuration.
unclecode/crawl4ai
🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scrapper
genforce/ctrl-x
Official implementation of "Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance" (NeurIPS 2024)
mindee/doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
instantX-research/InstantStyle
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥
facefusion/facefusion
Industry leading face manipulation platform
NirDiamant/GenAI_Agents
This repository provides tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. It serves as a comprehensive guide for building intelligent, interactive AI systems.
facebookresearch/AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
ivcylc/qa-mdt
SOTA Text-to-music (TTM) Generation (OpenMusic)
NexaAI/nexa-sdk
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.
QiuYannnn/Local-File-Organizer
An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.
zml/zml
High performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild
av/klmbr
klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs
kyutai-labs/moshi
dvlab-research/ControlNeXt
Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
VectorSpaceLab/OmniGen
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
aigc-apps/CogVideoX-Fun
📹 A more flexible CogVideoX that can generate videos at any resolution and creates videos from images.
JusperLee/Apollo
Music repair method to convert lossy MP3 compressed music to lossless music.
minzwon/sota-music-tagging-models
ALEEEHU/Awesome-Text2X-Resources
This is an open collection of state-of-the-art (SOTA), novel Text to X (X can be everything) methods (papers, codes and datasets).
pinokiofactory/cogstudio
voideditor/void
kousw/experimental-consistory
turbo-llm/turbo-alignment
Library for industrial alignment.
Vchitect/Vchitect-2.0
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
microsoft/Graphormer
Graphormer is a general-purpose deep learning backbone for molecular modeling.
Tencent/DepthCrafter
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
OutofAi/OutofFocus
An AI focused photo manipulation tool based on Gradio
aredden/flux-fp8-api
Flux diffusion model implementation using quantized fp8 matmul & remaining layers use faster half precision accumulate, which is ~2x faster on consumer devices.