eisneim's Stars
thunlp/Migician
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
chrischoy/WhisperChain
Speech to Text, but with all the bells and whistles and, most importantly, AI! AI will clean up your filler words, then edit and refine what you said!
chuanruihu/Level-Navi-Agent-Search
The Level-Navi Agent is a framework that requires no training and uses large language models for deep query understanding and precise search operations. The repo includes benchmarks, datasets, and tools for assessing LLM performance in web searches.
SparkAudio/Spark-TTS
Spark-TTS Inference Code
KohakuBlueleaf/PixelOE
Detail-Oriented Pixelization based on Contrast-Aware Outline Expansion.
zilliztech/deep-searcher
Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.
thu-pacman/chitu
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
EvolvingLMMs-Lab/EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
SesameAILabs/csm
A Conversational Speech Generation Model
dvruette/gidd
Code accompanying the paper "Generalized Interpolating Discrete Diffusion"
gojasper/LBM
LBM: Latent Bridge Matching for Fast Image-to-Image Translation ✨
kuleshov-group/bd3lms
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
svg-project/Sparse-VideoGen
ButzYung/SystemAnimatorOnline
XR Animator, AI-based Full Body Motion Capture and Extended Reality (XR) solution, powered by System Animator Online
TrajectoryCrafter/TrajectoryCrafter
Official implementation of TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
TencentARC/VideoPainter
Any-length Video Inpainting and Editing with Plug-and-Play Context Control
kohya-ss/musubi-tuner
TTPlanetPig/Gui_for_musubi-tuner
A GUI for Kohya_ss musubi-tuner for easy use!
AssafSinger94/dino-tracker
Official PyTorch Implementation for “DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video” (ECCV 2024)
LTH14/fractalgen
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
nv-tlabs/GEN3C
[CVPR 2025] GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
YisuiTT/Mobius
Mobius: Text to Seamless Looping Video Generation via Latent Shift
lumalabs/imm
Official implementation of Inductive Moment Matching
stepfun-ai/Step-Audio
DigiRL-agent/digiq
showlab/PhotoDoodle
Code Implementation of "PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data"
huggingface/movie-shot-categorizer
Fine-tune of Florence-2 for shot categorization.
ML-GSAI/LLaDA
Official PyTorch implementation for "Large Language Diffusion Models"
FoundationVision/UniTok
A Unified Tokenizer for Visual Generation and Understanding
IHe-KaiI/CTRL-D
CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion.