zideliu's Stars
microsoft/generative-ai-for-beginners
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
mem0ai/mem0
The Memory layer for your AI apps
KwaiVGI/LivePortrait
Bring portraits to life!
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
camel-ai/camel
🐫 CAMEL: Finding the Scaling Law of Agents. The first and the best multi-agent framework. https://www.camel-ai.org
VueTorrent/VueTorrent
The sleekest looking WEBUI for qBittorrent made with Vuejs!
Kwai-Kolors/Kolors
Kolors Team
poloclub/transformer-explainer
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
XLabs-AI/x-flux
TencentQQGYLab/ELLA
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
test-time-training/ttt-lm-pytorch
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
RQLuo/MixTeX-Latex-OCR
MixTeX multimodal LaTeX, ZhEn, and, Table OCR. It performs efficient CPU-based inference in a local offline on Windows.
OSU-NLP-Group/Mind2Web
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web"
buoyancy99/diffusion-forcing
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
Bujiazi/MotionClone
Official implementation of MotionClone: Training-Free Motion Cloning for Controllable Video Generation
shikiw/OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
showlab/Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
aim-uofa/MovieDreamer
linzhiqiu/t2v_metrics
Evaluating text-to-image/video/3D models with VQAScore
yk7333/d3po
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
aim-uofa/AutoStory
tobias-kirschstein/diffusion-avatars
Yushi-Hu/tifa
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
HansenHuang0823/PlacidDreamer
The official implementation of ACM Multimedia 2024 paper "PlacidDreamer: Advancing Harmony in Text-to-3D Generation".
aim-uofa/FreeCustom
[CVPR 2024] Official PyTorch implementation of FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
aim-uofa/FreeCompose
WebVLN/WebVLN
Official implementation of WebVLN: Vision-and-Language Navigation on Websites
zrealli/TIGIC
[ECCV 2024] Tuning-Free Image Customization with Image and Text Guidance
cuixing100876/InstaStyle