dxli94's Stars
karpathy/LLM101n
LLM101n: Let's build a Storyteller
linexjlin/GPTs
leaked prompts of GPTs
VikParuchuri/marker
Convert PDF to markdown + JSON quickly with high accuracy
Byaidu/PDFMathTranslate
PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with layout fully preserved; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/Docker/Zotero
neutraltone/awesome-stock-resources
:city_sunrise: A collection of links for free stock photography, video, and illustration websites
OpenTalker/SadTalker
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
xuebinqin/U-2-Net
The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."
mosaicml/composer
Supercharge Your Model Training
huggingface/safetensors
Simple, safe way to store and distribute tensors
baaivision/EVA
EVA Series: Visual Representation Fantasies from BAAI
huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Yuanshi9815/OminiControl
A minimal and universal controller for FLUX.1.
XueFuzhao/awesome-mixture-of-experts
A collection of AWESOME things about mixture-of-experts
rhymes-ai/Allegro
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
rhymes-ai/Aria
Codebase for Aria - an Open Multimodal Native MoE
haofanwang/ControlNet-for-Diffusers
Transfer ControlNet to any base model in diffusers🔥
kakaobrain/karlo
devilismyfriend/StableTuner
Finetuning SD in style.
LAION-AI/dalle2-laion
Pretrained Dalle2 from laion
salesforce/PyRCA
PyRCA: A Python Machine Learning Library for Root Cause Analysis
Coobiw/MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.
Q-Future/Q-Align
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.
longvideobench/LongVideoBench
[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
PathOnAI/LiteMultiAgent
The Library for LLM-based multi-agent applications
kjerk/instructblip-pipeline
A multimodal inference pipeline that integrates InstructBLIP with textgen-webui for Vicuna and related models.
OpenNLPLab/FNAC_AVL
[CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
OpenNLPLab/Vicinity-Vision-Transformer
[TPAMI 2023] This is an official implementation for "Vicinity Vision Transformer".
VideoAutoArena/VideoAutoBench
[CVPR 2025] Official Dataloader and Evaluation Scripts for VideoAutoBench.
avalanchesiqi/pyquantifier
A Python package to estimate class prevalence in unlabeled datasets by specifying stability assumptions
yeoedward/vimrc