SLTK1's Stars
2noise/ChatTTS
A generative speech model for daily dialogue.
black-forest-labs/flux
Official inference repo for FLUX.1 models
optuna/optuna
A hyperparameter optimization framework
NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Tencent/HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
moskomule/senet.pytorch
PyTorch implementation of SENet
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
yuweihao/MambaOut
MambaOut: Do We Really Need Mamba for Vision?
facebookresearch/Detic
Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
QiuChenly/InjectLib
You know what I want to say
IDEA-Research/Grounded-SAM-2
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
jianzongwu/Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
lucidrains/transfusion-pytorch
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
youngwanLEE/centermask2
[CVPR 2020] CenterMask: Real-time Anchor-Free Instance Segmentation
thu-coai/CrossWOZ
A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
AIGText/Glyph-ByT5
[ECCV2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"
HOST-Oman/libraqm
A library for complex text layout
showlab/videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
baaivision/EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
OFA-Sys/InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
MaxKinny222/TabRecSet
A large scale camera-taken table detection and recognition dataset.
LDLINGLINGLING/MiniCPM_Series_Tutorial
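Projects and tutorials for MiniCPM and MiniCPM-V, covering six topics: inference, quantization, edge deployment, fine-tuning, technical reports, and applications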
princeton-nlp/CharXiv
[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
nttmdlab-nlp/SlideVQA
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
JianshuZhang/TAP
Track, Attend and Parse for Online Handwritten Mathematical Expression Recognition
LayTextLLM/LayTextLLM
TianheWu/CoSeR
An unofficial implementation for "CoSeR: Bridging Image and Language for Cognitive Super-Resolution (CVPR 2024)"