JerExJs's Stars
karpathy/LLM101n
LLM101n: Let's build a Storyteller
rasbt/LLMs-from-scratch
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
datawhalechina/llms-from-scratch-cn
Build a large language model from scratch with only basic Python; step-by-step construction of GLM4/Llama3/RWKV6 for a deep understanding of how large models work
allenai/mmc4
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
bilibili/Index-1.9B
A SOTA lightweight multilingual LLM
mlfoundations/MINT-1T
MINT-1T: A one trillion token multimodal interleaved dataset.
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
patrickjohncyh/fashion-clip
FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
EvolvingLMMs-Lab/LongVA
Long Context Transfer from Language to Vision
frank-xwang/UnSAM
[NeurIPS 2024] Code release for "Segment Anything without Supervision"
apple/ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
baaivision/EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
antoyang/VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
google-research/composed_image_retrieval
TIGER-AI-Lab/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
facebookresearch/DCI
Densely Captioned Images (DCI) dataset repository.
yfzhang114/SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
google-deepmind/magiclens
[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"
mutonix/Vript
facebookresearch/SemDeDup
Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically similar, but not exactly identical).
cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
deepcs233/Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
whwu95/FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
ChocoWu/SeTok
KupynOrest/instance_augmentation
[ECCV 2024] Official Repo for: Dataset Enhancement with Instance-Level Augmentations
tianyu-z/VCR
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
ztyang23/BACON