JerExJs's Stars
karpathy/LLM101n
LLM101n: Let's build a Storyteller
rasbt/LLMs-from-scratch
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
datawhalechina/llms-from-scratch-cn
Build a large language model from scratch with only basic Python; step-by-step construction of GLM4/Llama3/RWKV6 for a deep understanding of how large models work
allenai/mmc4
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
bilibili/Index-1.9B
A SOTA lightweight multilingual LLM
mlfoundations/MINT-1T
MINT-1T: A one trillion token multimodal interleaved dataset.
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
patrickjohncyh/fashion-clip
FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
EvolvingLMMs-Lab/LongVA
Long Context Transfer from Language to Vision
frank-xwang/UnSAM
[NeurIPS 2024] Code release for "Segment Anything without Supervision"
apple/ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
baaivision/EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
antoyang/VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
google-research/composed_image_retrieval
TIGER-AI-Lab/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
facebookresearch/DCI
Densely Captioned Images (DCI) dataset repository.
yfzhang114/SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
google-deepmind/magiclens
[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"
mutonix/Vript
facebookresearch/SemDeDup
Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically similar, but not exactly identical).
cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
deepcs233/Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
whwu95/FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
ChocoWu/SeTok
KupynOrest/instance_augmentation
[ECCV 2024] Official Repo for: Dataset Enhancement with Instance-Level Augmentations
tianyu-z/VCR
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
ztyang23/BACON