Cherishnoobs's Stars
immich-app/immich
High-performance self-hosted photo and video management solution.
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
facebookresearch/fastText
Library for fast text representation and classification.
naklecha/llama3-from-scratch
A llama3 implementation, one matrix multiplication at a time
facebookresearch/nougat
Implementation of Nougat: Neural Optical Understanding for Academic Documents
CompVis/taming-transformers
Taming Transformers for High-Resolution Image Synthesis
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
THUDM/CogVLM2
GPT-4V-level open-source multimodal model based on Llama3-8B
yuweihao/MambaOut
MambaOut: Do We Really Need Mamba for Vision?
NVlabs/VILA
VILA: a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
brucemiller/LaTeXML
LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
christophschuhmann/improved-aesthetic-predictor
CLIP+MLP Aesthetic Score Predictor
lyhue1991/eat_pyspark_in_10_days
PySpark 🍒🥭 is delicious, just eat it! 😋😋
mlfoundations/MINT-1T
MINT-1T: A one-trillion-token multimodal interleaved dataset.
beichenzbc/Long-CLIP
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
allenai/pdffigures2
Given a scholarly PDF, extract figures, tables, captions, and section titles.
snap-research/Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
mangye16/Cross-Modal-Re-ID-baseline
PyTorch code for Cross-Modality Person Re-Identification (Visible-Thermal/Infrared Re-ID)
LinWeizheDragon/Retrieval-Augmented-Visual-Question-Answering
The official repository for Retrieval-Augmented Visual Question Answering
THUDM/CogCoM
SHI-Labs/CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
zjukg/Structure-CLIP
[Paper] [AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
daooshee/HD-VG-130M
The HD-VG-130M Dataset
zhaohengyuan1/Genixer
(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator
bojone/FSQ
Keras implementation of Finite Scalar Quantization (FSQ)
xiaoou2/proxy_pool
ypwang61/negCLIPLoss_NormSim
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning
StanLei52/ViT-Lens-Integration
ViT-Lens Integration to Multimodal Foundation Models