Cherishnoobs's Stars
immich-app/immich
High-performance self-hosted photo and video management solution.
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
facebookresearch/fastText
Library for fast text representation and classification.
naklecha/llama3-from-scratch
A llama3 implementation, one matrix multiplication at a time
facebookresearch/nougat
Implementation of Nougat: Neural Optical Understanding for Academic Documents
CompVis/taming-transformers
Taming Transformers for High-Resolution Image Synthesis
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
THUDM/CogVLM2
GPT-4V-level open-source multimodal model based on Llama3-8B
yuweihao/MambaOut
MambaOut: Do We Really Need Mamba for Vision?
NVlabs/VILA
VILA: a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
brucemiller/LaTeXML
LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
christophschuhmann/improved-aesthetic-predictor
CLIP+MLP Aesthetic Score Predictor
lyhue1991/eat_pyspark_in_10_days
PySpark 🍒🥭 is delicious, just eat it! 😋😋
mlfoundations/MINT-1T
MINT-1T: A one-trillion-token multimodal interleaved dataset.
beichenzbc/Long-CLIP
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
allenai/pdffigures2
Given a scholarly PDF, extract figures, tables, captions, and section titles.
snap-research/Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
mangye16/Cross-Modal-Re-ID-baseline
PyTorch code for Cross-Modality Person Re-Identification (Visible-Thermal/Infrared Re-ID)
LinWeizheDragon/Retrieval-Augmented-Visual-Question-Answering
The official repository for Retrieval-Augmented Visual Question Answering
THUDM/CogCoM
SHI-Labs/CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
zjukg/Structure-CLIP
[Paper] [AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
daooshee/HD-VG-130M
The HD-VG-130M Dataset
zhaohengyuan1/Genixer
(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator
bojone/FSQ
Keras implementation of Finite Scalar Quantization (FSQ)
xiaoou2/proxy_pool
ypwang61/negCLIPLoss_NormSim
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning
StanLei52/ViT-Lens-Integration
ViT-Lens Integration to Multimodal Foundation Models