2793145003's Stars

All-Hands-AI/OpenHands
🙌 OpenHands: Code Less, Make More
Language:Python41.2k 329 2k4.6k
karpathy/LLM101n
LLM101n: Let's build a Storyteller
30.9k 2.6k 01.7k
black-forest-labs/flux
Official inference repo for FLUX.1 models
Language:Python19.2k 163 01.4k
datawhalechina/leedl-tutorial
"Hung-yi Lee Deep Learning Tutorial" (recommended by Prof. Hung-yi Lee 👍, the "Apple Book" 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
Language:Jupyter Notebook14.2k 283 1042.9k
iterative/dvc
🦉 Data Versioning and ML Experiments
Language:Python14.1k 135 4.7k1.2k
richards199999/Thinking-Claude
Let your Claude be able to think
Language:TypeScript11.9k 88 271.4k
THUDM/CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Language:Python10.1k 127 491950
fudan-generative-vision/hallo
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
Language:Python9.7k 654 1571.3k
SakanaAI/AI-Scientist
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Language:Jupyter Notebook8.5k 106 1191.2k
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Language:Python6.3k 44 151412
AiuniAI/Unique3D
[NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
Language:Python3.2k 40 117255
InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Language:Python2.7k 44 406163
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Language:Python2k 33 125116
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Language:Python1.8k 23 71123
InternLM/HuixiangDou
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Language:Python1.7k 23 40132
lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Language:Python1.4k 22 5950
Peterande/D-FINE
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement 💥💥💥
Language:Python1.3k 32 117102
nuno-faria/tetris-sql
Using SQL's Turing Completeness to Build Tetris
Language:PLpgSQL924 7 138
bytedance/1d-tokenizer
This repo contains the code for 1D tokenizer and generator
Language:Jupyter Notebook624 13 5329
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
Language:Python530 5 3222
ragavsachdeva/magi
Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.
308 6 712
baaivision/EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Language:Python255 9 165
baaivision/DIVA
Diffusion Feedback Helps CLIP See Better
Language:Python237 8 1012
multimodal-art-projection/AutoKaggle
Language:Python174 2 58
MaybeShewill-CV/segment-anything-u-specify
Using CLIP and SAM to segment any instance you specify with a text prompt of instance names
Language:Python173 2 1212
Epiphqny/PAR
The official implementation of PAR: Parallelized Autoregressive Visual Generation. https://epiphqny.github.io/PAR-project/
Language:Python101 10 31
hyz317/StdGEN
93 13 13
lucasjinreal/ImageTokenizer
imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both image and video.
Language:Python30 2 1
BestAnHongjun/SentenceVAE
Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context
Language:Python22 5 17
jeohalves/longkey
Code for the paper "LongKey: Keyphrase Extraction for Long Documents"
Language:Python9 1 11