2793145003's Stars
All-Hands-AI/OpenHands
🙌 OpenHands: Code Less, Make More
karpathy/LLM101n
LLM101n: Let's build a Storyteller
black-forest-labs/flux
Official inference repo for FLUX.1 models
datawhalechina/leedl-tutorial
"Hung-yi Lee Deep Learning Tutorial" (recommended by Professor Hung-yi Lee 👍, the "Apple Book" 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
iterative/dvc
🦉 Data Versioning and ML Experiments
richards199999/Thinking-Claude
Let your Claude be able to think
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
fudan-generative-vision/hallo
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
SakanaAI/AI-Scientist
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
AiuniAI/Unique3D
[NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
InternLM/HuixiangDou
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Peterande/D-FINE
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement 💥💥💥
nuno-faria/tetris-sql
Using SQL's Turing Completeness to Build Tetris
bytedance/1d-tokenizer
This repo contains the code for the 1D tokenizer and generator
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
ragavsachdeva/magi
Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.
baaivision/EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
baaivision/DIVA
Diffusion Feedback Helps CLIP See Better
multimodal-art-projection/AutoKaggle
MaybeShewill-CV/segment-anything-u-specify
Uses CLIP and SAM to segment any instance you specify with a text prompt of instance names
Epiphqny/PAR
The official implementation of PAR: Parallelized Autoregressive Visual Generation. https://epiphqny.github.io/PAR-project/
hyz317/StdGEN
lucasjinreal/ImageTokenizer
imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both image and video.
BestAnHongjun/SentenceVAE
Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context
jeohalves/longkey
Code for the paper "LongKey: Keyphrase Extraction for Long Documents"