pikerbright's Stars
labmlai/annotated_deep_learning_paper_implementations
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
brendangregg/FlameGraph
Stack trace visualizer
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
instantX-research/InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
conan-io/conan
Conan - The open-source C and C++ package manager
Lightning-AI/lit-llama
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
facebookresearch/sapiens
High-resolution models for human tasks.
JingyunLiang/SwinIR
SwinIR: Image Restoration Using Swin Transformer (official repository)
jarro2783/cxxopts
Lightweight C++ command line option parser
karpathy/build-nanogpt
Video+code lecture on building nanoGPT from scratch
huggingface/optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
megvii-research/NAFNet
The state-of-the-art image restoration model without nonlinear activation functions.
Yuliang-Liu/Monkey
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Ucas-HaoranWei/Vary
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 Latest Papers, Codes and Datasets on Vid-LLMs.
cacay/MemoryPool
An easy to use and efficient memory pool allocator written in C++.
microsoft/FaceSynthetics
kijai/ComfyUI-Florence2
Inference with the Microsoft Florence2 VLM
PeizeSun/TransTrack
Multiple Object Tracking with Transformer
foivospar/Arc2Face
[ECCV 2024 Oral🔥] Arc2Face: A Foundation Model for ID-Consistent Human Faces
NVlabs/EAGLE
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Yuliang-Liu/MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
AssafSinger94/dino-tracker
Official PyTorch Implementation for “DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video”
naver/multi-hmr
PyTorch demo code and models for Multi-HMR
KupynOrest/head_detector
Official repo for VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads.