shallowdream204
M.S. Student @ NLPR, CASIA. Generative Models and Multi-modal Large Language Models.
University of Chinese Academy of Sciences, Beijing
Pinned Repositories
AnimateDiff
Official implementation of AnimateDiff.
anole
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
Awesome-CVPR2024-CVPR2021-CVPR2020-Low-Level-Vision
A Collection of Papers and Codes for CVPR2024/CVPR2021/CVPR2020 Low Level Vision
Awesome-ECCV2024-ECCV2020-Low-Level-Vision
A Collection of Papers and Codes for ECCV2024/ECCV2020 Low Level Vision
Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
SODA-SR
[CVPR2024] Official implementation of SODA-SR.
shallowdream204's Repositories
shallowdream204/SODA-SR
[CVPR2024] Official implementation of SODA-SR.
shallowdream204/AnimateDiff
Official implementation of AnimateDiff.
shallowdream204/anole
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
shallowdream204/Awesome-CVPR2024-CVPR2021-CVPR2020-Low-Level-Vision
A Collection of Papers and Codes for CVPR2024/CVPR2021/CVPR2020 Low Level Vision
shallowdream204/Awesome-ECCV2024-ECCV2020-Low-Level-Vision
A Collection of Papers and Codes for ECCV2024/ECCV2020 Low Level Vision
shallowdream204/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
shallowdream204/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
shallowdream204/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
shallowdream204/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
shallowdream204/CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
shallowdream204/DMD2
shallowdream204/flux
Official inference repo for FLUX.1 models
shallowdream204/InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
shallowdream204/latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
shallowdream204/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
shallowdream204/llama3
The official Meta Llama 3 GitHub site
shallowdream204/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
shallowdream204/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
shallowdream204/mamba
Mamba SSM architecture
shallowdream204/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
shallowdream204/NAFNet
The state-of-the-art image restoration model without nonlinear activation functions.
shallowdream204/pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
shallowdream204/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
shallowdream204/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
shallowdream204/ToMe
A method to increase the speed and lower the memory footprint of existing vision transformers.
shallowdream204/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
shallowdream204/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
shallowdream204/transfusion-pytorch
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
shallowdream204/VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
shallowdream204/Vim
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model