zyuanbing
Student at NLPR, CASIA, pursuing an M.Sc. in computer science. Interested in computer vision, especially network architecture design.
zyuanbing's Stars
ChenDelong1999/subobjects
Official repository of the paper "Subobject-level Image Tokenization"
DirtyHarryLYL/LLM-in-Vision
Recent LLM-based computer vision and related works. Comments and contributions welcome!
vpulab/ovam
Code for the paper Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models @ CVPR 2024
JShollaj/awesome-llm-interpretability
A curated list of Large Language Model (LLM) Interpretability resources.
mbanani/probe3d
[CVPR 2024] Probing the 3D Awareness of Visual Foundation Models
meta-llama/llama3
The official Meta Llama 3 GitHub site
sinahmr/NACLIP
PyTorch Implementation of NACLIP in "Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation"
yossigandelsman/clip_text_span
Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
google/diffseg
DiffSeg is an unsupervised zero-shot segmentation method using attention information from a stable-diffusion model. This repo implements the main DiffSeg algorithm and additionally includes an experimental feature to add semantic labels to the masks based on a generated caption.
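The core of DiffSeg is easy to picture: each spatial anchor in the diffusion model's self-attention carries a probability distribution over image locations, and anchors with similar distributions are merged into segment proposals. Below is a minimal, illustrative sketch of that merging step only, not the official code (which is in TensorFlow); the threshold `tau` and the helper names are hypothetical.

```python
# Illustrative sketch of DiffSeg's attention-map merging idea (not official code).
import torch

def symmetric_kl(p, q, eps=1e-8):
    """Symmetric KL divergence between two attention distributions."""
    p, q = p.clamp_min(eps), q.clamp_min(eps)
    return ((p * (p / q).log()).sum() + (q * (q / p).log()).sum()) / 2

def merge_attention_maps(attn, tau=1.0):
    """attn: (N, H*W) rows are per-anchor attention distributions (sum to 1).
    Greedily absorbs each map into its nearest proposal if close enough,
    otherwise starts a new segment proposal."""
    proposals = [attn[0]]
    for a in attn[1:]:
        dists = torch.stack([symmetric_kl(a, p) for p in proposals])
        i = int(dists.argmin())
        if dists[i] < tau:
            proposals[i] = (proposals[i] + a) / 2  # absorb into nearest proposal
        else:
            proposals.append(a)                    # new segment proposal
    return proposals

attn = torch.softmax(torch.randn(8, 64), dim=-1)   # 8 anchors on an 8x8 grid
print(len(merge_attention_maps(attn, tau=1.0)))    # number of merged proposals
```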
kyegomez/VisionMamba
Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-resolution images.
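For intuition, the "bidirectional" part means patch tokens are scanned as a sequence in both directions and the two passes are fused per token. A conceptual sketch under that reading, with an off-the-shelf GRU standing in for the selective state-space layer (so this is neither the repo's API nor the paper's exact block):

```python
# Conceptual sketch of a bidirectional token scan (GRU stands in for the SSM).
import torch
import torch.nn as nn

class BidirectionalScan(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fwd = nn.GRU(dim, dim, batch_first=True)  # stand-in for fwd SSM scan
        self.bwd = nn.GRU(dim, dim, batch_first=True)  # stand-in for bwd SSM scan

    def forward(self, tokens):                 # tokens: (batch, n_patches, dim)
        out_f, _ = self.fwd(tokens)
        out_b, _ = self.bwd(tokens.flip(1))    # scan the reversed sequence
        return out_f + out_b.flip(1)           # fuse both directions per token

x = torch.randn(2, 196, 192)                   # 14x14 patches, dim 192
print(BidirectionalScan(192)(x).shape)         # torch.Size([2, 196, 192])
```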
MengyuWang826/SegRefiner
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
kyegomez/Vit-RGTS
Open source implementation of "Vision Transformers Need Registers"
mhamilton723/FeatUp
Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" ICLR 2024
openai/transformer-debugger
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
xai-org/grok-1
Grok open release
Haiyang-W/GiT
Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
52CV/CVPR-2024-Papers
HarborYuan/ovsam
[arXiv preprint] The official code of the paper "Open-Vocabulary SAM".
dvlab-research/Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
wangf3014/SCLIP
Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
lambert-x/ProLab
Official PyTorch implementation of the paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties"
TransformerLensOrg/TransformerLens
A library for mechanistic interpretability of GPT-style language models
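A short usage sketch based on TransformerLens's documented entry points (`HookedTransformer.from_pretrained` and `run_with_cache`); the cached-activation indexing follows the library's naming scheme:

```python
# Load a model and capture every intermediate activation in one forward pass.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")     # GPT-2 small
logits, cache = model.run_with_cache("Hello, world")  # forward + activation cache
# The cache exposes activations by name, e.g. the layer-0 attention pattern:
attn_pattern = cache["pattern", 0]  # (batch, heads, query_pos, key_pos)
print(attn_pattern.shape)
```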
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
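A usage sketch following the repo's README; the hyperparameter values below are illustrative, not recommendations:

```python
# Quantize a batch of feature vectors against a learned codebook.
import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize(
    dim=256,               # feature dimension of the inputs
    codebook_size=512,     # number of codebook entries
    decay=0.8,             # EMA decay for codebook updates
    commitment_weight=1.0  # weight of the commitment loss term
)
x = torch.randn(1, 1024, 256)            # (batch, sequence, dim)
quantized, indices, commit_loss = vq(x)  # (1, 1024, 256), (1, 1024), scalar loss
```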
MaverickRen/PixelLM
PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding, accepted to CVPR 2024.
ytongbai/LVM
dvlab-research/LLaMA-VID
Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Vision-CAIR/MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
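FastChat serves models behind an OpenAI-compatible REST API, so existing OpenAI client code can point at a local Vicuna. A sketch assuming the API server and a model worker are already running on localhost:8000 and that the worker registered the model name `vicuna-7b-v1.5` (both assumptions; uses the legacy pre-1.0 `openai` client):

```python
# Query a locally served model through FastChat's OpenAI-compatible server.
import openai

openai.api_key = "EMPTY"                       # FastChat does not check keys
openai.api_base = "http://localhost:8000/v1"   # local FastChat API server

resp = openai.ChatCompletion.create(
    model="vicuna-7b-v1.5",                    # assumed registered model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```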
LLaVA-VL/LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills