Pinned Repositories
1xN
1xN Block Pattern for Network Sparsity
AAL-pruning
Filter Pruning for Deep Convolutional Neural Networks via Auxiliary Attention
DW
A Dual Weighting Label Assignment Scheme for Object Detection
DyRep
Official implementation for paper "DyRep: Bootstrapping Training with Dynamic Re-parameterization", CVPR 2022
GASN
A Novel Guided Anchor Siamese Network for Arbitrary Target-Of-Interest Tracking in Video-SAR
LPNet-PyTorch
This repository is a PyTorch version of the paper "Luminance-aware Pyramid Network for Low-light Image Enhancement" (TMM 2020).
ResamplingNet
ResamplingNet: End-to-End Adaptive Feature Resampling Network for Real-Time Aerial Tracking
Restoring-Extremely-Dark-Images-In-Real-Time
The project is the official implementation of our CVPR 2021 paper, "Restoring Extremely Dark Images in Real Time"
StreamYOLO
Real-time Object Detection for Streaming Perception, CVPR 2022
Ultra-Fast-Lane-Detection-v2-plus
Based on Ultra-Fast-Lane-Detection-v2 (UFLD-v2)
scott-mao's Repositories
scott-mao/aurora
[train + eval + deploy] Aurora Series: A more efficient multimodal large language model series for video.
scott-mao/Awesome-Multimodal-Large-Language-Models
✨✨Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
scott-mao/BEAF
[ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"
scott-mao/CIIT
Chinese Interleaved Image-Text Dataset
scott-mao/conv-llava
scott-mao/DenseConnector
Dense Connector for MLLMs
scott-mao/EmoLLM
EmoLLM: Multimodal Emotional Understanding Meets Large Language Models
scott-mao/EVA
EVA Series: Visual Representation Fantasies from BAAI
scott-mao/FlexAttention
scott-mao/FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
scott-mao/image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions
scott-mao/LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
scott-mao/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
scott-mao/MQT-LLaVA
Matryoshka Query Transformer for Large Vision-Language Models
scott-mao/OMG-Seg
OMG-LLaVA and OMG-Seg codebase
scott-mao/QSLAW
The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2024]
scott-mao/Qwen2
Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.
scott-mao/shell-v
Large vision-language models based on Shell developed by PKU-KCL
scott-mao/SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
scott-mao/Steel-LLM
Train a Chinese LLM from scratch as an individual
scott-mao/Surgical-LVLM
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded VQA in Robotic Surgery
scott-mao/swift
ms-swift: Use PEFT or Full-parameter to finetune 250+ LLMs or 30+ MLLMs
scott-mao/TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
scott-mao/Video-CCAM
A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.
scott-mao/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
scott-mao/ViLAS
Fast and Lightweight Vision-Language Model for Adversarial Traffic Sign Detection
scott-mao/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
scott-mao/VL-RLHF
An RLHF Infrastructure for Vision-Language Models
scott-mao/VLM-Grounding
scott-mao/VoCo-LLaMA
VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".