SlongLiu
Ph.D. Student @ CST of Tsinghua University. Intern @IDEA-Research CVR group. Homepage: lsl.zone
THU | IDEA · Beijing | Shenzhen
SlongLiu's Stars
deepseek-ai/DeepSeek-V3
leverimmy/THU-Annual-Eat
A year has gone by: where did all the money you spent in Tsinghua's canteens actually go?
microsoft/OmniParser
A simple screen parsing tool towards pure vision based GUI agent
shxie2020/Awesome-UGVFM
A collection of vision foundation models unifying understanding and generation.
IDEA-Research/DINO-X-API
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
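For reference, a minimal sketch of the WebDataset loading pattern documented in its README. The shard URL is a placeholder, and the resize transform and batch settings are assumptions added so the streamed samples collate into fixed-shape batches:

```python
import torch
import webdataset as wds
from torchvision import transforms

# Placeholder shard spec (assumption); {000000..000009} expands to ten .tar shards.
url = "https://example.com/shards/train-{000000..000009}.tar"

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

dataset = (
    wds.WebDataset(url)
    .shuffle(1000)                       # buffer-based shuffling of the streamed samples
    .decode("pil")                       # decode image members to PIL, .cls labels to int
    .to_tuple("jpg", "cls")              # pick the .jpg image and .cls label of each sample
    .map_tuple(preprocess, lambda y: y)  # fixed image shape so default collation works
)

# WebDataset is an IterableDataset, so it plugs into the standard PyTorch DataLoader.
loader = torch.utils.data.DataLoader(dataset, batch_size=16, num_workers=4)
```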
uni-medical/GMAI-VL
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI.
NVlabs/Hydra-MDP
BAAI-Agents/Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to master any computer task through strong reasoning, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
baaivision/Emu3
Next-Token Prediction is All You Need
mit-han-lab/duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
facebookresearch/lingua
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
QwenLM/Qwen2.5-VL
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
JusticeFighterDance/JusticeFighter110
Evidence disclosure for the incident in which Tian Keyu (田柯宇) maliciously attacked a compute cluster.
real-stanford/diffusion_policy
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
QwenLM/Qwen
The official repo of Qwen (通义千问), the chat and pretrained large language model series developed by Alibaba Cloud.
NVIDIA/Megatron-Energon
Megatron's multi-modal data loader
All-Hands-AI/OpenHands
🙌 OpenHands: Code Less, Make More
shizhediao/Human-Contribution-Measurement
cfahlgren1/webllm-playground
Run LLMs in the Browser with MLC / WebLLM ✨
twke18/CAST
rt219/The-Emergence-of-Objectness
This is the officially released code for our paper, The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos, accepted at NeurIPS 2021.
LLaVA-VL/LLaVA-NeXT
baaivision/tokenize-anything
[ECCV 2024] Tokenize Anything via Prompting
NVlabs/EAGLE
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
git-lfs/git-lfs
Git extension for versioning large files
baaivision/DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
IDEA-Research/Grounded-SAM-2
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
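The SAM 2 README documents a simple image-prediction API; below is a minimal sketch of that pattern. The checkpoint and config filenames follow the repo's released naming but may differ for your download, and the example image path and point prompt are assumptions:

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Checkpoint/config names follow the repo's releases (assumption); adjust to your setup.
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

# Hypothetical input image and a single foreground point prompt (x, y); label 1 = foreground.
image = np.array(Image.open("example.jpg").convert("RGB"))
with torch.inference_mode():
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )
```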