RunsenXu's Stars
HCPLab-SYSU/Embodied_AI_Paper_List
[Embodied-AI-Survey-2024] Paper list and projects for Embodied AI
shreyansh26/Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
TangYuan96/MiniGPT-3D
[MM 2024] [Needs an RTX 3090] MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors
Zeyi-Lin/HivisionIDPhotos
⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.
ZiyuGuo99/SAM2Point
The Most Faithful Implementation of Segment Anything (SAM) in 3D
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
zubair-irshad/Awesome-Robotics-3D
A curated list of 3D vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
google-deepmind/tapnet
Tracking Any Point (TAP)
hacksider/Deep-Live-Cam
real time face swap and one-click video deepfake with only a single image
jinlinyi/PerspectiveFields
[CVPR 2023 Highlight] Perspective Fields for Single Image Camera Calibration
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
Stirling-Tools/Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
facebookresearch/vggsfm
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
karpathy/LLM101n
LLM101n: Let's build a Storyteller
EurekaLabsAI/mlp
The Multilayer Perceptron Language Model
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
OpenRobotLab/GRUtopia
GRUtopia: Dream General Robots in a City at Scale
IDEA-Research/Grounding-DINO-1.5-API
API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
microsoft/vscode
Visual Studio Code
OpenRobotLab/Grounded_3D-LLM
Code&Data for Grounded 3D-LLM with Referent Tokens
yuweihao/MambaOut
MambaOut: Do We Really Need Mamba for Vision?
bertjiazheng/Structured3D
[ECCV'20] Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
huggingface/lerobot
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
kornia/kornia
Geometric Computer Vision Library for Spatial AI
verlab/accelerated_features
Implementation of XFeat (CVPR 2024). Do you need robust and fast local feature extraction? You are in the right place!
zju3dv/pats
Code for "PATS: Patch Area Transportation with Subdivision for Local Feature Matching", CVPR 2023
facebookresearch/lightplane
Lightplane implements a highly memory-efficient differentiable radiance field renderer, and a module for unprojecting features from images to 3D grids.
meta-llama/llama3
The official Meta Llama 3 GitHub site
jwasham/coding-interview-university
A complete computer science study plan to become a software engineer.