Pinned Repositories
Awesome-DriveLM
📚 A collection of resources and papers on Large Language Models in autonomous driving
Awesome-VQVAE
📚 A collection of resources and papers on Vector Quantized Variational Autoencoder (VQ-VAE) and its application
CityGen
🏙️🌆🌃 Try Infinite and Controllable 3D City Layout Generation!
MovieChat
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
PoseDA
[ICCV 2023] Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
Self-supervised-Cross-view-3D-Human-Pose-Estimation-and-Localization-in-Video
An algorithm to process multi-view 3D datasets based on Human3.6M (or similar) and map 2D joint locations to 3D in our dataset.
StableVideo
[ICCV 2023] StableVideo: Text-driven Consistency-aware Diffusion Video Editing
STEVE
⛏💎 STEVE, a Minecraft agent from "See and Think: Embodied Agent in Virtual Environment"
UIUC-CS357-22SP
Workspace for CS357
UniAP
[AAAI 2024] UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning
rese1f's Repositories
rese1f/StableVideo
[ICCV 2023] StableVideo: Text-driven Consistency-aware Diffusion Video Editing
rese1f/MovieChat
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
rese1f/Awesome-VQVAE
📚 A collection of resources and papers on Vector Quantized Variational Autoencoder (VQ-VAE) and its application
rese1f/CityGen
🏙️🌆🌃 Try Infinite and Controllable 3D City Layout Generation!
rese1f/STEVE
⛏💎 STEVE, a Minecraft agent from "See and Think: Embodied Agent in Virtual Environment"
rese1f/Awesome-DriveLM
📚 A collection of resources and papers on Large Language Models in autonomous driving
rese1f/PoseDA
[ICCV 2023] Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
rese1f/UniAP
[AAAI 2024] UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning
rese1f/old_web
personal website built on beautiful jekyll, feel free to clone and modify
rese1f/UniVHP
Unified Human-centric Perception Model and Benchmark in Sports
rese1f/arxiv-daily
🎓 Automatically update papers in selected fields daily using GitHub Actions (updated every 12 hours)
rese1f/Awesome-LLM-3D
Awesome-LLM-3D: a curated list of resources on multimodal large language models in the 3D world
rese1f/3D-VisTA
Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment"
rese1f/all-seeing
This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
rese1f/awesome-3D-gaussian-splatting
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
rese1f/Awesome-Foundation-Models
A curated list of foundation models for vision and language tasks
rese1f/Awesome-Long-Context
A curated list of resources about long-context in large-language models and video understanding.
rese1f/Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
rese1f/Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models
rese1f/awesome-NeRF
A curated list of awesome neural radiance fields papers
rese1f/Awesome-Skeleton-based-Action-Recognition
A curated paper list of awesome skeleton-based action recognition.
rese1f/DriveLM
DriveLM: Drive on Language
rese1f/ED-Pose
[ICLR 2023] Official implementation of the paper "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation"
rese1f/ipl-uw.github.io
Website for IPL
rese1f/LLaMA-Efficient-Tuning
Easy-to-use fine-tuning framework using PEFT (PT+SFT+RLHF with QLoRA) (LLaMA-2, BLOOM, Falcon, Baichuan)
rese1f/LLM-Agent-Paper-List
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
rese1f/minisora
The Mini Sora project aims to explore the implementation path and future development direction of Sora.
rese1f/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
rese1f/OpenScene
3D Occupancy Prediction Benchmark in Autonomous Driving
rese1f/rese1f
Config files for my GitHub profile.