Pinned Repositories
ClipSAM
CrossMAE
Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders
GLM-130B
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
OccWorld
3D World Model for Autonomous Driving
OMG-Seg
One Model For Image/Video/Interactive/Open-Vocabulary Segmentation
parkour
[CoRL 2023] Robot Parkour Learning
ProPainter
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
PseCo-CVPR2024
(CVPR 2024) Point, Segment and Count: A Generalized Framework for Object Counting
RAG-Survey
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
SSLRec
[WSDM'2024 Oral] "SSLRec: A Self-Supervised Learning Framework for Recommendation"
whuhxb's Repositories
whuhxb/3DGStream
[CVPR 2024 Highlight] Official repository for the paper "3DGStream: On-the-fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos".
whuhxb/AnimatableGaussians
Code of [CVPR 2024] "Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling"
whuhxb/cape
Computational Aerosciences Productivity & Execution
whuhxb/CAPEv2
Malware Configuration And Payload Extraction
whuhxb/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
whuhxb/detic-sam
Detic + SAM for open-vocabulary object detection and segmentation.
whuhxb/embodied-generalist
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
whuhxb/GaussianShader
code for GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
whuhxb/ISAT_with_segment_anything
Labeling tool with SAM (Segment Anything Model); supports SAM, sam-hq, MobileSAM, EdgeSAM, etc. Interactive semi-automatic image annotation tool.
whuhxb/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
whuhxb/MiniCPM-V
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
whuhxb/multimodal
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
whuhxb/odin
Code for the paper: "ODIN: A Single Model for 2D and 3D Segmentation" (CVPR 2024)
whuhxb/OmDet
Fast and accurate open-vocabulary end-to-end object detection
whuhxb/PaliGemma-FineTuning
PaliGemma FineTuning
whuhxb/Paper-List
A list of papers I have read: Robotics, Learning, Vision.
whuhxb/paper-list-added
autoupdate paper list
whuhxb/PaperReading
whuhxb/PaSCo
[CVPR 2024 Oral - Best paper award candidate] Official repository of "PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness"
whuhxb/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
whuhxb/RALF
Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
whuhxb/semantic-gaussians
Official implementation of the paper "Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting".
whuhxb/SHERT
[CVPR'24 Oral] Official PyTorch implementation for Semantic Human Mesh Reconstruction with Textures.
whuhxb/shine
[CVPR'24 Highlight] SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
whuhxb/sigllm
LLMs for sintel
whuhxb/stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models
whuhxb/T-Rex
API for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
whuhxb/THuman2.0-Dataset
whuhxb/vid2avatar
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition (CVPR2023)
whuhxb/X3D-Edit
X3D-Edit is an Extensible 3D (X3D) Graphics authoring tool for simple error-free creation, editing, validation and viewing of X3D scenes for interactive Web-based visualization. X3D-Edit runs as a standalone application or Netbeans plugin. The X3D file format is an advanced XML version of the original VRML97 international standard.