gyhandy
Ph.D. Student at USC, interested in Computer Vision, Machine Learning, and AGI
University of Southern California, Los Angeles
gyhandy's Stars
Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
meta-llama/llama
Inference code for Llama models
chenfei-wu/TaskMatrix
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
meta-llama/codellama
Inference code for CodeLlama models
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
mistralai/mistral-inference
Official inference library for Mistral models
NVlabs/imaginaire
NVIDIA's Deep Imagination Team's PyTorch Library
dreamgaussian/dreamgaussian
[ICLR 2024 Oral] Generative Gaussian Splatting for Efficient 3D Content Creation
UX-Decoder/Semantic-SAM
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
isl-org/ZoeDepth
Metric depth estimation from a single image
ScanNet/ScanNet
gradslam/gradslam
gradslam is an open source differentiable dense SLAM library for PyTorch
NVlabs/BundleSDF
[CVPR 2023] BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
zju3dv/OnePose
Code for "OnePose: One-Shot Object Pose Estimation without CAD Models", CVPR 2022
Jumpat/SegmentAnythingin3D
Segment Anything in 3D with NeRFs (NeurIPS 2023)
facebookresearch/omni3d
Code release for "Omni3D A Large Benchmark and Model for 3D Object Detection in the Wild"
apple/ARKitScenes
This repo accompanies the research paper "ARKitScenes: A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data" and contains the data, scripts to visualize and process assets, and the training code described in the paper.
Vision-CAIR/ChatCaptioner
Official Repository of ChatCaptioner
url-kaist/dynaVINS
DynaVINS: A Visual-Inertial SLAM for Dynamic Environments
OPPO-Mente-Lab/Subject-Diffusion
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
SamsungLabs/imvoxelnet
[WACV 2022] ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection
crockwell/Cap3D
[NeurIPS 2023] Scalable 3D Captioning with Pretrained Models
concept-fusion/concept-fusion
Code release for ConceptFusion [RSS 2023]
notmahi/clip-fields
Teaching robots to respond to open-vocab queries with CLIP and NeRF-like neural fields
Yushi-Hu/tifa
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
DavidMChan/caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
gyhandy/Text2Image-for-Detection
DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection