matthewdm0816's Stars
camenduru/stable-diffusion-webui-colab
stable diffusion webui colab
Rem0o/FanControl.Releases
This is the release repository for Fan Control, a highly customizable fan controlling software for Windows.
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
BradyFU/Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Blealtan/efficient-kan
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
mseitzer/pytorch-fid
Compute FID scores with PyTorch.
jettify/pytorch-optimizer
torch-optimizer: a collection of optimizers for PyTorch
torch-points3d/torch-points3d
PyTorch framework for deep learning on point clouds.
Zjh-819/LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
rom1504/clip-retrieval
Easily compute clip embeddings and build a clip retrieval system with them
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
yuweihao/MambaOut
MambaOut: Do We Really Need Mamba for Vision?
zhoubolei/bolei_awesome_posters
CVPR and NeurIPS poster examples and templates. May we have in-person poster sessions soon!
scito/extract_otp_secrets
Extract one time password (OTP) secrets from QR codes exported by two-factor authentication (2FA) apps such as "Google Authenticator". The exported QR codes from authentication apps can be captured by camera, read from images, or read from text files. The secrets can be exported to JSON or CSV, or printed as QR codes to console.
ActiveVisionLab/Awesome-LLM-3D
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
OpenRobotLab/PointLLM
[ECCV 2024 Oral] PointLLM: Empowering Large Language Models to Understand Point Clouds
ChanganVR/awesome-embodied-vision
Reading list for research topics in embodied vision
GraphPKU/PiSSA
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Open3DA/LL3DA
[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
zaibacu/thesaurus
Offline database of synonyms/thesaurus
RUCAIBox/POPE
The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
hako-mikan/sd-webui-traintrain
LoRA training extension for Stable Diffusion Web-UI
ch3cook-fdu/Vote2Cap-DETR
[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods
chenguolin/InstructScene
[ICLR 2024 spotlight] Official implementation of "InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior".
matthewdm0816/BridgeQA
[AAAI 24] Official Codebase for BridgeQA: Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA
idejie/3DSyn
TerryLiu18/image-captioning-for-celebrities
image captioning with face recognition for celebrities
idejie/KAD