ZCMax's Stars
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
meta-llama/llama3
The official Meta Llama 3 GitHub site
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
dair-ai/ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
apple/ml-ferret
rerun-io/rerun
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.
ActiveVisionLab/Awesome-LLM-3D
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world
mbzuai-oryx/LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
LLaVA-VL/LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
LLaVA-VL/LLaVA-NeXT
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
magic-research/PLLaVA
Official repository for the paper PLLaVA
melon/qingwu-zimu
Qingwu Zimu (青梧字幕) is a Whisper-based AI subtitle extraction tool
EPFL-VILAB/omnidata
A Scalable Pipeline for Making Steerable Multi-Task Mid-Level Vision Datasets from 3D Scans [ICCV 2021]
dvlab-research/Stratified-Transformer
Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022)
mbanani/probe3d
[CVPR 2024] Probing the 3D Awareness of Visual Foundation Models
UMass-Foundation-Model/3D-VLA
Source codes for "3D-VLA: A 3D Vision-Language-Action Generative World Model"
facebookresearch/open-eqa
OpenEQA: Embodied Question Answering in the Era of Foundation Models
sshh12/multi_token
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
scene-verse/SceneVerse
behavior-vision-suite/behavior-vision-suite.github.io
remyxai/VQASynth
Compose multimodal datasets 🎹
zhouxian/act3d-chained-diffuser
A unified architecture for multimodal multi-task robotic policy learning.
xuxw98/Online3D
[CVPR 2024] Memory-based Adapters for Online 3D Scene Perception
joyhsu0504/NS3D