Seerkfang
:rocket: free and curious mind :dragon:
ZJU -> UCSD -> Nvidia Research
Santa Clara, United States
Pinned Repositories
haosulab.github.io
minisql
verify_cot
VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
mmtracking
OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
K3C
K3C OPENWRT A1/B1/B1G/B2/C1/S1
Seerkfang.github.io
verify_cot
VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
large_vlm_distillation_ood
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)
Seerkfang's Repositories
Seerkfang/K3C
K3C OPENWRT A1/B1/B1G/B2/C1/S1
Seerkfang/Seerkfang.github.io
Seerkfang/verify_cot
Seerkfang/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)