Midkey's Stars
jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
ZHU-Zhiyu/NVS_Solver
Source code of paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer"
NVlabs/SegFormer
Official PyTorch implementation of SegFormer
yoxu515/aot-benchmark
An efficient modular implementation of Associating Objects with Transformers for Video Object Segmentation in PyTorch
z-x-yang/Segment-and-Track-Anything
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation purposes.
z-x-yang/AOT
Associating Objects with Transformers for Video Object Segmentation
983632847/Awesome-Multimodal-Object-Tracking
A personal investigative project to track the latest progress in the field of multi-modal object tracking.
laybebe/TATrans_SVOT
This is an implementation of “Target-aware Transformer for Satellite Video Object Tracking”
KindXiaoming/pykan
Kolmogorov Arnold Networks
wangxiao5791509/VisEvent_SOT_Benchmark
[IEEE TCYB 2023] The first large-scale tracking dataset by fusing RGB and Event cameras.
RS-Devotee/OODT
OODT: Oriented Object Detection and Tracking in SVD
YZCU/OOTB
[ISPRS 2024] Satellite Video Single Object Tracking: A Systematic Review and An Oriented Object Tracking Benchmark
YZCU/LMOD
[Datasets] LMOD: a large-scale and multiclass object detection dataset for satellite videos
zcablii/SARDet_100K
Official implementation of MSFA and release of SARDet_100K dataset for Large-Scale Synthetic Aperture Radar (SAR) Object Detection
Wprofessor/SV248S_toolkit
OpenSpaceAI/UVLTrack
The official PyTorch implementation of our AAAI 2024 paper "Unifying Visual and Vision-Language Tracking via Contrastive Learning"
JacobYuan7/RLIPv2
[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training
microsoft/RegionCLIP
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
labring/FastGPT
FastGPT is a knowledge-based platform built on LLMs. It offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you develop and deploy complex question-answering systems without extensive setup or configuration.
hustvl/4DGaussians
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
facebookresearch/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
embodied-generalist/embodied-generalist
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
Wprofessor/SVLPNet
state-spaces/mamba
Mamba SSM architecture
facebookresearch/habitat-sim
A flexible, high-performance 3D simulator for Embodied AI research.
FengZicai/Cluster3DSeg
This is the official implementation of "Clustering based Point Cloud Representation Learning for 3D Analysis" (Accepted at ICCV 2023).
983632847/WebUAV-3M
WebUAV-3M: A million-scale multi-modal UAV tracking benchmark
EmbodiedGPT/EmbodiedGPT_Pytorch
google-deepmind/open_x_embodiment
FengZicai/Interpretable3D
This is the official implementation of "Interpretable3D: An Ad-Hoc Interpretable Classifier for 3D Point Clouds" (Accepted at AAAI 2024).