SJTUwxz's Stars
gradio-app/gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
HumanSignal/labelImg
LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
THUDM/CogVLM
A state-of-the-art-level open visual language model | multimodal pretrained model
msracver/Deep-Image-Analogy
The source code of 'Visual Attribute Transfer through Deep Image Analogy'.
serengil/retinaface
RetinaFace: Deep Face Detection Library for Python
YuanGongND/ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
KaihuaTang/Scene-Graph-Benchmark.pytorch
A new codebase for popular Scene Graph Generation methods (2020). Visualization and scene graph extraction on custom images/datasets are provided. It is also a PyTorch implementation of the paper “Unbiased Scene Graph Generation from Biased Training” (CVPR 2020).
clovaai/voxceleb_trainer
In defence of metric learning for speaker recognition
google-research/nasbench
NASBench: A Neural Architecture Search Dataset and Benchmark
facebookresearch/omnivore
Omnivore: A Single Model for Many Visual Modalities
facebookresearch/LaViLa
Code release for "Learning Video Representations from Large Language Models"
facebookresearch/Ego4d
Ego4d dataset repository: download the dataset, visualize it, extract features, and see example usage of the dataset.
showlab/all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
TXH-mercury/VALOR
[TPAMI2024] Code and models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
joaanna/something_else
Code repository for the paper: 'Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks'
JayPatwardhan/ResNet-PyTorch
Basic implementation of ResNet-50, ResNet-101, and ResNet-152 in PyTorch
haoliuhl/language-quantized-autoencoders
Language Quantized AutoEncoders
automl/nas_benchmarks
rehg-lab/eye-contact-cnn
Deep neural network trained to detect eye contact from facial images
gyxxyg/VTG-LLM
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
EGO4D/audio-visual
gurkirt/2D-kinectics
Train an action classification model based on individual frames
bogireddytejareddy/face-tracker
Face Tracker using RetinaFace Detector and Kalman Filter
hello-jinwoo/LOVEU-CVPR2021
sahalshajim/SS-OWFormer
SJTUwxz/LoCoNet_ASD
Code repository for LoCoNet: Long-Short Context Network for Active Speaker Detection
THUNLP-MT/ModelCompose
Official code for our paper "Model Composition for Multimodal Large Language Models"
ChimeraPy/Engine
Distributed computing framework for multimodal data, written in Python