SJTUwxz

@MicrosoftShanghai

SJTUwxz's Stars

gradio-app/gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
Language: Python 35.1k 178 5.2k 2.6k
HumanSignal/labelImg
LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.
Language: Python 23.1k 406 7676.4k
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Language: Jupyter Notebook 15.5k 115 3951.4k
facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
Language: Jupyter Notebook 9.6k 96 411852
THUDM/CogVLM
a state-of-the-art-level open visual language model | multimodal pre-trained model
Language: Python 6.3k 68 433424
msracver/Deep-Image-Analogy
The source code of 'Visual Attribute Transfer through Deep Image Analogy'.
Language: C++ 1.4k 63 46232
serengil/retinaface
RetinaFace: Deep Face Detection Library for Python
Language: Python 1.3k 8 83164
YuanGongND/ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Language: Jupyter Notebook 1.2k 17 137220
KaihuaTang/Scene-Graph-Benchmark.pytorch
A new codebase for popular Scene Graph Generation methods (2020). Visualization & Scene Graph Extraction on custom images/datasets are provided. It's also a PyTorch implementation of paper “Unbiased Scene Graph Generation from Biased Training CVPR 2020”
Language: Jupyter Notebook 1.1k 17 204227
clovaai/voxceleb_trainer
In defence of metric learning for speaker recognition
Language: Python 1.1k 30 174274
google-research/nasbench
NASBench: A Neural Architecture Search Dataset and Benchmark
Language: Python 687 19 24128
facebookresearch/omnivore
Omnivore: A Single Model for Many Visual Modalities
Language: Python 560 19 3139
facebookresearch/LaViLa
Code release for "Learning Video Representations from Large Language Models"
Language: Python 498 8 3845
facebookresearch/Ego4d
Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset
Language: Jupyter Notebook 378 24 17652
showlab/all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
Language: Python 280 6 2117
TXH-mercury/VALOR
[TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Language: Python 271 11 2315
joaanna/something_else
Code repository for the paper: 'Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks'
Language: Python 147 6 2214
JayPatwardhan/ResNet-PyTorch
Basic implementation of ResNet 50, 101, 152 in PyTorch
Language: Jupyter Notebook 96 1 227
haoliuhl/language-quantized-autoencoders
Language Quantized AutoEncoders
Language: Python 94 1 35
automl/nas_benchmarks
Language: Python 93 8 723
rehg-lab/eye-contact-cnn
Deep neural network trained to detect eye contact from facial image
Language: Python 92 2 235
gyxxyg/VTG-LLM
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
Language: Python 83 3 112
EGO4D/audio-visual
Language: C 62 10 1010
gurkirt/2D-kinectics
Train action classification model based on individual frames
Language: Python 42 4 412
bogireddytejareddy/face-tracker
Face Tracker using RetinaFace Detector and Kalman Filter
Language: Python 39 2 18
hello-jinwoo/LOVEU-CVPR2021
Language: Python 27 4 33
sahalshajim/SS-OWFormer
Language: Python 24 2 52
SJTUwxz/LoCoNet_ASD
code repo for LoCoNet: Long-Short Context Network for Active Speaker Detection
Language: Python 23 1 44
THUNLP-MT/ModelCompose
Official code for our paper "Model Composition for Multimodal Large Language Models"
Language: Python 18 5 32
ChimeraPy/Engine
Distributed computing framework for Multimodal data written in Python
Language: Python 9 2 1840
