butterfly111's Stars
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
antoyang/VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
facebookresearch/mae
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
microsoft/XPretrain
Multi-modality pre-training
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
xuguohai/X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
alibaba/AliceMind
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
dhansmair/flamingo-mini
Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training
decisionforce/TPN
[CVPR 2020] Temporal Pyramid Network for Action Recognition
jbohnslav/opencv_transforms
OpenCV implementation of Torchvision's image augmentations
gmalivenko/pytorch2keras
PyTorch to Keras model converter
shuangshuangguo/tsn-tensorflow
A TensorFlow implementation of TSN (Temporal Segment Networks)
Blssel/TSN-tensorflow
MichiganCOG/M-PACT
A one stop shop for all of your activity recognition needs.
kevinlin311tw/ava-dataset-tool
Preprocessing tools for Google AVA Dataset
cvdfoundation/ava-dataset
The AVA dataset densely annotates 80 atomic visual actions in 351k movie clips with actions localized in space and time, resulting in 1.65M action labels with multiple labels per human occurring frequently.
open-mmlab/mmaction2
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
NVlabs/STEP
STEP: Spatio-Temporal Progressive Learning for Video Action Detection. CVPR'19 (Oral)
MVIG-SJTU/AlphAction
Spatio-Temporal Action Localization System
ranandalon/mtl
Unofficial implementation of: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics
zhang-can/PAN-PyTorch
[Code for the paper] PAN: Towards Fast Action Recognition via Learning Persistence of Appearance
mit-han-lab/temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
facebookresearch/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
CeLuigi/models-comparison.pytorch
Code for the paper Benchmark Analysis of Representative Deep Neural Network Architectures
wsargent/docker-cheat-sheet
Docker Cheat Sheet
alainray/ava_downloader
Scripts for downloading the AVA (Atomic Visual Actions) dataset (https://research.google.com/ava/) and postprocessing it.