butterfly111's Stars

vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python 28.4k 4.2k
antoyang/VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
Language: Jupyter Notebook 17519
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Language: Python 50640
facebookresearch/mae
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
Language: Python 7.2k 1.2k
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
Language: Python 3.7k 281
microsoft/XPretrain
Multi-modality pre-training
Language: Python 46836
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Language: Python 19.9k 2.5k
xuguohai/X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
Language: Python 13215
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Language: Jupyter Notebook 25.3k 3.3k
alibaba/AliceMind
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
Language: Python 2k 291
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
Language: Python 6.9k 711
dhansmair/flamingo-mini
Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training
Language: Python 16316
decisionforce/TPN
[CVPR 2020] Temporal Pyramid Network for Action Recognition
Language: Python 39455
jbohnslav/opencv_transforms
OpenCV implementation of Torchvision's image augmentations
Language: Python 37546
gmalivenko/pytorch2keras
PyTorch to Keras model converter
Language: Python 857143
shuangshuangguo/tsn-tensorflow
A TensorFlow implementation of TSN (Temporal Segment Networks)
Language: Python 217
Blssel/TSN-tensorflow
Language: Python 111
MichiganCOG/M-PACT
A one stop shop for all of your activity recognition needs.
Language: Python 10725
kevinlin311tw/ava-dataset-tool
Preprocessing tools for Google AVA Dataset
Language: Python 4917
cvdfoundation/ava-dataset
The AVA dataset densely annotates 80 atomic visual actions in 351k movie clips with actions localized in space and time, resulting in 1.65M action labels with multiple labels per human occurring frequently.
31128
open-mmlab/mmaction2
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Language: Python 4.2k 1.2k
NVlabs/STEP
STEP: Spatio-Temporal Progressive Learning for Video Action Detection. CVPR'19 (Oral)
Language: Python 24648
MVIG-SJTU/AlphAction
Spatio-Temporal Action Localization System
Language: Python 40073
ranandalon/mtl
Unofficial implementation of: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics
Language: Python 54278
zhang-can/PAN-PyTorch
[Code for the paper] PAN: Towards Fast Action Recognition via Learning Persistence of Appearance
Language: Python 10210
mit-han-lab/temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
Language: Python 2.1k 417
facebookresearch/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Language: Python 6.6k 1.2k
CeLuigi/models-comparison.pytorch
Code for the paper Benchmark Analysis of Representative Deep Neural Network Architectures
Language: Python 16529
wsargent/docker-cheat-sheet
Docker Cheat Sheet
22.1k 4.7k
alainray/ava_downloader
Scripts for downloading the AVA (Atomic Visual Actions) dataset (https://research.google.com/ava/) and postprocessing it.
Language: Python 293