m-bain

multimodal

VGG, University of Oxford

Pinned Repositories

chimp-detector
Language: Python
2 2 00
clip-hitchhiker
A CLIP-Hitchhiker's Guide to Long Video Retrieval [arXiv 2022]
Language: Python
9 4 01
CondensedMovies
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
Language: Python
166 10 928
CondensedMovies-chall
Condensed Movies Challenge 2021
Language: Python
17 3 42
frozen-in-time
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
Language: Python
354 11 4743
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook
3 1 00
pytorch-multi-label-classifier
A classifier for multi-label classification, implemented in PyTorch
Language: Python
3 2 00
video-transformers
Implementations of Transformers for Video
Language: Python
23 4 01
webvid
Large-scale text-video dataset. 10 million captioned short videos.
Language: Python
612 9 2138
whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python
13k 137 7341.4k

m-bain's Repositories

m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python
13k 137 7341.4k
m-bain/webvid
Large-scale text-video dataset. 10 million captioned short videos.
Language: Python
612 9 2138
m-bain/frozen-in-time
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
Language: Python
354 11 4743
m-bain/CondensedMovies
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
Language: Python
166 10 928
m-bain/video-transformers
Implementations of Transformers for Video
Language: Python
23 4 01
m-bain/CondensedMovies-chall
Condensed Movies Challenge 2021
Language: Python
17 3 42
m-bain/clip-hitchhiker
A CLIP-Hitchhiker's Guide to Long Video Retrieval [arXiv 2022]
Language: Python
9 4 01
m-bain/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook
3 1 00
m-bain/pytorch-multi-label-classifier
A classifier for multi-label classification, implemented in PyTorch
Language: Python
3 2 00
m-bain/chimp-detector
Language: Python
2 2 00
m-bain/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Language: Python
2 1 01
m-bain/SimpleDiarization
Simple Diarization model
Language: Python
2 1 0
m-bain/CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
Language: Jupyter Notebook
1 1 0
m-bain/collaborative-experts
Video embeddings for retrieval - code for the paper "Use What You Have: Video retrieval using representations from collaborative experts"
Language: Python
1 2 00
m-bain/conceptual-12m
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
1 1 00
m-bain/primate-behaviour-recognition
Automated Audiovisual Behaviour Recognition in Wild Primates
1 3 1
m-bain/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Language: Python
1 1 0
m-bain/pytorch-image-models
PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXT, DPN, EfficientNet, MixNet, MobileNet-V3/V2, MNASNet, Single-Path NAS, FBNet, and more
Language: Python
1 1 0
m-bain/reka-vibe-eval
Multimodal language model benchmark, featuring challenging examples
1
m-bain/slurm_gpustat
A simple command line tool to show GPU usage on a SLURM cluster
Language: Python
1 1 0
m-bain/torchvggish
PyTorch port of Google Research's VGGish model, used for extracting audio features.
Language: Python
1 2 0
m-bain/video2dataset
Easily create large video datasets from video URLs
Language: Python
1 1 0
m-bain/web-retrieval-demo
Language: Python
1 3 01
m-bain/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
Language: Python
1 1 0
m-bain/bert-as-service
Mapping a variable-length sentence to a fixed-length vector using BERT model
Language: Python
1 0
m-bain/hydra
Hydra is a framework for elegantly configuring complex applications
Language: Python
1 0
m-bain/meeteval
Language: Python
1 0
m-bain/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Language: Python
1 0
m-bain/video_features
Extract video features from raw videos using multiple GPUs. We support RAFT and PWC flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, ResNet features.
Language: Python
1 01
m-bain/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Language: Jupyter Notebook
1 0