ekazakos's Stars
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
chenfei-wu/TaskMatrix
huggingface/pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
lucidrains/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
UKPLab/sentence-transformers
State-of-the-Art Text Embeddings
mlfoundations/open_clip
An open source implementation of CLIP.
kkroening/ffmpeg-python
Python bindings for FFmpeg - with complex filtering support
open-mmlab/mmpose
OpenMMLab Pose Estimation Toolbox and Benchmark.
OpenGVLab/LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
cmhungsteve/Awesome-Transformer-Attention
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
facebookresearch/deit
Official DeiT repository
tlkh/asitop
Perf monitoring CLI tool for Apple Silicon
nightrome/really-awesome-gan
A list of papers on Generative Adversarial (Neural) Networks
dmlc/decord
An efficient video loader for deep learning with smart shuffling that's super easy to digest
cvlab-columbia/viper
Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"
lucidrains/flamingo-pytorch
Implementation of 🦩 Flamingo, DeepMind's state-of-the-art few-shot visual question answering attention network, in PyTorch
KMnP/vpt
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
allenai/visprog
Official code for VisProg (CVPR 2023 Best Paper!)
LukasBommes/mv-extractor
Extract frames and motion vectors from H.264 and MPEG-4 encoded video.
taoyang1122/adapt-image-models
[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
antoyang/VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
antoyang/just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
prannaykaul/mm-ovod
Official repo for our ICML 23 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection"
epic-kitchens/epic-sounds-annotations
Splits for epic-sounds dataset
epic-kitchens/epic-sounds