engindeniz's Stars
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A. Also supports a number of inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp & Messenger.
apple/ml-ferret
ml-explore/mlx
MLX: An array framework for Apple silicon
AILab-CVC/SEED-Bench
(CVPR 2024) A benchmark for evaluating multimodal LLMs using multiple-choice questions.
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
filipecalegario/awesome-generative-ai
A curated list of Generative AI tools, works, models, and references
mbzuai-oryx/Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. The repo also introduces a rigorous quantitative evaluation benchmark for video-based conversational models.
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
dair-ai/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering
yuhangzang/UPT
THUDM/P-tuning-v2
An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks
microsoft/VideoX
VideoX: a collection of video cross-modal models
google/latexify_py
A library to generate LaTeX expressions from Python code.
KMnP/vpt
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
antoyang/FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
antoyang/just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
karpathy/nn-zero-to-hero
Neural Networks: Zero to Hero
showlab/EgoVLP
[NeurIPS 2022] Egocentric Video-Language Pretraining
microsoft/LAVENDER
A Unified Framework for Video-Language Understanding
cmhungsteve/Awesome-Transformer-Attention
A comprehensive paper list on Vision Transformers and attention, including papers, code, and related websites
microsoft/UniVL
An official implementation of "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
a-nagrani/ffmpeg-commands
Collection of useful FFmpeg commands for processing audio and video files.
opencv/opencv
Open Source Computer Vision Library
microsoft/SwinBERT
Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
SwinTransformer/Video-Swin-Transformer
An official implementation of "Video Swin Transformer".
s9xie/Mini-Kinetics-200
Mini-Kinetics-200 data splits used in paper "Rethinking Spatiotemporal Feature Learning For Video Understanding"
jayleicn/TVCaption
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
m-bain/frozen-in-time
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
medhini/clip_it
CLIP-It! Language-Guided Video Summarization