yash0307
Computer Vision, Machine Learning.
Czech Technical University, Carnegie Mellon University, IIIT Hyderabad
Prague, Czech Republic
yash0307's Stars
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
mlfoundations/open_clip
An open source implementation of CLIP.
kornia/kornia
Geometric Computer Vision Library for Spatial AI
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
open-mmlab/mmsegmentation
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
makcedward/nlpaug
Data augmentation for NLP
rom1504/img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
magicleap/SuperGluePretrainedNetwork
SuperGlue: Learning Feature Matching with Graph Neural Networks (CVPR 2020, Oral)
rom1504/clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them.
OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
zju3dv/LoFTR
Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021, T-PAMI 2022
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight 🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
jy0205/LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
deepglint/unicom
MLCD & UNICOM: Large-Scale Visual Representation Model
Tangshitao/QuadTreeAttention
QuadTree Attention for Vision Transformers (ICLR 2022)
facebookresearch/paco
This repo contains documentation and code needed to use the PACO dataset: data loaders, training and evaluation scripts for object, part, and attribute prediction models, query evaluation scripts, and visualization notebooks.
shabie/docformer
Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)
uta-smile/TCL
Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022.
weitong8591/differentiable_ransac
PyTorch Implementation of the ICCV 2023 paper: Generalized Differentiable RANSAC ($\nabla$-RANSAC).
facebookresearch/SWAG
Official repository for "Revisiting Weakly Supervised Pre-Training of Visual Perception Models". https://arxiv.org/abs/2201.08371.
facebookresearch/diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
rossumai/docile
DocILE: Document Information Localization and Extraction Benchmark
Yuting-Gao/DisCo-pytorch
Code for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning
facebookresearch/nbm-spam
Training and evaluating NBM and SPAM for interpretable machine learning.
yash0307/RecallatK_surrogate
Code for Recall@k Surrogate Loss with Large Batches and Similarity Mixup, CVPR 2022.
manyids2/mkd_local_descriptor
Implementation of [Understanding and Improving Kernel Local Descriptors](https://arxiv.org/abs/1811.11147) using PyTorch.