vistapath-dan's Stars
BaranziniLab/KG_RAG
Empower Large Language Models (LLMs) with Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) for knowledge-intensive tasks
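The KG-RAG idea is to ground an LLM prompt in facts retrieved from a knowledge graph (this repo grounds prompts in the biomedical SPOKE graph). Below is a toy Python sketch of that pattern; the hand-written mini-graph and the `ask_llm` stub are illustrative stand-ins, not the repo's code, which uses embedding-based retrieval rather than string matching.

```python
# Toy KG-RAG pattern: retrieve graph facts, ground the prompt, ask the LLM.
KG = {
    "metformin": [("treats", "type 2 diabetes"), ("inhibits", "complex I")],
    "type 2 diabetes": [("associated_with", "insulin resistance")],
}

def retrieve_context(question: str) -> str:
    # Naive entity matching for illustration only.
    facts = []
    for entity, triples in KG.items():
        if entity in question.lower():
            facts += [f"{entity} {rel} {obj}." for rel, obj in triples]
    return " ".join(facts)

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for any LLM API call.
    return f"[LLM answer grounded on]: {prompt}"

question = "What does metformin treat?"
prompt = f"Context: {retrieve_context(question)}\nQuestion: {question}"
print(ask_llm(prompt))
```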
zhanghm1995/Forge_VFM4AD
A comprehensive survey of forging vision foundation models for autonomous driving, including challenges, methodologies, and opportunities.
JamesQFreeman/LoRA-ViT
Low-rank adaptation (LoRA) for Vision Transformers
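For intuition, here is a minimal PyTorch sketch of the LoRA technique applied to a single linear layer; the class name and hyperparameters are illustrative, not this repo's API.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # B is zero-initialized, so training starts from the pretrained behavior.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```

In a ViT, wrappers like this are typically applied to the attention projections (e.g., q and v) while the pretrained backbone stays frozen, so only the small A/B matrices are trained.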
TianxingWu/FreeInit
[ECCV 2024] FreeInit: Bridging Initialization Gap in Video Diffusion Models
LTH14/rcg
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
cvlab-stonybrook/SelfMedMAE
Code for ISBI 2023 paper "Self Pre-training with Masked Autoencoders for Medical Image Classification and Segmentation"
csrhddlam/axial-deeplab
This is a PyTorch re-implementation of Axial-DeepLab (ECCV 2020 Spotlight)
camlaedtke/segmentation_pytorch
Simple image segmentation pipeline in PyTorch, using HRNet and SegFormer models
ashleve/lightning-hydra-template
PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
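The template is organized around Hydra-composed configs driving a Lightning training loop. A minimal sketch of that entrypoint pattern, assuming a `configs/train.yaml` (the template's actual config layout is richer):

```python
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="configs", config_name="train")
def main(cfg: DictConfig) -> None:
    # Hydra composes the final config from YAML groups plus CLI overrides,
    # e.g. `python train.py model.lr=1e-4 trainer.max_epochs=10`.
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```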
MedMNIST/MedMNIST
[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification
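Since the package is on pip, loading any of the 18 datasets takes a few lines. A minimal usage sketch with PathMNIST, assuming medmnist's torchvision-style interface:

```python
import medmnist
from medmnist import INFO

info = INFO["pathmnist"]                            # task type, labels, channels
DataClass = getattr(medmnist, info["python_class"])
train_set = DataClass(split="train", download=True)
img, label = train_set[0]                           # PIL image + label array
print(info["task"], len(train_set), label)
```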
google/style-aligned
Official code for "Style Aligned Image Generation via Shared Attention"
SHI-Labs/Smooth-Diffusion
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models (arXiv 2023 / CVPR 2024)
MeetKai/functionary
Chat language model that can use tools and interpret the results
SUDO-AI-3D/zero123plus
Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
apexrl/Diff4RLSurvey
This repository contains a collection of resources and papers on Diffusion Models for RL, accompanying the paper "Diffusion Models for Reinforcement Learning: A Survey"
google/break-a-scene
Official implementation for "Break-A-Scene: Extracting Multiple Concepts from a Single Image" [SIGGRAPH Asia 2023]
Picsart-AI-Research/Text2Video-Zero
[ICCV 2023 Oral] Text-to-Image Diffusion Models are Zero-Shot Video Generators
IBM/ModuleFormer
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.
microsoft/LMOps
General technology for enabling AI capabilities with LLMs and MLLMs
microsoft/NUWA
A unified 3D Transformer pipeline for visual synthesis
facebookresearch/vissl
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, a number of candidate inference solutions (HF TGI, vLLM) for local or cloud deployment, and demo apps showcasing Meta Llama for WhatsApp & Messenger.
martijnfolmer/CVSimilarityViaEmbedding
superjamessyx/Generative-Foundation-AI-Assistant-for-Pathology
SuperMedIntel/Medical-SAM-Adapter
Adapting Segment Anything Model for Medical Image Segmentation
bowang-lab/MedSAM
Segment Anything in Medical Images
xmed-lab/CLIP_Surgery
CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"