equus71's Stars
MC-E/ReVideo
reflex-dev/reflex
🕸️ Web apps in pure Python 🐍
FoundationVision/Groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
ZhengPeng7/BiRefNet
[arXiv'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
dora-rs/dora
DORA (Dataflow-Oriented Robotic Application) is middleware designed to streamline and simplify the creation of AI-based robotic applications. It offers low-latency, composable, and distributed dataflow capabilities. Applications are modeled as directed graphs, also referred to as pipelines.
KellerJordan/cifar10-airbench
94% on CIFAR-10 in 3.29 seconds 💨 96% in 35 seconds
Vision-CAIR/MiniGPT4-video
Official code for MiniGPT4-video
InstantStyle/InstantStyle
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥
yardenfren1996/B-LoRA
Implicit Style-Content Separation using B-LoRA
snap-research/MyVLM
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries"
myshell-ai/MeloTTS
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
bfshi/scaling_on_scales
When do we not need larger vision models?
fuxiao0719/GeoWizard
[ECCV'24] GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
Haiyang-W/GiT
🔥 [ECCV2024] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
OpenGVLab/VideoMamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
PaddlePaddle/PaddleDetection
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
deepseek-ai/DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
VAST-AI-Research/TripoSR
DLYuanGod/TinyGPT-V
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
NUS-HPC-AI-Lab/Neural-Network-Parameter-Diffusion
We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters.
karpathy/minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
whlzy/FiT
[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model
Vchitect/SEINE
[ICLR 2024] SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
InstantID/InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
TencentARC/PhotoMaker
PhotoMaker
apple/ml-aim
This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Models
AIGCDesignGroup/ReplaceAnything
Jiawei-Yang/Denoising-ViT
This is the official code release for our work, Denoising Vision Transformers.
yancie-yjr/StreamYOLO
Real-time Object Detection for Streaming Perception, CVPR 2022
numba/numba
NumPy-aware dynamic Python compiler using LLVM