jingtaoli-sony's Stars
tencent-ailab/IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
lewandofskee/MambaAD
[NeurIPS 2024] Official implementation of MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection.
JiayuanWang-JW/YOLOv8-multi-task
TIGER-AI-Lab/VIEScore
Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024 main)
boschresearch/ALDM
Official implementation of "Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive" (ICLR 2024)
mcordts/cityscapesScripts
README and scripts for the Cityscapes Dataset
TissueImageAnalytics/cerberus
One Model is All You Need: Multi-Task Learning Enables Simultaneous Histology Image Segmentation and Classification
leeyeehoo/CSRNet-pytorch
CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
apple/ml-4m
4M: Massively Multimodal Masked Modeling
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
cientgu/InstructDiffusion
PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.
bodaay/HuggingFaceModelDownloader
Simple Go utility to download HuggingFace models and datasets
kylesargent/ZeroNVS
NVlabs/genvs
rom1504/clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
baegwangbin/surface_normal_uncertainty
[ICCV 2021 Oral] Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation
cvlab-columbia/zero123
Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)
kongzhecn/OMG
[ECCV 2024] OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models
ViTAE-Transformer/ViTPose
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
open-mmlab/mmyolo
OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, RTMDet-Rotated, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.
hako-mikan/sd-webui-regional-prompter
Set prompts for divided regions
Sanster/IOPaint
Image inpainting tool powered by SOTA AI models. Remove any unwanted object, defect, or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.
advimman/lama
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
yhenon/pytorch-retinanet
PyTorch implementation of RetinaNet object detection.
Megvii-BaseDetection/YOLOX
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
DonaldRR/SimpleNet
eric-ai-lab/PEViT
Official implementation of AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers"
kyegomez/Vit-RGTS
Open source implementation of "Vision Transformers Need Registers"