JosephKJ's Stars
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
apple/ml-ferret
OpenAccess-AI-Collective/axolotl
Go ahead and axolotl questions
Stability-AI/StableCascade
Official Code for Stable Cascade
HVision-NKU/StoryDiffusion
Create Magic Story!
google/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
philz1337x/clarity-upscaler
Clarity AI | AI Image Upscaler & Enhancer - free and open-source Magnific Alternative
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
wangkai930418/awesome-diffusion-categorized
Collection of diffusion model papers categorized by their subareas
mbzuai-oryx/LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
HyperGAI/HPT
HPT - Open Multimodal LLMs from HyperGAI
Haiyang-W/GiT
Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
ZYM-PKU/UDiffText
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
yeungchenwa/Recommendations-Diffusion-Text-Image
A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwritten generation, scene text recognition, and scene text detection.
reka-ai/reka-vibe-eval
Multimodal language model benchmark, featuring challenging examples
Qrange-group/SUR-adapter
ACM MM'23 (oral). SUR-adapter lets pre-trained diffusion models acquire the semantic understanding and reasoning capabilities of large language models, building a high-quality textual semantic representation for text-to-image generation.
ZhexinLiang/Control-Color
Control Color: Multimodal Diffusion-based Interactive Image Colorization
jefferyZhan/Griffon
The official repo of Griffon
HelenMao/MAG-Edit
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance
mbzuai-oryx/PALO
Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.
graphic-design-ai/graphist
Official Repo of Graphist
RL-VIG/LibContinual
A Framework of Continual Learning
humansensinglab/ITI-GEN
[ICCV 2023 Oral, Best Paper Finalist] ITI-GEN: Inclusive Text-to-Image Generation
EnergyAttention/Energy-Based-CrossAttention
The official repository of "Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models".
kiranchhatre/amuse
[CVPR 2024] AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
FanZhichen/Awesome-Incremental-Few-Shot-Object-Detection
A paper list for incremental few-shot object detection.
mbzuai-oryx/Awesome-CV-Foundational-Models
sahilg06/Awesome-Aesthetics-Assessment
Collection of Aesthetics Assessment Papers for Graphic Designs and Images.
beabetterdevv/S3PresignedUpload