KululuMi's Stars
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
shap/shap
A game theoretic approach to explain the output of any machine learning model.
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
google-research-datasets/richhf-18k
The RichHF-18K dataset contains the rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with the file names of the associated labeled images (no URLs or images are included in this dataset).
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
PKU-YuanGroup/MagicTime
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
mohuangrui/ucasthesis
LaTeX Thesis Template for the University of Chinese Academy of Sciences
Technion-Kishony-lab/data-to-paper
data-to-paper: Backward-traceable AI-driven scientific research
threestudio-project/threestudio
A unified framework for 3D content generation.
jun0wanan/awesome-large-multimodal-agents
lllyasviel/sd-forge-layerdiffuse
[WIP] Layer Diffusion for WebUI (via Forge)
Yutong-Zhou-cv/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
openai/weak-to-strong
InternLM/InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Harry24k/adversarial-attacks-pytorch
PyTorch implementation of adversarial attacks [torchattacks]
duchengbin8/Stable_Diffusion_is_Unstable
Official implementation of the paper: Stable Diffusion is Unstable
web-arena-x/webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
j-min/DSG
Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)
dongyh20/Octopus
🐙Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.
aiwaves-cn/agents
An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
tgxs002/HPSv2
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
j-min/DallEval
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)
tgxs002/align_sd
Better Aligning Text-to-Image Models with Human Preference. ICCV 2023
PVIT-official/PVIT
Repository for the paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
Wangt-CN/EqBen
[ICCV'23 Oral] The introduction and toolkit for EqBen Benchmark
microsoft/robustlearn
Robust machine learning for responsible AI
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
hyp1231/awesome-llm-powered-agent
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
showlab/Image2Paragraph
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.