xiyanghu's Stars
abi/screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
microsoft/autogen
A programming framework for agentic AI 🤖
lllyasviel/ControlNet
Let us control diffusion models!
open-mmlab/mmpose
OpenMMLab Pose Estimation Toolbox and Benchmark.
facebookresearch/sapiens
High-resolution models for human tasks.
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
IDEA-Research/DWPose
"Effective Whole-body Pose Estimation with Two-stages Distillation" (ICCV 2023, CV4Metaverse Workshop)
PowerHouseMan/ComfyUI-AdvancedLivePortrait
THUDM/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
tgxs002/HPSv2
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
facebookresearch/MovieGenBench
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
zqhang/AnomalyCLIP
Official implementation for AnomalyCLIP (ICLR 2024)
cascremers/pdfdiff
Command-line tool to inspect the difference between (the text in) two PDF files
zhangfaen/finetune-Qwen2-VL
mihirp1998/VADER
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.
TIGER-AI-Lab/Mantis
Official code for the paper "Mantis: Multi-Image Instruction Tuning" (TMLR 2024)
apple/ml-slowfast-llava
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
caoyunkang/AdaCLIP
[ECCV 2024] The official implementation of "AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection"
Yushi-Hu/VisualSketchpad
Code for "Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models"
Kwai-Kolors/MPS
Yu-Fangxu/COLD-Attack
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
aounon/llm-rank-optimizer
zwq2018/Multi-modal-Self-instruct
The codebase for our EMNLP 2024 paper "Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model"
Hritikbansal/videophy
Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics
lapisrocks/rpo
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
Ahren09/AgentReview
Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent."
yongchao98/PROMST
Automatic prompt optimization framework for multi-step agent tasks.
duykhuongnguyen/LASeR-MAB
Code for the paper "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits"