kaka-Cao

Pinned Repositories

E2E-MFD
E2E-MFD-OOD
Language:Jupyter Notebook62 1 244
E2E-MFD-HOD
E2E-MFD-HOD
Language:Python13 2 31
SuperYOLO
SuperYOLO is accepted by TGRS
Language:Python363 2 14060
E2E-MFD
E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection
Language:Python10
SuperYOLO
SuperYOLO is accepted by TGRS
Language:Python10
ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
Language:Python5k 23 1.5k432
Vision-LLM-Alignment
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
Language:Python91 3 96
RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Language:Python255 2 297
CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
Language:Python2.2k 29 176147
VL-RLHF
An RLHF Infrastructure for Vision-Language Models
Language:Python144 4 177

kaka-Cao's Repositories

kaka-Cao/E2E-MFD
E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection
Language:Python10
kaka-Cao/SuperYOLO
SuperYOLO is accepted by TGRS
Language:Python10