Pinned Repositories
Ant-Multi-Modal-Framework
Research code for the Multimodal-Cognition Team at Ant Group
EVA
EVA Series: Visual Representation Fantasies from BAAI
XPretrain
Multi-modality pre-training
ms-swift
Use PEFT or full-parameter training to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 100+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
PaddleDetection
Object detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking, and real-time multi-person keypoint detection.
hardlipay's Repositories
hardlipay doesn't have any repositories yet.