Chuanqi-Zang's Stars
cruiseresearchgroup/MAPLE
jianchang512/pyvideotrans
Translate a video from one language to another and add dubbing, with support for speech-recognition transcription, speech synthesis, and subtitle translation.
PaddlePaddle/PaddleSeg
Easy-to-use image segmentation library with an awesome pre-trained model zoo, supporting a wide range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, 3D Segmentation, etc.
ultralytics/ultralytics
Ultralytics YOLO11 🚀
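A minimal inference sketch with the Ultralytics Python API, for reference; the weight name "yolo11n.pt" and the sample image URL are illustrative choices, not the only options.

```python
# Minimal YOLO11 inference sketch (pip install ultralytics).
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # downloads the nano YOLO11 detection weights on first use
results = model("https://ultralytics.com/images/bus.jpg")  # predict on an image path or URL

for r in results:
    print(r.boxes.xyxy)  # bounding boxes as (x1, y1, x2, y2) tensors
    print(r.boxes.cls)   # predicted class indices
```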
THU-MIG/yolov10
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
CASIA-IVA-Lab/FastSAM
Fast Segment Anything
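A sketch of "segment everything" inference following the FastSAM README; the weight file, image path, and device string are placeholders you supply yourself.

```python
# FastSAM "everything" segmentation sketch; paths below are placeholders.
from fastsam import FastSAM, FastSAMPrompt

model = FastSAM("./weights/FastSAM-x.pt")
image_path = "./images/example.jpg"

everything_results = model(image_path, device="cuda", retina_masks=True,
                           imgsz=1024, conf=0.4, iou=0.9)
prompt_process = FastSAMPrompt(image_path, everything_results, device="cuda")

masks = prompt_process.everything_prompt()  # all masks; box/point/text prompts also exist
prompt_process.plot(annotations=masks, output_path="./output/example.jpg")
```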
liliu-avril/Awesome-Segment-Anything
This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
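A minimal transcription example with the whisper package; "base" is one of several model sizes and "audio.mp3" is a placeholder path.

```python
# Transcribe an audio file with openai/whisper (pip install -U openai-whisper).
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")  # language is auto-detected unless specified
print(result["text"])                   # full transcript as a single string
```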
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
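An image-segmentation sketch with SAM 2, assuming the Hugging Face checkpoint alias "facebook/sam2-hiera-large"; the image path and the click coordinates are placeholders.

```python
# Prompted image segmentation with SAM 2; image and click point are placeholders.
import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)

# One foreground click (label 1) at pixel (x=500, y=375).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
)
print(masks.shape, scores)  # candidate masks and their predicted quality scores
```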
SY-Xuan/Pink
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
microsoft/LLaVA-Med
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
tencent-ailab/IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
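A sketch that loads the IP-Adapter weights through the Hugging Face diffusers integration (the repo also ships its own inference code); the model IDs, scale value, and file names are illustrative.

```python
# IP-Adapter via the diffusers integration; model IDs and paths are illustrative.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # how strongly the image prompt steers generation

ref = load_image("reference.png")  # the image prompt
out = pipe(prompt="a photo in the same style", ip_adapter_image=ref).images[0]
out.save("result.png")
```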
JunjieHu/ReCo-RL
Codes of AAAI 2020 paper "What Makes A Good Story? Designing Composite Rewards for Visual Storytelling"
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
eric-ai-lab/MiniGPT-5
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
eric-xw/AREL
Code for the ACL paper "No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling"
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
OpenGVLab/InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
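A feature-extraction sketch that loads DINOv2 ViT-S/14 through torch.hub as described in the repo README; the preprocessing values are the standard ImageNet normalization and the image path is a placeholder.

```python
# Extract a global DINOv2 image embedding via torch.hub.
import torch
from PIL import Image
from torchvision import transforms

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is divisible by the 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    features = model(img)  # global image embedding (384-dim for ViT-S/14)
print(features.shape)
```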
skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
wilson1yan/VideoGPT
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
kohjingyu/fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
BIMK/PlatEMO
Evolutionary multi-objective optimization platform
BIT-thesis/LaTeX-template
LaTeX template for BIT thesis
BITNP/BIThesis
📖 An unofficial collection of LaTeX templates for Beijing Institute of Technology, including undergraduate and graduate thesis templates and more. 🎉 (For more documentation, see the wiki and the manuals in the releases.)
lllyasviel/ControlNet
Let us control diffusion models!
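A sketch of Canny-conditioned generation; this uses the diffusers port of ControlNet rather than the repo's own gradio scripts, and the model IDs, prompt, and edge-map path are illustrative.

```python
# Canny-edge ControlNet generation via the diffusers port; inputs are placeholders.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edges = load_image("canny_edges.png")  # a precomputed Canny edge map as conditioning
image = pipe("a futuristic living room", image=edges, num_inference_steps=30).images[0]
image.save("controlled.png")
```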