YaoZhang93's Stars
open-webui/open-webui
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
NeoVertex1/SuperPrompt
SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.
OpenBMB/ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
rom1504/img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
gpt-omni/mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
moshi4/pyCirclize
Circular visualization in Python (Circos Plot, Chord Diagram, Radar Chart)
JindongGu/Awesome-Prompting-on-Vision-Language-Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
BAAI-DCAI/M3D
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models
ibrahimethemhamamci/CT-CLIP
Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography
Beckschen/ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
MedAIerHHL/CVPR-MIA
Papers of Medical Image Analysis on CVPR
PhoenixZ810/MG-LLaVA
Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).
speedinghzl/AlignSeg
AlignSeg: Feature-Aligned Segmentation Networks (TPAMI 2021)
Project-MONAI/MetricsReloaded
zhaoziheng/SAT-DS
The official repository to build SAT-DS, a medical data collection of 72 public segmentation datasets, containing over 22K 3D images, 302K segmentation masks, and 497 classes across 3 modalities (MRI, CT, PET) and 8 human body regions.
xmed-lab/TriALS
MICCAI 2024: nnUNet incorporating additional baselines such as SAMed, Mamba variants, and MedNeXt to establish a benchmark for segmentation challenges.
OliverRensu/MVG
MrGiovanni/Pixel2Cancer
[MICCAI 2024] Cellular Automata for Tumor Development - Realistic Synthetic Tumors in Liver, Pancreas, and Kidney
MAGIC-AI4Med/KEP
[ECCV 2024 Oral] Knowledge-enhanced pretraining for computational pathology
opendatalab/image-downloader
zz-haooo/LLMs-Preference-Optimization