YaoZhang93's Stars
open-webui/open-webui
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
NeoVertex1/SuperPrompt
SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.
OpenBMB/ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
rom1504/img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
gpt-omni/mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
moshi4/pyCirclize
Circular visualization in Python (Circos Plot, Chord Diagram, Radar Chart)
JindongGu/Awesome-Prompting-on-Vision-Language-Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
BAAI-DCAI/M3D
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models
ibrahimethemhamamci/CT-CLIP
Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography
Beckschen/ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
MedAIerHHL/CVPR-MIA
Papers of Medical Image Analysis on CVPR
PhoenixZ810/MG-LLaVA
Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).
speedinghzl/AlignSeg
AlignSeg: Feature-Aligned Segmentation Networks (TPAMI 2021)
Project-MONAI/MetricsReloaded
zhaoziheng/SAT-DS
The official repository to build SAT-DS, a medical data collection of 72 public segmentation datasets, containing over 22K 3D images, 302K segmentation masks, and 497 classes across 3 modalities (MRI, CT, PET) and 8 human body regions.
xmed-lab/TriALS
MICCAI 2024: nnUNet incorporating additional baselines such as SAMed, Mamba variants, and MedNeXt to establish a benchmark for segmentation challenges.
OliverRensu/MVG
MrGiovanni/Pixel2Cancer
[MICCAI 2024] Cellular Automata for Tumor Development - Realistic Synthetic Tumors in Liver, Pancreas, and Kidney
MAGIC-AI4Med/KEP
[ECCV 2024 Oral] Knowledge-enhanced pretraining for computational pathology
opendatalab/image-downloader
zz-haooo/LLMs-Preference-Optimization