zjshan's Stars
google-research/vision_transformer
Alibaba-MIIL/ImageNet21K
Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021)
opendatalab/DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Tencent/HunyuanDiT
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
google-research/big_transfer
Official repository for the "Big Transfer (BiT): General Visual Representation Learning" paper.
binary-husky/gpt_academic
A practical interactive interface for LLMs such as GPT/GLM, specially optimized for paper reading, polishing, and writing. Modular design with custom shortcut buttons & function plugins; project analysis & self-translation for Python, C++, and other codebases; PDF/LaTeX paper translation & summarization; parallel queries across multiple LLMs; and local models such as ChatGLM3. Integrates Tongyi Qianwen, DeepSeek-Coder, iFlytek Spark, ERNIE Bot, LLaMA 2, RWKV, Claude 2, MOSS, and more.
trigaten/The_Prompt_Report
LAION-AI/aesthetic-predictor
A linear estimator on top of CLIP to predict the aesthetic quality of pictures
LAION-AI/laion-datasets
Descriptions of and pointers to LAION datasets
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
xinyu1205/recognize-anything
Strong open-source foundation models for image recognition.
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
GAIR-NLP/MathPile
[NeurIPS D&B 2024] Generative AI for Math: MathPile
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
ZrrSkywalker/MAVIS
Mathematical Visual Instruction Tuning for Multi-modal Large Language Models
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supporting recognition of 80+ languages, providing data annotation and synthesis tools, and supporting training and deployment on server, mobile, embedded, and IoT devices)
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
jcjohnson/densecap
Dense image captioning in Torch
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
GoGoDuck912/Self-Correction-Human-Parsing
An out-of-the-box human parsing representation extractor.
NanmiCoder/MediaCrawler
Crawlers for Xiaohongshu notes & comments, Douyin videos & comments, Kuaishou videos & comments, Bilibili videos & comments, Weibo posts & comments, Baidu Tieba posts & comment replies, and Zhihu Q&A articles & comments.
FuxiaoLiu/LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
microsoft/JARVIS
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
Hzzone/pytorch-openpose
PyTorch implementation of OpenPose, including hand and body pose estimation.
Breakthrough/PySceneDetect
:movie_camera: Python and OpenCV-based scene cut/transition detection program & library.
allenai/mmc4
MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text.
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval