MhLiao's Stars
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
chatchat-space/Langchain-Chatchat
Langchain-Chatchat(原Langchain-ChatGLM)基于 Langchain 与 ChatGLM, Qwen 与 Llama 等语言模型的 RAG 与 Agent 应用 | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Llama) RAG and Agent app with langchain
Vision-CAIR/MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
opendatalab/MinerU
A high-quality tool for convert PDF to Markdown and JSON.一站式开源高质量数据提取工具,将PDF转换成Markdown和JSON格式。
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
meta-llama/codellama
Inference code for CodeLlama models
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
cvat-ai/cvat
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
e2b-dev/awesome-ai-agents
A list of AI autonomous agents
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
magic-research/magic-animate
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
mikel-brostrom/boxmot
BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
luban-agi/Awesome-AIGC-Tutorials
Curated tutorials and resources for Large Language Models, AI Painting, and more.
openvinotoolkit/anomalib
An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
rom1504/img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
OpenGVLab/InternGPT
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (支持DragGAN、ChatGPT、ImageBind、SAM的在线Demo系统)
lyuwenyu/RT-DETR
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
CVCUDA/CV-CUDA
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
facebookresearch/ConvNeXt-V2
Code release for ConvNeXt V2 model
keyu-tian/SparK
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
mlfoundations/MINT-1T
MINT-1T: A one trillion token multimodal interleaved dataset.
shikras/shikra
debidatta/syndata-generation
Code used to generate synthetic scenes and bounding box annotations for object detection. This was used to generate data used in the Cut, Paste and Learn paper
OpenGVLab/OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
buptlihang/CDLA
CDLA: A Chinese document layout analysis (CDLA) dataset
mindspore-lab/mindocr
A toolbox of ocr models and algorithms based on MindSpore
YanjingLi0202/Q-ViT
The official implementation of the NeurIPS 2022 paper Q-ViT.
OpenGVLab/GUI-Odyssey
GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.
yuyq96/TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models