EinesTages's Stars
kijai/ComfyUI-Florence2
Inference for the Microsoft Florence-2 VLM
vivo-ai-lab/BlueLM
BlueLM (蓝心大模型): open large language models developed by vivo AI Lab
showlab/Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
opendilab/awesome-ui-agents
A curated list of awesome UI agent resources, encompassing Web, App, OS, and beyond (continually updated)
e2b-dev/awesome-ai-agents
A list of AI autonomous agents
niuzaisheng/ScreenAgent
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
yfzhang114/SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
kyegomez/ViTAR
Implementation of ViTAR: Vision Transformer with Any Resolution, in PyTorch
lucidrains/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
ParadoxZW/LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
thunlp/LLaVA-UHD
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Zefan-Cai/KVCache-Factory
Unified KV Cache Compression Methods for Auto-Regressive Models
harvardnlp/im2markup
Neural model for converting Image-to-Markup (by Yuntian Deng yuntiandeng.com)
zjwang21/StrokeNet
The official code for our EMNLP 2022 long paper [Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling]
datawhalechina/self-llm
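《开源大模型食用指南》 (A Guide to Using Open-Source Large Models): tutorials for quickly deploying open-source large models in a Linux environment, better suited to ** 宝宝 (i.e., beginners)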
THU-MIG/yolov10
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
Yuliang-Liu/Monkey
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨Latest Advances on Multimodal Large Language Models
kwuking/TimeMixer
[ICLR 2024] Official implementation of "TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting"
Tencent/Tencent-Hunyuan-Large
deepseek-ai/DreamCraft3D
[ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
antgroup/agentUniverse
agentUniverse is an LLM multi-agent framework that allows developers to easily build multi-agent applications.
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
Go2Heart/EchoSight
[EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.
illuin-tech/colpali
The code used to train and run inference with the ColPali architecture.
riedlerm/multimodal_rag_for_industry
Implementation and evaluation of multimodal RAG with text and image inputs for industrial applications
pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
OpenBMB/VisRAG
Parsing-free RAG supported by VLMs