yehengchen's Stars
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
chenfei-wu/TaskMatrix
VikParuchuri/marker
Convert PDF to markdown + JSON quickly with high accuracy
QwenLM/Qwen
The official repo of the Qwen (通义千问) chat and pretrained large language models proposed by Alibaba Cloud.
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Mooler0410/LLMsPracticalGuide
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
facebookresearch/nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
geekyutao/Inpaint-Anything
Inpaint anything using Segment Anything and inpainting models.
THUDM/CogVLM
A state-of-the-art open visual language model | multimodal pretrained model
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
OpenBMB/ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
landing-ai/vision-agent
Vision agent
modelscope/modelscope-agent
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
ttengwang/Caption-Anything
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
textstat/textstat
:memo: Python package to calculate readability statistics of a text object - paragraphs, sentences, articles.
jf-tech/omniparser
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
mbzuai-oryx/GeoChat
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
ViTAE-Transformer/Remote-Sensing-RVSA
The official repo for [TGRS'22] "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model"
SalesforceAIResearch/xLAM
ChenDelong1999/RemoteCLIP
🛰️ Official repository of paper "RemoteCLIP: A Vision Language Foundation Model for Remote Sensing" (IEEE TGRS)
orfeotoolbox/OTB
GitHub mirror of https://gitlab.orfeo-toolbox.org/orfeotoolbox/otb
hustvl/Senna
Bridging Large Vision-Language Models and End-to-End Autonomous Driving
HaonanGuo/Remote-Sensing-ChatGPT
Chat with RS-ChatGPT to get remote sensing interpretation results and responses!
ZhanYang-nwpu/Awesome-Remote-Sensing-Multimodal-Large-Language-Model
Multimodal Large Language Models for Remote Sensing (RS-MLLMs): A Survey
nv-nguyen/gigapose
[CVPR 2024] PyTorch implementation of GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence
NJU-LHRS/LHRS-Bot
VGI-Enhanced multimodal large language model for remote sensing images.
Chen-Yang-Liu/Change-Agent
Official PyTorch implementation of "Change-Agent: Toward Interactive Comprehensive Remote Sensing Change Interpretation and Analysis"
ermongroup/TEOChat
Official code for TEOChat, the first vision-language assistant for temporal earth observation data (ICLR 2025).
LinWeizheDragon/FLMR
The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever.