LinMu7177's Stars
songweige/rich-text-to-image
Rich-Text-to-Image Generation
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
YuchenLiu98/COMM
PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"
cmhungsteve/Awesome-Transformer-Attention
A comprehensive paper list on Vision Transformers and attention, including papers, code, and related websites
microsoft/SoM
Set-of-Mark Prompting for LMMs
mlfoundations/datacomp
DataComp: In search of the next generation of multimodal datasets
mlpc-ucsd/BLIVA
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
jshilong/GPT4RoI
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
facebookresearch/paco
This repo contains the documentation and code needed to use the PACO dataset: data loaders, training and evaluation scripts for object, part, and attribute prediction models, query evaluation scripts, and visualization notebooks.
linjieli222/VQA_ReGAT
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
RachanaJayaram/Cross-Attention-VizWiz-VQA
A self-evident application of the VQA task is designing systems that aid blind people with sight-reliant queries. The VizWiz VQA dataset originates from images and questions compiled by members of the visually impaired community and, as such, highlights some of the challenges of this use case.
Cloud-CV/EvalAI
Evaluating state of the art in AI
OpenGVLab/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side-by-side with images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
amazon-science/mm-cot
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned; more updates to come)
henghuiding/ReLA
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
k1rezaei/Text-to-concept
UX-Decoder/Semantic-SAM
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
google-research/magvit
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
hoya012/semantic-segmentation-tutorial-pytorch
A simple PyTorch codebase for semantic segmentation using Cityscapes.
daohu527/awesome-self-driving-car
An awesome list of self-driving cars
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
nish03/FFS
Code for CVPR 2023 Highlight paper "Normalizing Flow based Feature Synthesis for Outlier-Aware Object Detection"
rentainhe/TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
OptimalScale/DetGPT
luca-medeiros/lang-segment-anything
SAM with text prompts
MenghaoGuo/Awesome-Vision-Attentions
A summary of papers on visual attention. Related Jittor code will be released gradually.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything