echo840's Stars
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
KindXiaoming/pykan
Kolmogorov-Arnold Networks
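The core idea behind a KAN layer can be sketched in a few lines: where an MLP layer multiplies inputs by scalar weights and then applies a fixed activation, a KAN puts a learnable univariate function on every edge and sums the results. A minimal pure-Python sketch follows; note that pykan itself uses B-spline bases with a SiLU residual in PyTorch, so the RBF basis, grid, and coefficients here are illustrative assumptions, not the library's API.

```python
import math

def phi(x, coeffs, centers, width=1.0):
    """One learnable univariate edge function, expanded in a fixed RBF basis.
    coeffs are the trainable parameters; centers/width define the basis.
    (pykan uses B-splines plus a SiLU residual; RBFs stand in for simplicity.)"""
    return sum(c * math.exp(-((x - m) / width) ** 2)
               for c, m in zip(coeffs, centers))

def kan_layer(inputs, edge_coeffs, centers):
    """A KAN layer: output_j = sum_i phi_{j,i}(x_i).
    Every edge applies its own learnable non-linear function, then sums --
    no weight matrix, no shared fixed activation."""
    return [
        sum(phi(x, edge_coeffs[j][i], centers) for i, x in enumerate(inputs))
        for j in range(len(edge_coeffs))
    ]

centers = [-1.0, 0.0, 1.0]                       # fixed RBF grid shared by all edges
edge_coeffs = [                                   # 2 inputs -> 1 output
    [[0.5, 1.0, 0.5], [1.0, 0.0, -1.0]],          # one coefficient vector per edge
]
out = kan_layer([0.0, 0.0], edge_coeffs, centers)
```

Training would fit `edge_coeffs` by gradient descent; the point of the sketch is only the edge-wise function structure.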
jessevig/bertviz
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
hila-chefer/Transformer-Explainability
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
onnx/onnxmltools
ONNXMLTools enables conversion of models to ONNX
vivo-ai-lab/BlueLM
BlueLM(蓝心大模型): Open large language models developed by vivo AI Lab
hila-chefer/Transformer-MM-Explainability
[ICCV 2021 - Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.
AntonioTepsich/Convolutional-KANs
This project extends the innovative Kolmogorov-Arnold Network (KAN) architecture to convolutional layers, replacing the convolution's classic linear transformation with learnable non-linear activations at each pixel.
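The mechanism in that description can be sketched directly: at each sliding-window position, each kernel slot applies its own learnable univariate function to the pixel it covers, instead of multiplying by a scalar weight. A toy 1-D version is below; the repo works on 2-D images with learnable splines, so the hand-picked "edge functions" here are illustrative stand-ins, not its actual parameterization.

```python
import math

def kan_conv1d(signal, kernel_fns):
    """KAN-style 1-D convolution: each kernel slot k applies its own
    univariate function phi_k to the input value under it, and the
    results are summed per window (vs. sum(w_k * x_k) in a classic conv).
    kernel_fns: one callable per kernel position."""
    k = len(kernel_fns)
    return [
        sum(fn(signal[i + j]) for j, fn in enumerate(kernel_fns))
        for i in range(len(signal) - k + 1)
    ]

# Toy "learned" edge functions; a classic conv would use fn = lambda x: w * x.
fns = [lambda x: x ** 2, lambda x: math.tanh(x), lambda x: 0.5 * x]
out = kan_conv1d([1.0, 0.0, 2.0, -1.0], fns)
```

The only change from a standard convolution is swapping the per-slot multiply for a per-slot function evaluation; stride, padding, and channels would carry over unchanged.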
NVlabs/RADIO
Official repository for "AM-RADIO: Reduce All Domains Into One"
WXinlong/DenseCL
Dense Contrastive Learning (DenseCL) for self-supervised representation learning, CVPR 2021 Oral.
luogen1996/LaVIN
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
shenyunhang/APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
microsoft/rho
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
OpenGVLab/OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
1ssb/torchkan
An easy-to-use PyTorch implementation of the Kolmogorov-Arnold Network and a few novel variations
zh460045050/V2L-Tokenizer
HKUST-LongGroup/Awesome-Open-Vocabulary-Detection-and-Segmentation
Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
cheng-haha/KANs
🕹️ Toy examples of Kolmogorov-Arnold Networks (get started quickly)
IntelLabs/multimodal_cognitive_ai
Research work on multimodal cognitive AI
amazon-science/QA-ViT
foundation-multimodal-models/CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
naver-ai/cream
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023
Ikomia-dev/onnx-donut
Export the Donut model to ONNX and run it with ONNX Runtime
Lackel/AGLA
Code for paper "AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention"
ShuoZhang2003/DT-VQA
xinke-wang/EST-VQA
[CVPR2020] EST-VQA Dataset
leeguandong/EcommerceOCRBench
An OCR benchmark for multimodal large language models on e-commerce text recognition, modeled on OCRBench but with a larger evaluation set.