Giruvegan's Stars
KLUE-benchmark/KLUE
📖 Korean NLU Benchmark
AnyLoc/AnyLoc
AnyLoc: Universal Visual Place Recognition (RA-L 2023)
cvg/LightGlue
LightGlue: Local Feature Matching at Light Speed (ICCV 2023)
RuojinCai/doppelgangers
Doppelgangers: Learning to Disambiguate Images of Similar Structures
chicleee/Image-Matching-Paper-List
A personal list of papers and resources of image matching and pose estimation, including perspective images and panoramas.
google-research/omniglue
Code release for the CVPR 2024 paper "OmniGlue"
verlab/accelerated_features
Implementation of XFeat (CVPR 2024). Do you need robust and fast local feature extraction? You are in the right place!
McGill-NLP/llm2vec
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
BM-K/Sentence-Embedding-Is-All-You-Need
Korean Sentence Embedding Repository
sail-sg/metaformer
MetaFormer Baselines for Vision (TPAMI 2024)
Parskatt/RoMa
[CVPR 2024] RoMa: Robust Dense Feature Matching. A robust dense feature matcher that estimates pixel-dense warps and reliable certainties for almost any image pair.
embeddings-benchmark/mteb
MTEB: Massive Text Embedding Benchmark
open-webui/open-webui
User-friendly WebUI for LLMs (Formerly Ollama WebUI)
NVIDIA/NeMo-Guardrails
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
kyegomez/RT-2
Open-source reimplementation of RT-2 ("RT-2: New model translates vision and language into action")
mit-han-lab/efficientvit
EfficientViT is a new family of vision models for efficient high-resolution vision.
SKTBrain/KVQA
Korean Visual Question Answering
bytedance/MTVQA
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingual text perception and comprehension capabilities across nine widely-used yet low-resource languages.
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
mapluisch/LLaVA-CLI-with-multiple-images
LLaVA inference with multiple images at once for cross-image analysis.
khanrc/honeybee
Official implementation of project Honeybee (CVPR 2024)
naklecha/llama3-from-scratch
Llama 3 implemented one matrix multiplication at a time
naver/deep-image-retrieval
End-to-end learning of deep visual representations for image retrieval
LLaVA-VL/LLaVA-NeXT
huggingface/pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
philschmid/optimum-transformers-optimizations
NomaDamas/awesome-korean-llm
Awesome list of Korean Large Language Models.