SLTK1's Stars
artidoro/qlora
QLoRA: Efficient Finetuning of Quantized LLMs
fh2019ustc/DocScanner
The official repo for “DocScanner: Robust Document Image Rectification with Progressive Learning”.
mbzuai-oryx/groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
khuangaf/Awesome-Chart-Understanding
A curated list of recent and past chart understanding work based on our survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.
LingyvKong/OneChart
[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"
lilanxiao/Rotated_IoU
Differentiable IoU of rotated bounding boxes using PyTorch
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
pytorch/torchtitan
A PyTorch native library for large model training
jun0wanan/awesome-large-multimodal-agents
RapidAI/RapidLayout
Layout analysis for Chinese and English documents
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
puzzlepaint/camera_calibration
Accurate geometric camera calibration with generic camera models
fixstars/libSGM
Stereo Semi-Global Matching with CUDA
zer0int/CLIP-fine-tune
Fine-tuning code for CLIP models
nl8590687/ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
kakaobrain/coyo-dataset
COYO-700M: Large-scale Image-Text Pair Dataset
HCIILAB/SCUT-HCCDoc_Dataset_Release
ZZZHANG-jx/DocRes
[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
mindsdb/mindsdb
Platform for building AI that can learn and answer questions over federated data.
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
opendatalab/UniMERNet
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
tsattler/RansacLib
Template-based implementation of RANSAC and its variants in C++
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
PoseLib/PoseLib
Minimal solvers for calibrated camera pose estimation
laurentkneip/opengv
OpenGV is a collection of computer vision methods for solving geometric vision problems. It is hosted and maintained by the Mobile Perception Lab of ShanghaiTech.
ucaslcl/Fox
Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
youngwanLEE/CenterMask
[CVPR 2020] CenterMask: Real-Time Anchor-Free Instance Segmentation
deepseek-ai/DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
toandaominh1997/EfficientDet.Pytorch
Implementation of EfficientDet: Scalable and Efficient Object Detection in PyTorch
xuannianz/EfficientDet
EfficientDet (Scalable and Efficient Object Detection) implementation in Keras and TensorFlow