zealian's Stars
zhipeixu/FakeShield
The official implementation of 'FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models'
recognito-vision/Linux-FaceRecognition-FaceLivenessDetection
NIST_FRVT Top 1🏆 Face Recognition, Liveness Detection (Face Anti-Spoof), Face Attribute Analysis Linux Server SDK Demo ☑️ Face Recognition ☑️ Face Matching ☑️ Face Liveness Detection ☑️ Face Identification (1:N Face Search) ☑️ Face Attribute Analysis
qcf-568/DocTamper
[CVPR2023] Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution
Pengfei8324/chinese_license_plate_generator
Chinese license plate generator
LarryJiang134/Image_manipulation_detection
Paper: CVPR2018, Learning Rich Features for Image Manipulation Detection
WaLittleMoon/Learning-Rich-Features-for-Image-Manipulation-Detection
Image manipulation detection based on a two-stream Faster R-CNN network
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool.
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
deeplearningshare/multi-line-plate-recognition
Multi-line license plate recognition
WenjinW/LATIN-Prompt
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
OpenGVLab/VideoMamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
lucadiliello/transformers-framework
SOTA training framework based on PyTorch Lightning and Transformers
kyutai-labs/moshi
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
OpenMOSS/AnyGPT
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
lucidrains/audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
AudioLLMs/Awesome-Audio-Large-Language-Models
Audio Large Language Models
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
westlake-baichuan-mllm/bc-omni
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
cvg/Hierarchical-Localization
Visual localization made easy with hloc
KevinMusgrave/pytorch-metric-learning
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
naver/fire
sungonce/CVNet
Official PyTorch Implementation of Correlation Verification for Image Retrieval, CVPR 2022 (Oral Presentation)