zealian

zealian's Stars

zhipeixu/FakeShield
The official implementation of 'FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models'
10911
recognito-vision/Linux-FaceRecognition-FaceLivenessDetection
NIST_FRVT Top 1🏆 Face Recognition, Liveness Detection(Face Anti-Spoof), Face Attribute Analysis Linux Server SDK Demo ☑️ Face Recognition ☑️ Face Matching ☑️ Face Liveness Detection ☑️ Face Identification (1:N Face Search) ☑️ Face Attribute Analysis
Language: C | 765291
qcf-568/DocTamper
[CVPR2023] Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution
Language: Python | 13811
Pengfei8324/chinese_license_plate_generator
** license plate generator
Language: Python | 14137
LarryJiang134/Image_manipulation_detection
Paper: CVPR2018, Learning Rich Features for Image Manipulation Detection
Language: Jupyter Notebook | 36599
WaLittleMoon/Learning-Rich-Features-for-Image-Manipulation-Detection
Image tampering detection based on a two-stream Faster R-CNN network
Language: Jupyter Notebook | 13327
opendatalab/MinerU
A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.
Language: Python | 22.6k | 1.6k
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Language: Python | 6.4k | 564
deeplearningshare/multi-line-plate-recognition
Multi-line license plate recognition
Language: Python | 7520
WenjinW/LATIN-Prompt
Language: Python | 504
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
Language: Python | 1433
OpenGVLab/VideoMamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
Language: Python | 87162
lucadiliello/transformers-framework
SOTA training framework based on PyTorch Lightning and Transformers
Language: Python | 6
kyutai-labs/moshi
Language: Python | 7.1k | 551
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
79748
OpenMOSS/AnyGPT
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
Language: Python | 80465
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Language: Python | 10.1k | 867
lucidrains/audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
Language: Python | 2.5k | 266
AudioLLMs/Awesome-Audio-Large-Language-Models
Audio Large Language Models
20312
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
Language: Python | 1.1k | 66
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Language: Python | 74556
gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Language: Python | 1.7k | 203
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
Language: Python | 10.9k | 2.4k
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Language: Python | 3.4k | 342
westlake-baichuan-mllm/bc-omni
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
2547
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language: Python | 2.7k | 185
cvg/Hierarchical-Localization
Visual localization made easy with hloc
Language: Python | 3.3k | 610
KevinMusgrave/pytorch-metric-learning
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
Language: Python | 6.1k | 656
naver/fire
Language: Python | 1358
sungonce/CVNet
Official PyTorch Implementation of Correlation Verification for Image Retrieval, CVPR 2022 (Oral Presentation)
Language: Python | 17911