llearner's Stars
PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkit based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment across server, mobile, embedded, and IoT devices)
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
kyutai-labs/moshi
liyunlongaaa/NSD-MS2S
Champion system of the CHiME-7/8 diarization challenges: neural speaker diarization using memory-aware multi-speaker embedding with a sequence-to-sequence architecture
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
gpt-omni/mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
mlfoundations/MINT-1T
MINT-1T: A one trillion token multimodal interleaved dataset.
FuxiaoLiu/MMC
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
appl-team/appl
🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.
SpursGoZmy/Table-LLaVA
Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train Dataset for table understanding and develop a generalist tabular MLLM named Table-LLaVA.
weavel-ai/Ape
Your first AI prompt engineer
zehanwang01/OmniBind
VikParuchuri/marker
Convert PDF to markdown quickly with high accuracy
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
OFA-Sys/AIR-Bench
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Jack-ZC8/M3AV-dataset
A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset (ACL 2024)
RenShuhuai-Andy/TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
PhoebusSi/Alpaca-CoT
We unify the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-Tuning) for easy use. We welcome open-source enthusiasts to initiate any meaningful PR on this repo and to integrate as many LLM-related technologies as possible. We have built a fine-tuning platform that makes it easy for researchers to get started with and use large models, and we welcome any meaningful PRs from open-source enthusiasts!
SpursGoZmy/Tabular-LLM
This project aims to collect open-source datasets for table-intelligence tasks (e.g., table question answering and table-to-text generation), convert the raw data into instruction-tuning format, and fine-tune LLMs to strengthen their understanding of tabular data, ultimately building a large language model dedicated to table-intelligence tasks.
stanfordnlp/dspy
DSPy: The framework for programming—not prompting—foundation models
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio, a chat and pretrained large audio-language model proposed by Alibaba Cloud.
THUDM/CogVLM
A state-of-the-art open visual language model (multimodal pre-trained model)
IVGSZ/Flash-VStream
This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
FunAudioLLM/CosyVoice
A multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
langgptai/Awesome-Multimodal-Prompts
Prompts for GPT-4V & DALL-E 3 to fully utilize their multimodal abilities. GPT-4V prompts, DALL-E 3 prompts.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.