LeeyiMing's Stars
opendatalab/OmniDocBench
A Comprehensive Benchmark for Document Parsing and Evaluation
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
zhangfaen/finetune-Qwen2-VL
huggingface/trl
Train transformer language models with reinforcement learning.
DS4SD/docling
Get your documents ready for gen AI
InternLM/HuixiangDou
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
ad-freiburg/large-qa-datasets
A collection of large question answering datasets
thu-coai/Safety-Prompts
Chinese safety prompts for evaluating and improving the safety of LLMs.
THUDM/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
InsaneLife/ChineseNLPCorpus
Chinese natural language processing datasets, material for everyday experiments. Additions and pull requests are welcome.
QwenLM/Qwen2.5
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
deepseek-ai/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
wanglin2/mind-map
A relatively powerful web mind map.
colesbury/nogil
Multithreaded Python without the GIL
Tlntin/Qwen-TensorRT-LLM
kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
google-research/deduplicate-text-datasets
apacha/OMR-Datasets
Collection of datasets used for Optical Music Recognition
mdeff/fma
FMA: A Dataset For Music Analysis
lonePatient/awesome-pretrained-chinese-nlp-models
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
microsoft/JARVIS
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
ZhuiyiTechnology/TableQA
NL2SQL competition dataset
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
jianzhnie/awesome-instruction-datasets
A collection of prompt datasets and instruction datasets for training chat LLMs such as ChatGPT.
kaixindelele/ChatPaper
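Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow by using ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses.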
thunlp/THULAC-Python
An Efficient Lexical Analyzer for Chinese