xiaomei1995's Stars
PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
chatchat-space/Langchain-Chatchat
Langchain-Chatchat(原Langchain-ChatGLM)基于 Langchain 与 ChatGLM, Qwen 与 Llama 等语言模型的 RAG 与 Agent 应用 | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Llama) RAG and Agent app with langchain
KurtBestor/Hitomi-Downloader
:cake: Desktop utility to download images/videos/music/text from various websites, and more.
ocrmypdf/OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
facebookresearch/nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
jsvine/pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
pengxiao-song/LaWGPT
🎉 Repo for LaWGPT, Chinese-Llama tuned with Chinese Legal knowledge. 基于中文法律知识的大语言模型
gaotianliuyun/gao
FongMi影视和tvbox配置文件,如果喜欢,请Fork自用。使用前请仔细阅读仓库说明,一旦使用将被视为你已了解。
649453932/Chinese-Text-Classification-Pytorch
中文文本分类,TextCNN,TextRNN,FastText,TextRCNN,BiLSTM_Attention,DPCNN,Transformer,基于pytorch,开箱即用。
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
OpenBMB/ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language model for tool learning.
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
indiff/qttabbar
QTTabBar is a small tool that allows you to use tab multi label function in Windows Explorer. https://www.yuque.com/indiff/qttabbar
kermitt2/grobid
A machine learning software for extracting information from scholarly documents
breezedeus/Pix2Text
An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
EleutherAI/the-pile
alibaba/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷为大语言模型提供更高质量、更丰富、更易”消化“的数据!
doc-analysis/TableBank
TableBank: A Benchmark Dataset for Table Detection and Recognition
AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
mammothb/symspellpy
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
codefuse-ai/MFTCoder
High Accuracy and efficiency multi-task fine-tuning framework for Code LLMs. This work has been accepted by KDD 2024.
moon-hotel/BertWithPretrained
An implementation of the BERT model and its related downstream tasks based on the PyTorch framework
SupritYoung/RLHF-Label-Tool
用于大模型 RLHF 进行人工数据标注排序的工具。A tool for manual response data annotation sorting in RLHF stage.
RapidAI/RapidStructure
版面分析 | 表格识别 | 文档方向分类
phamquiluan/PubLayNet
ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,...
rockyzhengwu/document-ocr
一个相对完整的文档分析和识别项目
luckydog5/TabelDetection
Using deep-leaning detect tables in the documet image
JovenChu/vector_stores_test
基于milvus和faiss实现文本转向量并存储的流程及简单性能测试
RebaekSoparkKitchen/duidui
中文校对:自动识别并修正段落 / 标点符号 / 空格;广告禁用词检索,大段重复检测。