ARSblithe212

BUPT

ARSblithe212's Stars

allenai/pixmo-docs
Synthetic data generation pipelines for Pixmo-docs.
Language: Python 163
pdufour/Native-LLM-for-Android
Demonstration of running a native LLM on Android device.
Language: Python 1
arcee-ai/DistillKit
An Open Source Toolkit For LLM Distillation
Language: Python 38242
shufangxun/LLaVA-MoD
Making LLaVA Tiny via MoE-Knowledge Distillation
Language: Python 724
WenmuZhou/TableGeneration
Generate table images by rendering them in a browser.
Language: Python 20742
SpursGoZmy/Tabular-LLM
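This project collects open-source datasets for table intelligence tasks (such as table question answering and table-to-text generation), converts the raw data into instruction-tuning format, and fine-tunes LLMs on it to strengthen their understanding of tabular data, with the ultimate goal of building a large language model dedicated to table intelligence tasks.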
50238
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 100+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
Language: Python 4.8k 417
explodinggradients/ragas
Supercharge Your LLM Application Evaluations 🚀
Language: Python 7.7k 781
conjuncts/gmft
Lightweight, performant, deep table extraction
Language: Python 36426
camelot-dev/camelot
A Python library to extract tabular data from PDFs
Language: Python 3.1k 477
MaxKinny/TabRecSet
A large scale camera-taken table detection and recognition dataset.
Language: Python 1168
unslothai/unsloth
Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
Language: Python 19.7k 1.4k
DS3Lab/WordScape
The WordScape repository contains code for the WordScape pipeline to create datasets to train document understanding models.
Language: Python 335
twang2218/vocab-coverage
Analysis of the Chinese-language cognitive abilities of language models.
Language: Python 23624
lyogavin/airllm
AirLLM 70B inference with single 4GB GPU
Language: Jupyter Notebook 5.5k 437
NirDiamant/RAG_Techniques
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and contextually rich responses.
Language: Jupyter Notebook 9.3k 957
microsoft/Phi-3CookBook
This is a cookbook for getting started with Phi-3, a family of open-source AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
Language: Jupyter Notebook 2.6k 284
houking-can/CCKS2019-Task5
CCKS 2019 Evaluation Task 5: information extraction from public company announcements, 3rd place.
Language: Python 12226
arcral/CCKS2019-Task5
CCKS 2019 Evaluation Task 5: information extraction from public company announcements, 3rd place.
11
merveenoyan/smol-vision
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
Language: Jupyter Notebook 1k 94
davanstrien/awesome-synthetic-datasets
awesome synthetic (text) datasets
Language: Jupyter Notebook 25011
confident-ai/deepeval
The LLM Evaluation Framework
Language: Python 4.1k 330
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Language: C++ 1.6k 182
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Language: Python 6.2k 397
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
Language: Python 21.9k 1.6k
peakhell/OCRIntegrator
OCRFusion is an integrated solution that combines multiple open-source OCR (Optical Character Recognition) models, layout analysis, and table parsing capabilities. This project unifies these functionalities into a single interface, providing a streamlined and efficient way to process and extract information from various types of documents.
Language: Python 132
facebookresearch/nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
Language: Python 9.1k 576
VikParuchuri/marker
Convert PDF to markdown + JSON quickly with high accuracy
Language: Python 18.8k 1.1k
ocrmypdf/OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Language: Python 14.4k 1k
adithya-s-k/omniparse
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Language: Python 5.9k 475
