lazyc81's Stars
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
VikParuchuri/marker
Convert PDF to markdown + JSON quickly with high accuracy
neuml/txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
01-ai/Yi
A series of large language models trained from scratch by developers @01-ai
QwenLM/Qwen2
Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Wandmalfarbe/pandoc-latex-template
A pandoc LaTeX template to convert markdown files to PDF or LaTeX.
axa-group/Parsr
Transforms PDF, Documents and Images into Enriched Structured Data
QwenLM/Qwen-Agent
Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
X-PLUG/MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
deepdoctection/deepdoctection
A Repo For Document AI
Filimoa/open-parse
Improved file parsing for LLMs
breezedeus/Pix2Text
An open-source Python 3 tool with small models for recognizing layouts, tables, math formulas (LaTeX), and text in images and converting them into Markdown. A free alternative to Mathpix for converting visual content into text-based representations. Supports 80+ languages.
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
nlmatics/llmsherpa
Developer APIs to Accelerate LLM Projects
blaisewang/img2latex-mathpix
Mathpix has changed their billing policy and no longer has free monthly API requests. This repo is now archived and will not receive any updates for the foreseeable future.
jackaduma/awesome_LLMs_interview_notes
LLM interview notes and answers: this repository mainly collects interview questions and reference answers for large language model (LLM) algorithm engineers.
nlmatics/nlm-ingestor
This repo provides the server side code for llmsherpa API to connect. It includes parsers for various file formats.
allenai/papermage
Library supporting NLP and CV research on scientific papers
HazyResearch/pdftotree
🌲 A tool for converting PDF into hOCR that recognizes and preserves text, tables, and figures.
facebookresearch/mega
Sequence modeling with Mega.
HazyResearch/based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
corl-team/rebased
Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models"
arXiv/zzzArchived_arxiv-readability
Pilot project to render HTML5 from arXiv LaTeX sources
jlaurens/synctex
Synchronization for TeX
deep-spin/infinite-former
dginev/ar5ivist
A turnkey command for converting a LaTeX source to ar5iv-style HTML
Shark-NLP/CAB