maocaixia's Stars
infinigence/Infini-Megrez
ppaanngggg/layoutreader
A faster LayoutReader model based on LayoutLMv3 that sorts OCR bboxes into reading order.
stacklens/django_blog_tutorial
Tutorial on building a blog with Django
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
SocialAI-tianji/Tianji
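Building a large language model that understands social etiquette and the ways of the world | Covers tutorials on prompt engineering, RAG, agents, and LLM fine-tuning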
rasbt/LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
WenmuZhou/TableGeneration
Generating table images through browser rendering
hacksider/Deep-Live-Cam
Real-time face swap and one-click video deepfake with only a single image
Yuliang-Liu/Monkey
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
danny-avila/LibreChat
Enhanced ChatGPT Clone: Features Agents, Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User Auth, Presets, open-source for self-hosting. Active project.
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
cv-small-snails/Awesome-Table-Recognition
A curated list of resources dedicated to table recognition
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
CosmosShadow/gptpdf
Using GPT to parse PDF
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
modelscope/DiffSynth-Studio
Enjoy the magic of Diffusion models!
NielsRogge/Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
JiaquanYe/TableMASTER-mmocr
2nd-place solution in the ICDAR 2021 Competition on Scientific Literature Parsing, Task B.
microsoft/table-transformer
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
X-PLUG/MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
VikParuchuri/marker
Convert PDF to markdown + JSON quickly with high accuracy
3DTopia/LGM
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
PawanOsman/ChatGPT
OpenAI API Free Reverse Proxy
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
plandex-ai/plandex
AI driven development in your terminal. Designed for large, real-world tasks.
nashsu/FreeAskInternet
FreeAskInternet is a completely free, PRIVATE, and LOCALLY running search aggregator and answer generator that uses MULTIPLE LLMs, with no GPU needed. The user asks a question, and the system performs a multi-engine search, passes the combined search results to the LLM, and generates an answer based on those results. It's all FREE to use.
midday-ai/midday
Invoicing, time tracking, file reconciliation, storage, financial overview, and your own assistant, made for freelancers