Pinned Repositories
community
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
pipeline-sec-filings
Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
unstructured-api
unstructured-api-tools
unstructured-inference
unstructured-js-client
A TypeScript client for the Unstructured hosted API
unstructured-python-client
A Python client for the Unstructured hosted API
unstructured.PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that recognizes 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
Unstructured's Repositories
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Unstructured-IO/unstructured-api
Unstructured-IO/unstructured-inference
Unstructured-IO/pipeline-sec-filings
Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
Unstructured-IO/unstructured-python-client
A Python client for the Unstructured hosted API
Unstructured-IO/unstructured-js-client
A TypeScript client for the Unstructured hosted API
Unstructured-IO/unstructured.PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that recognizes 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
Unstructured-IO/unstructured-api-tools
Unstructured-IO/pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
Unstructured-IO/unstructured-ingest
Unstructured-IO/irs-manual-demo
Unstructured-IO/danswer
Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
Unstructured-IO/langchain
⚡ Building applications with LLMs through composability ⚡
Unstructured-IO/pipeline-oer
Pipeline for extracting information from Army OERs
Unstructured-IO/pipeline-template
Unstructured-IO/docs
Documentation for all Unstructured products and libraries
Unstructured-IO/pipeline-invoices
Unstructured-IO/base-images
Store Dockerfiles and Packer configs for images to use as a base to build upon
Unstructured-IO/unstructured-platform-plugins
Unstructured-IO/unstructured.pytesseract
A Python wrapper for Google Tesseract
Unstructured-IO/pipeline-receipts
Preprocessing pipeline notebooks and API supporting text extraction from receipt images
Unstructured-IO/terraform-aws-ecs-web-app
Terraform module that implements a web app on ECS and supports autoscaling, CI/CD, monitoring, ALB integration, and much more.
Unstructured-IO/unstructured.Paddle
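PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)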
Unstructured-IO/pipeline-document-layout
Pipeline for layout extraction
Unstructured-IO/.github
Unstructured-IO/js-client-batch
JS Client Batch Processing
Unstructured-IO/model-cards
FedRAMP formatted model cards
Unstructured-IO/pairing-technical-challenge
Pairing Technical Challenge
Unstructured-IO/terraform-aws-ecs-alb-service-task
Terraform module which implements an ECS service which exposes a web service via ALB.
Unstructured-IO/wolfi-dev-os
Main package repository for production Wolfi images