yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

Rust · Apache-2.0

Issues

Support for Extracting PDF Content as XML
#35 opened a month ago by coroluca
7
markdown support
#37 opened a month ago by peterlyz
2
use it in multiple processes.
#34 opened 18 days ago by ljhssga
1
Installation not working - Windows 11/Python3.10
#31 opened 18 days ago by IneffableBunch
2
Improve extract to stream performance
#5 opened 4 months ago by nmammeri
0
Failed extraction - Class CTTextCharacterProperties is missing.
#32 opened 2 months ago by s4zuk3
6
Failed Extraction - cmap font missing
#33 opened 19 days ago by s4zuk3
2
Bug Report: Text Truncation in EPUB Files Larger Than 500KB
#39 opened 19 days ago by hochenggang
6
Max length not setting
#41 opened 20 days ago by jabberjabberjabber
1
When a document(.doc) contains a Visio graphic, the extraction fails
#40 opened a month ago by nilcodes
1
Add detect file type API
#4 opened 4 months ago by nmammeri
1
Change in PDF Extraction Results
#30 opened 2 months ago by TheTechromancer
3
make reflection data platform specific
#25 opened 2 months ago by nmammeri
0
Return Metadata with extraction result
#3 opened 2 months ago by nmammeri
0
Stall when extracting using ocr on macos from pdf with embedded images
#23 opened 2 months ago by nmammeri
1
Implement extracting from an array of bytes
#7 opened 2 months ago by nmammeri
0
ocr examples and docs
#18 opened 2 months ago by nmammeri
0
TypeError: ParseError("Parse error occurred : TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@281b1a01")
#16 opened 2 months ago by NourEldin-Osama
0
failed to install in windows 11
#13 opened 2 months ago by NourEldin-Osama
9
Add Microsoft Windows support
#1 opened 2 months ago by nmammeri
0
make the build script faster
#9 opened 3 months ago by nmammeri
0
PyPI package is huge
#10 opened 3 months ago by chrisgoddard
2
Extracting text from a specific page of the document
#6 opened 3 months ago by bm777
4
Add tests with different file formats
#2 opened 3 months ago by nmammeri
0