yobix-ai/extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Rust · Apache-2.0
Issues
Support for Extracting PDF Content as XML
#35 opened by coroluca - 2
markdown support
#37 opened by peterlyz - 1
use it in multiple processes.
#34 opened by ljhssga - 2
Improve extract to stream performance
#5 opened by nmammeri - 6
Failed Extraction - cmap font missing
#33 opened by s4zuk3 - 6
Max length not setting
#41 opened by jabberjabberjabber - 1
Add detect file type API
#4 opened by nmammeri - 3
Change in PDF Extraction Results
#30 opened by TheTechromancer - 0
make reflection data platform specific
#25 opened by nmammeri - 0
Return Metadata with extraction result
#3 opened by nmammeri - 1
Implement extracting from an array of bytes
#7 opened by nmammeri - 0
ocr examples and docs
#18 opened by nmammeri - 0
TypeError: ParseError("Parse error occurred : TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@281b1a01")
#16 opened by NourEldin-Osama - 9
failed to install in windows 11
#13 opened by NourEldin-Osama - 0
Add Microsoft Windows support
#1 opened by nmammeri - 0
make the build script faster
#9 opened by nmammeri - 2
PyPI package is huge
#10 opened by chrisgoddard - 4
Add tests with different file formats
#2 opened by nmammeri