HazyResearch/pdftotree
:evergreen_tree: A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved.
Python · MIT
Issues
sklearn not available anymore
#128 opened by TheophileCAE - 2
sklearn is deprecated
#123 opened by alvitawa - 0
Broken keras imports
#124 opened by b-hemanth - 1
Duplicate text and table in the extraction result
#121 opened by yetnikoff - 0
Dirsearch Error Help Guys ):
#120 opened by ALBIJALI - 0
PyPI v0.5.0 sdist is missing test data
#114 opened by jayvdb - 1
Non-free test data
#116 opened by jayvdb - 1
extract_tables missing function 'analyze_pages' from ./utils/pdf/pdf_utils.py
#115 opened by JBBalling - 1
Im newbie.
#113 opened by Mohanrajkarnan - 2
Extract captions of images
#112 opened by ashleo25 - 1
Images are not extracted properly
#110 opened by ashleo25 - 0
Pdftotree generates lot of tmp folder
#111 opened by ashleo25 - 7
Warnings from pdfbox are not suppressed
#101 opened by HiromuHota - 1
Verbose logging while training
#70 opened by HiromuHota - 1
Missing function at pdf_utils.py (analyze_pages)
#105 opened by busekuz - 1
Embed Base64-Encoded Images Inline In HTML?
#88 opened by HiromuHota - 5
Read PDF files from HDFS
#90 opened by ashleo25 - 1
Cell values are missing from a table
#96 opened by HiromuHota - 2
Inconsistent data models for bbox
#87 opened by HiromuHota - 4
How to I install python3 toolkit on Mac
#50 opened by oshjain - 0
Switch from Tabula to Camelot?
#78 opened by HiromuHota - 0
Update README about non-Python dependencies
#81 opened by HiromuHota - 1
how to use the pdftotree command line
#57 opened by littletree123 - 1
Add type annotations
#74 opened by HiromuHota - 3
pdftotree.parse() returns None
#54 opened by gtholpadiperitusai - 1
How to reproduce the vision model?
#69 opened by HiromuHota - 3
Any Pre-trained model available to download
#52 opened by Ruthvicp - 1
DOC: Broken link
#61 opened by MartinThoma - 4
ModuleNotFoundError: No module named 'pdftotree.ml'; 'pdftotree' is not a package
#51 opened by dzhang228 - 1
ImportError: MagickWand shared library not found
#56 opened by suvinks - 5
ValueError: min() arg is an empty sequence
#42 opened by ninja-otaku - 1
ModuleNotFoundError: No module named 'chardet'
#47 opened by ayoyu - 1
EPUB Conversion
#43 opened - 2
pdftotree.parse() returns None
#45 opened by jay-reynolds - 2
authorization issues with wand library
#44 opened by Zhenshan-Jin