We are going to break things down into the following steps:
- Read a PDF and parse its text
- Separate the text into paragraphs
- Filter/recommend paragraphs that may contain value for us
- Read the book from Google Drive
- Automatically post to the group
- Take input on topics to filter on
- Run Parsr locally (the README is easy to follow).
- Convert the PDF to JSON using Parsr's UI on localhost. Turn off all the modules except the following:
  - header-footer-detection
  - words-to-line-new
  - reading-order-detection
  - lines-to-paragraph
  - hierarchy-detection
- Wait for the conversion to finish.
- Download the JSON from the Document Viewer tab.
- Supply it to the script and get the output file.
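
To make the paragraph-separation and filtering steps more concrete, here is a minimal sketch of what such a script might do with the downloaded JSON. It is not the project's actual script. It assumes the Parsr output has a top-level `pages` list, that each page has an `elements` list, and that paragraph elements nest lines and words under `content`; check your own JSON before relying on this. The function names (`element_text`, `extract_paragraphs`, `filter_paragraphs`) and the keyword-matching filter are hypothetical choices made for illustration.

```python
import json


def element_text(node):
    """Recursively join the text under a Parsr-style element.

    Assumption: a word stores its text as a string in "content",
    while lines and paragraphs store a list of child elements there.
    """
    content = node.get("content", "")
    if isinstance(content, str):
        return content
    if not isinstance(content, list):
        return ""
    parts = (element_text(child) for child in content if isinstance(child, dict))
    return " ".join(part for part in parts if part)


def extract_paragraphs(document):
    """Collect the text of every element whose type is "paragraph"."""
    paragraphs = []
    for page in document.get("pages", []):
        for element in page.get("elements", []):
            if element.get("type") == "paragraph":
                text = element_text(element)
                if text:
                    paragraphs.append(text)
    return paragraphs


def filter_paragraphs(paragraphs, topics):
    """Keep paragraphs that mention at least one topic (case-insensitive)."""
    lowered = [topic.lower() for topic in topics]
    return [p for p in paragraphs if any(t in p.lower() for t in lowered)]


def load_document(path):
    """Load the JSON file downloaded from the Document Viewer tab."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With a downloaded file, usage could look like `filter_paragraphs(extract_paragraphs(load_document("book.json")), ["habit", "focus"])`, where `book.json` and the topic list are placeholders. Plain substring matching is the simplest possible filter; it is only a starting point for the "recommend paragraphs" step.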