TextDistiller is an advanced, AI-driven tool designed to summarize books chapter by chapter or as a whole, providing concise yet comprehensive overviews. With TextDistiller, you can quickly grasp the core ideas and key takeaways from any book, saving time while maintaining comprehension.
- Chapter-wise Summarization: TextDistiller offers detailed summaries for each chapter, allowing you to focus on specific sections of interest.
- Entire Book Synopsis: For books without chapter divisions, TextDistiller condenses the entire text into a coherent summary.
- Powered by NLP: TextDistiller uses a pretrained transformer model (T5) to capture and summarize the most pertinent content.
- User-Friendly Interface: TextDistiller features a clean and intuitive interface, making the summarization process accessible and straightforward for all users.
TextDistiller leverages the T5-small pretrained model from HuggingFace Transformers to generate accurate and readable summaries. The process includes:
- Chunking: The book is divided into chunks, either by chapter or as a single unit.
- Tokenization: These chunks are tokenized using the T5Tokenizer for compatibility with the T5 model.
- Summary Generation: The tokenized text is processed by the T5ForConditionalGeneration model to generate summary token IDs.
- Decoding: The summary token IDs are decoded into human-readable text using the T5Tokenizer's decode() function.
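The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the chapter-heading regex, and the generation settings (max_length, min_length, num_beams) are assumptions chosen for the example. It assumes transformers, sentencepiece, and torch are installed.

```python
import re
from typing import List

# Hypothetical heuristic: a line that starts with "Chapter <something>".
CHAPTER_HEADING = re.compile(r"^[ \t]*chapter[ \t]+\w+.*$", re.IGNORECASE | re.MULTILINE)


def split_into_chunks(text: str) -> List[str]:
    """Chunking: one chunk per chapter, or the whole text if no headings are found.

    Any text before the first chapter heading is dropped in this sketch.
    """
    starts = [m.start() for m in CHAPTER_HEADING.finditer(text)]
    if not starts:
        stripped = text.strip()
        return [stripped] if stripped else []
    bounds = starts + [len(text)]
    chunks = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]


def summarize_chunks(chunks: List[str], model_name: str = "t5-small") -> List[str]:
    """Tokenize, generate, and decode a summary for each chunk."""
    # Imported lazily so the chunking helper can be used without transformers.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    summaries = []
    for chunk in chunks:
        # Tokenization: T5 is text-to-text, so the task is given as a prefix.
        # Input beyond 512 tokens is truncated here (an assumed limit).
        inputs = tokenizer(
            "summarize: " + chunk,
            return_tensors="pt",
            max_length=512,
            truncation=True,
        )
        # Summary generation: produces summary token IDs (assumed settings).
        summary_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=150,
            min_length=30,
            num_beams=4,
            early_stopping=True,
        )
        # Decoding: token IDs back to human-readable text.
        summaries.append(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
    return summaries
```

Calling summarize_chunks(split_into_chunks(book_text)) would return one summary string per chapter. Because the input is truncated at 512 tokens in this sketch, a long chapter would need to be split further before summarizing.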
- Clone the repository:
git clone https://github.com/johngai19/TextDistiller.git
- Install the required dependencies:
pip install -r requirements.txt
- To run via CLI:
python3 bsCLI.py --path <path-to-PDF-file>
- To run on a Flask server with frontend and mail:
  - Update the sender_address and sender_pass in mail.py.
  - Run views.py: python3 views.py
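For readers curious how a --path option like the one in the CLI command above is typically handled, here is a small sketch using Python's standard argparse module. It is a hypothetical illustration only: the internals of bsCLI.py are not shown in this document, and the function names parse_args and validate_pdf_path are invented for the example.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a required --path argument (hypothetical mirror of the CLI flag)."""
    parser = argparse.ArgumentParser(description="Summarize a PDF book (illustrative sketch).")
    parser.add_argument("--path", required=True, type=Path, help="Path to the PDF file to summarize.")
    return parser.parse_args(argv)


def validate_pdf_path(path: Path) -> None:
    """Fail early with a clear error if the path is not an existing PDF file."""
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {path}")
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")


def main(argv: Optional[List[str]] = None) -> Path:
    """Parse and validate the arguments, returning the PDF path to summarize."""
    args = parse_args(argv)
    validate_pdf_path(args.path)
    return args.path
```

Validating the path before loading any model gives a fast, readable error for a mistyped file name instead of a failure partway through summarization.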
We welcome contributions from the community! If you'd like to contribute to TextDistiller, please feel free to submit a pull request or open an issue. Your feedback and support are greatly appreciated!
TextDistiller is distributed under the MIT License.
Maintained by John Ngai