Task - Text summarization service with simple NLP and Python

Current implementation

Sumy library for simple text summarization
Flask for providing a REST endpoint
SQLLite, which is already part of Python

Features

Same documents are only generated once
Different algorithms from sumy could be used: "TextRankSummarizer", "LexRankSummarizer", "LuhnSummarizer" "LsaSummarizer"
Data is stored in database and will survive server restarts
REST endpoints following the REST pattern for get and save
Invalid data is handled with 400, 404 erros

The implementation is using Flask as a REST service endpoint. It provides the capability to summarize any text which could be later retrived with an unique document id.

Usage

start main.py to start the server
call REST endpoints with any REST client or framework

To submit a new text, use a POST command which returns also the generated document id of the provided text. POST call to http://127.0.0.1:5000/summary

{ "text" : "The Jedi and their troops set a trap for Grievous and confronted the cyborg general once he arrived. However, Grievous was able to eliminate several clone troopers, and after sustaining injuries, escape to the castle's secret command center. Sealing off the exits, the Confederate general trapped the Jedi and clones inside his castle. While Grievous was repaired by his caretaker droid, the general pitted the Jedi and the remaining clones against the castle's defenses—a molten incinerator pit and the general's pet roggwart, Gor. The clones were killed, but the Jedi were able to eliminate Gor." }

With the GET command, you will get the text summary, generated by very simple NLP. GET call to http://127.0.0.1:5000/summary/document_id

Result containing the summarized text:

{ "document_id": "a2e4b959b65d778c53aa6e0aa3a1b659", "summary": "Sealing off the exits, the Confederate general trapped the Jedi and clones inside his castle.While Grievous was repaired by his caretaker droid, the general pitted the Jedi and the remaining clones against the castle's defenses—a molten incinerator pit and the general's pet roggwart, Gor." }

Possible improvements

Make the summary in sentence amount linear the original text. Larger original text will then produce larger summary text
Using better NLP approaches like transformers (BERT) and transfer learning, which are able to produce better summarizations of text
Containerize solution and consider horizontal scaling possibilities
Create index on DOCUMENT_ID field in the database to improve access speed
Use a real and scalable database to store the data. Think of NoSQL databases to stpore the texts
Better hashing methods to generate document_id
Support different languages
More extended and detailed tests

juergen1976/text_summarization_server

Task - Text summarization service with simple NLP and Python

Current implementation

Features

Usage

Possible improvements