A machine learning summarizer inspired by the paper "Automatic Text Summarization Using a Machine Learning Approach", presented at the 16th Brazilian Symposium on Artificial Intelligence in 2002.
The model builds an extractive summary of a document by ranking its sentences and selecting the highest-ranked ones. Each sentence is scored on the following features:
- Mean-TF-ISF
- Sentence Length
- Sentence Position
- Similarity to Title
- Similarity to Keywords
- Sentence-to-Sentence Cohesion
- Sentence-to-Centroid Cohesion
- Depth in the tree (related to Sentence-to-Centroid Cohesion)
- Referring position in a given level of the tree (positions 1, 2, 3, and 4) (related to Sentence-to-Centroid Cohesion)
- Indicator of main concepts
- Occurrence of proper names
- Occurrence of anaphors
- Occurrence of non-essential information
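Several of the features above are simple vector-space measures. The sketch below computes five of them (Mean-TF-ISF, sentence length, sentence position, similarity to title, sentence-to-sentence cohesion, and sentence-to-centroid cohesion). It is a minimal sketch, not this project's code: the function names are hypothetical, and the regex tokenizer, averaging Mean-TF-ISF over distinct words, relative position in [0, 1], and max-normalization of length and cohesion are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (deliberately simple; an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sentence_features(title, sentences):
    """Hypothetical helper: compute a subset of the listed features per sentence."""
    tokens = [tokenize(s) for s in sentences]
    tfs = [Counter(t) for t in tokens]
    n = len(sentences)

    # ISF(w) = log(number of sentences / number of sentences containing w).
    sf = Counter(w for tf in tfs for w in tf)
    isf = {w: math.log(n / c) for w, c in sf.items()}

    # TF-ISF weighted sentence vectors; their sum points in the same direction
    # as the centroid (mean), which is all cosine similarity depends on.
    weighted = [Counter({w: c * isf[w] for w, c in tf.items()}) for tf in tfs]
    centroid = Counter()
    for vec in weighted:
        centroid.update(vec)

    # Sentence-to-sentence cohesion: summed similarity to every other sentence,
    # normalized by the largest value in the document.
    cohesion = [
        sum(cosine(tfs[i], tfs[j]) for j in range(n) if j != i) for i in range(n)
    ]
    max_coh = max(cohesion) or 1.0

    title_tf = Counter(tokenize(title))
    max_len = max(len(t) for t in tokens) or 1

    rows = []
    for i, tf in enumerate(tfs):
        mean_tf_isf = sum(c * isf[w] for w, c in tf.items()) / len(tf) if tf else 0.0
        rows.append({
            "mean_tf_isf": mean_tf_isf,
            "length": len(tokens[i]) / max_len,
            "position": i / (n - 1) if n > 1 else 0.0,
            "title_sim": cosine(tf, title_tf),
            "cohesion": cohesion[i] / max_coh,
            "centroid_sim": cosine(weighted[i], centroid),
        })
    return rows


rows = sentence_features(
    "Cats and dogs",
    ["Cats are great pets.", "Dogs are loyal.", "The weather is nice today."],
)
```

The remaining features (tree depth and position, main concepts, proper names, anaphors, non-essential information) need extra linguistic resources such as a clustering step, a part-of-speech tagger, or cue-word lists, so they are left out of this sketch.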
We then train the model on the All the News dataset, using the sentences chosen by the TextTeaser summarizer as reference labels, so that it learns which sentences belong in a summary.
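One way to frame this training step is as binary classification: a sentence is labeled 1 if the reference summarizer selected it and 0 otherwise, and a classifier learns to predict that label from the feature vector. The sketch below assumes the TextTeaser output is available as a list of sentences copied verbatim from the article, and uses a small hand-written Gaussian Naive Bayes classifier. The original paper evaluated Naive Bayes and C4.5; which classifier this project actually uses is not stated here, and all names below are hypothetical.

```python
import math
from collections import defaultdict


def label_sentences(sentences, reference_summary):
    """Label 1 if a sentence appears in the reference (e.g. TextTeaser) summary, else 0."""
    chosen = {s.strip() for s in reference_summary}
    return [1 if s.strip() in chosen else 0 for s in sentences]


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes over numeric feature vectors."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # A small variance floor avoids division by zero for constant features.
            variances = [
                max(sum((v - m) ** 2 for v in c) / len(c), 1e-6)
                for c, m in zip(cols, means)
            ]
            self.params[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def log_score(self, row, label):
        """Log prior plus the sum of per-feature Gaussian log-likelihoods."""
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )

    def predict(self, row):
        return max(self.params, key=lambda label: self.log_score(row, label))


def summary_indices(model, X, k):
    """Rank sentences by log-odds of being in the summary; return the top k in document order."""
    scores = [model.log_score(row, 1) - model.log_score(row, 0) for row in X]
    top = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


sentences = ["A.", "B.", "C.", "D."]
y = label_sentences(sentences, ["A.", "C."])
X = [[0.9, 0.1], [0.1, 0.8], [0.8, 0.2], [0.2, 0.9]]  # toy feature vectors
model = GaussianNaiveBayes().fit(X, y)
```

Ranking by log-odds rather than taking hard 0/1 predictions lets the summary length be fixed in advance, which fits the ranking-based description above. `summary_indices` assumes both labels occur in the training data.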
This project was made for "human learning of machine learning" - Jeffrey Fei
To-do list:
- Host the model on a server so that users can test it by entering a title and body text