kontex

A machine learning summarizer inspired by the paper "Automatic Text Summarization Using a Machine Learning Approach", presented at the 16th Brazilian Symposium on Artificial Intelligence in 2002.

The model builds extractive summaries by ranking the sentences of a document and deciding which ones belong in the summary. Each sentence is described by the following features:

Mean-TF-ISF
Sentence Length
Sentence Position
Similarity to Title
Similarity to Keywords
Sentence-to-Sentence Cohesion
Sentence-to-Centroid Cohesion
Depth in the tree (related to Sentence-to-Centroid Cohesion)
Referring position in a given level of the tree (positions 1, 2, 3, and 4) (related to Sentence-to-Centroid Cohesion)
Indicator of main concepts
Occurrence of proper names
Occurrence of anaphors
Occurrence of non-essential information
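
To make a few of these features concrete, here is a minimal sketch of how Mean-TF-ISF, Sentence Length, Sentence Position, and Similarity to Title could be computed. It is an illustration, not the code in this repository: the function names (`tokenize`, `mean_tf_isf`, `cosine_similarity`, `sentence_features`) are hypothetical, the tokenizer is a simple regular expression, the TF-ISF average is taken over the distinct words of a sentence, and position is scaled to the range 0 to 1. All of these are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def mean_tf_isf(sentences):
    """Return one Mean-TF-ISF score per sentence.

    TF is a word's count within the sentence. ISF is log(N / SF), where N is
    the number of sentences and SF is the number of sentences containing the
    word. The score is averaged over the sentence's distinct words (assumption).
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    sentence_freq = Counter(w for tokens in tokenized for w in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum(c * math.log(n / sentence_freq[w]) for w, c in tf.items())
        scores.append(total / len(tf))
    return scores


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def sentence_features(title, sentences):
    """Build a small feature dictionary for every sentence in a document."""
    tf_isf = mean_tf_isf(sentences)
    title_tokens = tokenize(title)
    last_index = max(len(sentences) - 1, 1)
    features = []
    for index, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        features.append({
            "mean_tf_isf": tf_isf[index],
            "length": len(tokens),
            "position": index / last_index,  # 0.0 = first, 1.0 = last
            "title_similarity": cosine_similarity(tokens, title_tokens),
        })
    return features


if __name__ == "__main__":
    doc = ["The cat sat.", "The dog ran.", "The cat ran fast."]
    for row in sentence_features("Cat news", doc):
        print(row)
```

Words that occur in every sentence (such as "the" in the sample document) get an ISF of zero, so they contribute nothing to a sentence's score.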

We then train the model on the All the news dataset, using the output of the textteaser summarizer as the reference for which sentences a summary should be formed from.
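
As a rough illustration of how a reference summarizer's output can become training labels, the sketch below marks each sentence of an article with 1 if it appears in the reference summary and 0 otherwise. The helper `label_sentences` is hypothetical and is not part of textteaser or this repository. It assumes the reference summary is already available as a list of sentences copied verbatim from the article, differing at most in case and whitespace.

```python
def label_sentences(sentences, reference_summary):
    """Return 1 for each sentence found in the reference summary, else 0."""

    def normalize(text):
        # Ignore case and collapse runs of whitespace before comparing.
        return " ".join(text.lower().split())

    reference = {normalize(s) for s in reference_summary}
    return [1 if normalize(s) in reference else 0 for s in sentences]


if __name__ == "__main__":
    article = ["Rates rose today.", "Analysts were surprised.", "Markets fell."]
    summary = ["Rates rose today.", "Markets fell."]
    print(label_sentences(article, summary))
```

The resulting labels can be paired with per-sentence feature vectors to train a classifier.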

This project was made for "human learning of machine learning" - Jeffrey Fei

Todo list:
