Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

textsum

This repository is part of an NTU URECA (Undergraduate Research Experience on CAmpus) project.

It builds on Google's textsum model to generate headlines for domain-specific (cybersecurity) news.

Usage of each file

textsum: sequence-to-sequence model for summarization. Also handles conversion between binary and text data.

crawlers: crawlers that download news from various websites. They can build a corpus of 30,000 cybersecurity-related news articles.

data: the data in both binary and text form. b_corpus_data is the latest version, in binary form. It contains every news article whose title is shorter than 30 words and whose first two sentences are shorter than 120 words.

corpus2vocab: program that converts the news in the corpus to a binary vocabulary file.

convert: converts news to a text file.
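
The length filter described for the data directory can be sketched as follows. This is only a minimal illustration, not the code used in this repository: the names keep_article and first_two_sentences are hypothetical, the sentence splitter is a naive regular expression, and "less than" is assumed to mean a strict comparison.

```python
import re

# Limits taken from the description of b_corpus_data above.
MAX_TITLE_WORDS = 30
MAX_LEAD_WORDS = 120


def first_two_sentences(body):
    """Return the first two sentences of body (hypothetical helper).

    Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    A real pipeline would probably use a proper sentence tokenizer.
    """
    sentences = re.split(r"(?<=[.!?])\s+", body.strip())
    return " ".join(sentences[:2])


def keep_article(title, body):
    """Keep an article only if its title and lead are both short enough."""
    lead = first_two_sentences(body)
    title_ok = len(title.split()) < MAX_TITLE_WORDS
    lead_ok = len(lead.split()) < MAX_LEAD_WORDS
    return title_ok and lead_ok
```

Counting words by splitting on whitespace is the simplest choice; a tokenizer that separates punctuation would give slightly different counts.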