This repository is for NTU URECA (Undergraduate Research Experience on CAmpus).
It builds on Google's textsum model to generate headlines for domain-specific (cybersecurity) news.
Usage of each file
textsum: sequence-to-sequence model for summarization. Also handles binary/text conversion.
crawlers: crawlers that download news from various websites. Together they can build a corpus of 30,000 cybersecurity-related news articles.
data: the corpus in both binary and text form. b_corpus_data is the latest version in binary form. It contains every news article whose title has fewer than 30 words and whose first two sentences have fewer than 120 words.
corpus2vocab: program that converts the news in the corpus into a binary vocabulary file.
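
The length filter described for b_corpus_data can be sketched roughly as follows. This is only an illustration, not the code used in this repository: the names `first_sentences` and `keep_article` are hypothetical, sentence splitting is a naive regex, and it assumes the 120-word limit applies to the first two sentences combined.

```python
import re

MAX_TITLE_WORDS = 30   # title must have fewer words than this
MAX_LEAD_WORDS = 120   # first two sentences (combined, by assumption) must have fewer words than this


def first_sentences(text, n=2):
    """Return the first n sentences of text, using a naive split on ., ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(parts[:n])


def keep_article(title, body):
    """Hypothetical filter: True if the article satisfies both length limits."""
    lead = first_sentences(body)
    return (len(title.split()) < MAX_TITLE_WORDS
            and len(lead.split()) < MAX_LEAD_WORDS)
```

Only articles for which `keep_article(title, body)` returns True would then be passed on to the text-to-binary conversion step.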