This repository contains a Python application that scrapes text from the US-CERT (United States Computer Emergency Readiness Team) website and performs text analysis on it. It was written for the Python for Developer course at Carnegie Mellon University in December 2015. The application performs the following steps:
Scraping the text of one of the US-CERT alerts.
Cleaning up the unstructured text by removing the web page source code (HTML markup) using the Beautiful Soup library.
Replacing all noise, punctuation, and whitespace, and performing appropriate substitutions for stemming, creating compound concepts, and normalization.
Producing a listing of tokens / concepts and printing it to the console in ascending order.
Producing a listing of tokens / concepts along with their frequencies and printing it to the console in ascending order of frequency.
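
The normalization and listing steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names, the `COMPOUNDS` and `STEMS` substitution tables, and the sample sentence are all hypothetical. It assumes the alert text has already been extracted from the page (for example with Beautiful Soup's `get_text()`), that "ascending order" means alphabetical order of unique tokens, and that ties in frequency are broken alphabetically.

```python
import re
from collections import Counter

# Hypothetical substitution tables; the real rules used by the application may differ.
COMPOUNDS = {"denial of service": "denial_of_service"}
STEMS = {"attacks": "attack", "attackers": "attacker", "systems": "system"}


def normalize(text):
    """Lowercase, join compound concepts, strip punctuation, collapse whitespace."""
    text = text.lower()
    for phrase, concept in COMPOUNDS.items():
        text = text.replace(phrase, concept)
    # Replace every run of characters that is not a letter, digit,
    # or underscore with a single space.
    text = re.sub(r"[^a-z0-9_]+", " ", text)
    return text.strip()


def tokenize(text):
    """Split normalized text into tokens and apply the stem substitutions."""
    return [STEMS.get(token, token) for token in normalize(text).split()]


def sorted_tokens(tokens):
    """Return the unique tokens in ascending (alphabetical) order."""
    return sorted(set(tokens))


def frequencies_ascending(tokens):
    """Return (token, count) pairs, lowest count first, ties sorted by token."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    sample = "Attackers use Denial of Service attacks. Attacks hit systems!"
    tokens = tokenize(sample)
    for token in sorted_tokens(tokens):
        print(token)
    for token, count in frequencies_ascending(tokens):
        print(token, count)
```

A dictionary lookup is used for stemming here only to keep the sketch dependency-free; a dedicated stemmer could be substituted in `tokenize` without changing the two listing functions.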