Text-Analysis-with-Python

Capstone project for the Python for Developer course at CMU

Primary language: Python

Text Processing and Libraries with Python

This repository contains a Python application that scrapes text from the US-CERT (Computer Emergency Readiness Team) website and performs text analysis on it. It was written for the Python for Developer course at Carnegie Mellon University in December 2015. The application works through the following steps:

Scraping text from one of the US-CERT alerts.

Cleaning up the unstructured text by removing the web page source code (HTML markup, scripts, and styles) with the Beautiful Soup library.

Removing noise, punctuation, and extra whitespace, and performing appropriate substitutions for stemming, creating compound concepts, and normalization.

Producing a listing of tokens / concepts and printing it to the monitor in ascending order.

Producing a listing of tokens / concepts along with their frequencies and printing it to the monitor in ascending order of frequency.
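
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this repository: the function names, the `SUBSTITUTIONS` table, and the `SAMPLE_HTML` snippet are all made up for the example, and the network request to US-CERT is left out so the sketch runs offline. It uses Beautiful Soup when the `bs4` package is installed and otherwise falls back to the standard library `html.parser` as a stand-in.

```python
import re
from collections import Counter
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
except ImportError:  # keep the sketch runnable without bs4
    BeautifulSoup = None

# Made-up stand-in for a downloaded alert page (not real US-CERT content).
SAMPLE_HTML = """<html><head><style>p {color: red}</style></head>
<body><h1>Alert</h1>
<p>Attackers used a denial-of-service attack. The attack exploited vulnerabilities.</p>
<script>var x = 1;</script></body></html>"""

# Hypothetical substitution table: (regex pattern, replacement).
SUBSTITUTIONS = [
    (r"\bdenial of service\b", "denial_of_service"),  # compound concept
    (r"\bvulnerabilities\b", "vulnerability"),        # crude stemming
    (r"\battackers\b", "attacker"),
]


class _TextExtractor(HTMLParser):
    """Standard-library fallback: collect text outside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML document."""
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup.find_all(["script", "style"]):
            tag.decompose()  # drop embedded code and styles entirely
        return soup.get_text(separator=" ")
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Lowercase, remove punctuation, collapse whitespace, substitute, split."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # punctuation and symbols
    text = re.sub(r"\s+", " ", text).strip()  # runs of whitespace
    for pattern, replacement in SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text)
    return text.split()


def frequency_listing(tokens):
    """Return (token, count) pairs sorted by count, then alphabetically."""
    return sorted(Counter(tokens).items(), key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    tokens = tokenize(strip_html(SAMPLE_HTML))
    for token in sorted(set(tokens)):  # tokens in ascending order
        print(token)
    for token, count in frequency_listing(tokens):  # ascending frequency
        print(token, count)
```

Punctuation is removed before the substitutions run, so "denial-of-service" and "denial of service" both collapse into the same compound token. A real substitution table would be much larger, and a stemming library could replace the hand-written rules.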