
Analysis of similarity of webpages by word frequency

Primary language: Jupyter Notebook

Webpage similarity assessment

This program analyses the text of every page on the Geoinsyssoft website and measures how similar the pages are to one another through a frequency analysis of the words they have in common. It generates a similarity matrix at the end. Beautiful Soup is used to find the webpages and scikit-learn to do the frequency analysis.
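
The frequency analysis and the resulting similarity matrix can be sketched roughly as follows. This is a minimal illustration and not the notebook's own code. It uses only the Python standard library as a stand-in for Beautiful Soup and scikit-learn, the page names and texts are made-up placeholders, and cosine similarity is an assumed measure, since the description above does not say which one the project uses.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count how often each lowercase word occurs in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count Counters (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page shares nothing with any other page
    return dot / (norm_a * norm_b)


def similarity_matrix(texts):
    """Build the square matrix of pairwise similarities between texts."""
    counts = [word_counts(t) for t in texts]
    return [[cosine_similarity(a, b) for b in counts] for a in counts]


# Hypothetical page texts; the real project extracts these from the website.
pages = {
    "home": "big data training and consulting",
    "courses": "big data training courses and workshops",
    "contact": "phone email address",
}

matrix = similarity_matrix(list(pages.values()))
for name, row in zip(pages, matrix):
    print(name, [round(v, 2) for v in row])
```

Each row of the matrix corresponds to one page. The diagonal compares a page with itself, and pages with no words in common score zero. With scikit-learn, the same idea is usually expressed by turning the texts into a word-count matrix with `CountVectorizer` and comparing its rows.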