web-data-similarity-analysis

Analyze web data using TF-IDF and cosine similarity.
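
As a rough illustration of the two techniques named above, the sketch below builds TF-IDF vectors for a few texts and compares them with cosine similarity. It is not taken from this project's code: the function names `tfidf_vectors` and `cosine_similarity`, the whitespace tokenization, and the smoothed IDF formula are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    Hypothetical helper: tokenizes on whitespace, which is an assumption
    and much simpler than a real NLTK-based pipeline.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so a term found in every document keeps a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty texts
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    pages = [
        "the cat sat on the mat",
        "the cat lay on the rug",
        "stock prices fell sharply",
    ]
    _, vecs = tfidf_vectors(pages)
    print(cosine_similarity(vecs[0], vecs[1]))  # shares several terms with page 0
    print(cosine_similarity(vecs[0], vecs[2]))  # shares no terms with page 0
```

Texts that share many weighted terms score close to 1, and texts with no terms in common score 0.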

Primary language: HTML. License: MIT.

Compatibility

This web application runs on Ubuntu 20.04 LTS (WSL2) and is compatible with Python 3 only.

Elasticsearch stores the crawled data, so you need to start Elasticsearch before you run the web application.
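
Because the application depends on a running Elasticsearch instance, it can help to check that the instance is reachable first. The sketch below is one way to do that with the Python standard library. The helper name `elasticsearch_is_up` is hypothetical, and the URL `http://localhost:9200` is an assumption based on Elasticsearch's default HTTP port; this README does not state which address the application uses.

```python
import urllib.request


def elasticsearch_is_up(url="http://localhost:9200", timeout=2.0):
    """Return True if an HTTP GET to `url` answers with status 200.

    Hypothetical helper; the default URL is an assumption, not a value
    taken from this project.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error status,
        # or a malformed URL.
        return False


if __name__ == "__main__":
    if elasticsearch_is_up():
        print("Elasticsearch is reachable.")
    else:
        print("Elasticsearch is not reachable; start it before running app.py.")
```

A cluster with security enabled may require HTTPS or credentials, in which case this simple check reports that the instance is not reachable.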

Installation

The following Python packages are required:

validators
requests
flask
numpy
nltk
bs4

The following NLTK data packages are also required:

$ python -m nltk.downloader punkt
$ python -m nltk.downloader stopwords

Usage

$ python ./app.py