The main objective of this project is to measure the similarity between text documents using the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm. The resulting similarity score makes it possible to compare documents with one another.
TF-IDF weights each term in a document by how often it occurs in that document (term frequency), scaled down according to how many documents in the collection contain it (inverse document frequency). Words that appear in nearly every document therefore count for little, while distinctive words count for more. Each document becomes a vector of term weights, and comparing two such vectors (typically with cosine similarity) yields the similarity score between the documents.
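To make the weighting concrete, here is a minimal, dependency-free Python sketch. It is not the project's own code. It assumes raw term counts for TF, the smoothed IDF variant idf(t) = ln((1 + N) / (1 + df(t))) + 1 (N = number of documents, df(t) = number of documents containing t), which matches the default documented for scikit-learn's TfidfVectorizer, and cosine similarity for the comparison. The helper names tokenize, tfidf_vectors and cosine_similarity are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep words of two or more characters
    # (similar to scikit-learn's default token pattern).
    return re.findall(r"(?u)\b\w\w+\b", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents each term appears in.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    # Smoothed IDF (assumed variant): rarer terms receive larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + n)) + 1 for t, n in df.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)  # raw term frequency
        vectors.append({t: c * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "stock prices fell sharply today",
    ]
    vecs = tfidf_vectors(docs)
    print(f"{cosine_similarity(vecs[0], vecs[1]):.6f}")
    print(f"{cosine_similarity(vecs[0], vecs[2]):.6f}")
```

The first and third sentences share no words, so their score is 0. The smoothing matters for small collections: with the unsmoothed ln(N / df(t)), any term present in both of two documents gets weight zero, so a two-document comparison would always score 0.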
Requirements:
- Python 3
- sklearn (scikit-learn)
To measure TF-IDF similarity, follow the steps below:
- Make sure the necessary dependencies are installed before running the program. sklearn is distributed on PyPI as scikit-learn and can be installed with `pip install scikit-learn`.
- Add the file names of the text documents to be compared to the text_files list in main.py.
- Run main.py (for example, `python3 main.py`) to display the similarity results on the screen.
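The steps above can be sketched as follows. The project's main.py is not reproduced here, so this is a hypothetical version: it assumes main.py fits scikit-learn's TfidfVectorizer on all listed files, compares every pair with sklearn.metrics.pairwise.cosine_similarity, and prints scores in the format of the example output. Only the text_files list comes from the instructions; the helper pairwise_similarities is an illustrative name.

```python
from itertools import combinations
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# File names of the documents to compare (example values).
text_files = ["test1.txt", "test2.txt"]


def pairwise_similarities(names, texts):
    """Yield (name_a, name_b, score) for every pair of documents."""
    # Fit TF-IDF on the whole collection so IDF reflects all documents.
    matrix = TfidfVectorizer().fit_transform(texts)
    scores = cosine_similarity(matrix)  # dense (n_docs x n_docs) array
    for i, j in combinations(range(len(names)), 2):
        yield names[i], names[j], float(scores[i, j])


def main():
    missing = [name for name in text_files if not Path(name).is_file()]
    if missing:
        # Report missing inputs instead of crashing with a traceback.
        print("File(s) not found: " + ", ".join(missing))
        return
    texts = [Path(name).read_text(encoding="utf-8") for name in text_files]
    for a, b, score in pairwise_similarities(text_files, texts):
        print(f"Similarity between {a} and {b} is -> {score:.6f}")


if __name__ == "__main__":
    main()
```

Fitting the vectorizer once on all files keeps the IDF weights consistent across pairs; with more than two names in text_files, the script prints one line per pair.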
Below is an example of the program's output:
Similarity between test1.txt and test2.txt is -> 0.432891