- There are 6 documents in the data directory: 5 candidate documents (1 - 5) and the Query document
- The objective is to find a plagiarism score for each document (1 - 5) with respect to the Query document
- First, we generate the TF-IDF values across all 6 documents
- Then we measure the cosine similarity between each document and the Query document separately
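- Cosine similarity indicates how similar a given document is to the Query document (with non-negative TF-IDF weights it ranges from 0, no shared terms, to 1, identical term weighting)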
- This cosine similarity value is then reported as the plagiarism score
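
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not necessarily the implementation used in this project: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `plagiarism_scores`), the whitespace tokenizer, the smoothed IDF formula, and the sample texts are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document.

    IDF is computed over every document passed in, so the Query document
    should be included in `docs` along with the candidates.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) so that terms present in every
    # document still get a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({t: (f / total) * idf[t] for t, f in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def plagiarism_scores(documents, query):
    """Score each candidate document against the Query document."""
    vectors = tfidf_vectors(documents + [query])
    query_vec = vectors[-1]
    return [cosine_similarity(v, query_vec) for v in vectors[:-1]]


if __name__ == "__main__":
    # Hypothetical texts standing in for the files in the data directory.
    documents = [
        "the quick brown fox jumps over the lazy dog",
        "a quick brown fox leaps over a sleepy dog",
        "stock markets closed higher on friday",
    ]
    query = "the quick brown fox jumps over the lazy dog"
    for i, score in enumerate(plagiarism_scores(documents, query), start=1):
        print(f"Document {i}: plagiarism score = {score:.3f}")
```

In this sketch a score near 1 means the document uses nearly the same weighted vocabulary as the Query document, and a score of 0 means the two share no terms. Because the comparison is based on term weights only, word order is ignored.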