
Compare sentences from an input document with all sentences from reference documents to find very similar ones.


Plagiarism Checker


This is a command-line tool for checking the similarity between a given text and a set of reference documents. The tool uses Jaccard similarity to compare sentences of the input text with sentences of the reference documents.
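As a rough illustration of the measure, the sketch below computes the Jaccard similarity of two sentences as the number of shared words divided by the number of distinct words across both. The function name `jaccard_similarity` and the tokenization (lowercasing, then extracting `\w+` runs) are assumptions made for this example; the tool's own preprocessing may differ.

```python
import re


def jaccard_similarity(sentence_a: str, sentence_b: str) -> float:
    """Jaccard similarity of the word sets of two sentences (illustrative only)."""
    # Assumed tokenization: lowercase, then take runs of word characters.
    words_a = set(re.findall(r"\w+", sentence_a.lower()))
    words_b = set(re.findall(r"\w+", sentence_b.lower()))
    union = words_a | words_b
    if not union:
        return 0.0  # two empty sentences: treat as no similarity
    return len(words_a & words_b) / len(union)


# Shared words: the, quick, fox (3); distinct words overall: 5
print(jaccard_similarity("the quick brown fox", "the quick red fox"))  # 0.6
```

A score of 1.0 means both sentences use exactly the same set of words, and 0.0 means they share none.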


Installation

Install in an isolated environment using pipx (or regular pip):

pipx install sentence-plagiarism

CLI Usage

To run the plagiarism checker, use the following command:

sentence-plagiarism <path-to-input-file> <path-to-reference-file-1> <path-to-reference-file-2> ... [--threshold <threshold-value>] [--output-file <path-to-output-file>] [--quiet]
  • <path-to-input-file>: Path to the input file to be checked for plagiarism.
  • <path-to-reference-file-1> ...: Paths to the reference files to compare against.
  • --threshold (optional): The minimum similarity score required for a sentence to be considered plagiarized. The value should be between 0 and 1.
  • --output-file (optional): Path to the output file to save the results in JSON format.
  • --quiet (optional): Flag to suppress the display of similar sentences in the console.
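The role of the threshold can be sketched as follows: every sentence of the input is compared with every sentence of every reference document, and only pairs scoring at or above the threshold are kept. This is a minimal sketch, not the tool's implementation. The helper names (`split_sentences`, `find_similar`), the regex-based sentence splitting, the tokenization, and the result field names are all assumptions chosen for illustration.

```python
import re


def tokens(sentence: str) -> set[str]:
    # Assumed tokenization: lowercase, then runs of word characters.
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_similar(input_text: str, references: dict[str, str], threshold: float = 0.8) -> list[dict]:
    """Return all (input sentence, reference sentence) pairs scoring >= threshold."""
    matches = []
    for sentence in split_sentences(input_text):
        for doc_name, ref_text in references.items():
            for ref_sentence in split_sentences(ref_text):
                score = jaccard(sentence, ref_sentence)
                if score >= threshold:
                    # Hypothetical record layout, mirroring the console labels.
                    matches.append({
                        "input_sentence": sentence,
                        "reference_sentence": ref_sentence,
                        "reference_document": doc_name,
                        "similarity_score": round(score, 4),
                    })
    return matches


results = find_similar(
    "The cat sat on the mat. Dogs bark loudly.",
    {"ref1.txt": "A cat sat on the mat. Birds sing."},
    threshold=0.8,
)
for match in results:
    print(match["reference_document"], match["similarity_score"])
```

Raising the threshold toward 1 keeps only near-verbatim matches, while lowering it also reports loosely related sentences.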


Example

The following command:

sentence-plagiarism input.txt --reference-files ref1.txt ref2.txt --similarity-threshold 0.8 --output-file results.json

can produce the following output on stdout:

Input Sentence:     The retriever and seq2seq modules commence their operations as pretrained models, and through a joint fine-tuning process, they adapt collaboratively, thus enhancing both retrieval and generation for specific downstream tasks.
Reference Sentence:  foobar  The retriever and seq2seq modules commence their operations as pretrained models, and through a joint fine-tuning process, they adapt collaboratively, thus enhancing both retrieval and generation for specific downstream tasks.
Reference Document: ref1.txt
Similarity Score: 0.9667

Input Sentence:      Closing thoughts  For a comprehensive understanding of the RAG technique, we offer an in-depth exploration, commencing with a simplified overview and progressively delving into more intricate technical facets.
Reference Sentence:  barfoo  For a comprehensive understanding of the RAG technique, we offer an in-depth exploration, commencing with a simplified overview and progressively delving into more intricate technical facets.
Reference Document: ref2.txt
Similarity Score: 0.8966

Results saved to results.json

and saves the results to results.json.
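To make the scores in the sample output concrete, the sketch below recomputes them with a word-set Jaccard similarity. The tokenization used here (lowercasing and extracting `\w+` runs, so that "fine-tuning" counts as two words) is an assumption; it happens to reproduce the reported values, but the tool's actual preprocessing is not documented in this README.

```python
import re


def tokens(sentence: str) -> set[str]:
    # Assumed tokenization: lowercase, then runs of word characters.
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


shared_1 = (
    "The retriever and seq2seq modules commence their operations as "
    "pretrained models, and through a joint fine-tuning process, they adapt "
    "collaboratively, thus enhancing both retrieval and generation for "
    "specific downstream tasks."
)
shared_2 = (
    "For a comprehensive understanding of the RAG technique, we offer an "
    "in-depth exploration, commencing with a simplified overview and "
    "progressively delving into more intricate technical facets."
)

# First pair: the reference sentence only adds the word "foobar".
score_1 = jaccard(shared_1, "foobar " + shared_1)
# Second pair: the input adds "Closing thoughts", the reference adds "barfoo".
score_2 = jaccard("Closing thoughts " + shared_2, "barfoo " + shared_2)

print(f"{score_1:.4f}")  # 0.9667 (29 shared words out of 30 distinct)
print(f"{score_2:.4f}")  # 0.8966 (26 shared words out of 29 distinct)
```

Both pairs score above the 0.8 threshold used in the command, which is why they appear in the output.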

Programmatic Usage

from sentence_plagiarism import check

check(
    ...,  # input file argument elided
    reference_files=["txt/txt2.txt", "txt/txt3.txt"],
)


License

Distributed under the MIT License. See LICENSE for more information.


Contact

Krystian Safjan - ksafjan@gmail.com

Project Link: https://github.com/izikeros/sentence-plagiarism