Submitted to TripAdvisor Software Engineering Challenge on Sept 7, 2019
- Percentage of similar tuples takes into account if the files are different lengths. For example, if no more tuples of length N can be constructed from file two, then file one tuples still get constructed, and
TOTAL_TUPLE_COUNT
is still incremented, even though there is no checking for plagiarism done. - Files will be in the same directory as the script
- Optimizing for speed rather than memory - explains the global variables, choice of dictionary (rather than consistent list iteration), etc.
You can run the script as: python3 main.py -s syns.txt --fileone fileA.txt --filetwo fileB.txt
, and add an extra -t
flag to indicate a different tuple length.