A tool for finding similar text files

Primary language: Shell. License: MIT.

txtcmp

txtcmp is a tool for finding similar text files. It is meant for the case where you have many files and a few of them may be similar. It works by computing the longest common subsequence (LCS) of all provided files, the same technique that underlies utilities like diff. It is less useful when you have just two files; use diff for that.
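To make the LCS idea concrete, here is a minimal sketch of a line-level LCS length computation for two files, written as a POSIX shell function around awk. The function name `lcs_lines` is hypothetical and is not part of txtcmp. The source does not say whether txtcmp compares lines or characters, or how it turns the LCS into a score, so treat this only as an illustration of the underlying technique.

```shell
# Hypothetical helper, NOT part of txtcmp: prints the number of lines in the
# longest common subsequence of two files (assumed line-level comparison).
# Usage: lcs_lines file1.txt file2.txt
lcs_lines() {
    awk '
        BEGIN { n = 0; m = 0 }
        # While reading the first file, store its lines in a[1..n].
        NR == FNR { a[++n] = $0; next }
        # While reading the second file, store its lines in b[1..m].
        { b[++m] = $0 }
        END {
            # prev[j] is the LCS length of a[1..i-1] and b[1..j].
            for (j = 0; j <= m; j++) prev[j] = 0
            for (i = 1; i <= n; i++) {
                curr[0] = 0
                for (j = 1; j <= m; j++) {
                    # Appending "" forces a string comparison, so lines such
                    # as "1" and "1.0" are not treated as equal numbers.
                    if ((a[i] "") == (b[j] ""))
                        curr[j] = prev[j - 1] + 1
                    else if (prev[j] >= curr[j - 1])
                        curr[j] = prev[j]
                    else
                        curr[j] = curr[j - 1]
                }
                for (j = 0; j <= m; j++) prev[j] = curr[j]
            }
            print prev[m] + 0
        }
    ' "$1" "$2"
}
```

The sketch keeps only two rows of the dynamic-programming table, so memory grows with the length of the second file, while running time grows with the product of the two line counts. That cost per pair is worth keeping in mind when comparing many files at once.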

Examples

Compare three files

txtcmp file1.txt file2.txt file3.txt

Find the most similar files in a directory

find path/to/dir -type f -print0 | xargs -0 txtcmp | sort -n | tail