Simple Data Leak Detection
Given a directory of sensitive files and a directory of unknown files, detect whether the content of any sensitive file appears in any of the unknown files.
Shingling, fingerprinting, and collection intersection are applied in sequence to test the similarity between two files.
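
The three steps could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the word-level shingles, the default shingle size `k=5`, the truncated SHA-1 fingerprint, and the containment score are all assumptions made for the example.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of overlapping k-word shingles in text (k is an assumed default)."""
    words = text.split()
    if len(words) < k:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprints(text, k=5):
    """Hash each shingle to a 64-bit integer (first 8 bytes of its SHA-1 digest)."""
    return {
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles(text, k)
    }


def containment(sensitive_text, unknown_text, k=5):
    """Fraction of the sensitive text's fingerprints that also occur in the unknown text."""
    a = fingerprints(sensitive_text, k)
    b = fingerprints(unknown_text, k)
    if not a:
        return 0.0
    # Set intersection counts the fingerprints the two texts share.
    return len(a & b) / len(a)
```

A score near 1.0 would suggest that most of the sensitive text appears in the unknown text, and a score near 0.0 that almost none of it does. To cover two directories, one could read each file and call `containment` on every sensitive/unknown pair, flagging pairs above a chosen threshold.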