/similars

storming fast, versatile program to search similar textfiles on devices

Primary LanguageJava

similars

storming fast, versatile program to search similar textfiles on devices

At least one argument necessary - directory/dir to comapre CWD agasint
In case of two and more argumetns, first is compared against all others (all file/dir,recursive)
Other understood args:
  --min=NUMB        minimal similarity in percent
  --minws=NUMB      minimal similarity in percent with whitechars removed
                    Note, that min/minws should be 0-100 inclusive. Bigger/higher will effectively exclude the method.
  --case=true/false if false, similarity is case insensitive
  --fitler=EXPRES   file path filter regex
  --exclude[=EXPRES]sources exclude list exclude matching source files form run. Eg ".*/(Main|Test|Test1|Test2|PURPOSE|TEST.ROOT|A)\.java$"
  --eraser[=EXPRES] remove matching (comment?) lines if enabled, default is \s*[/*\*#].*
  --names=false/true/icase/NUMBER/
                    will compare only if filenames are same/similar
                    false - not used, comapre all. true filenames must be same before comaprison
                    icase - like true, only9 case insensitive. NUMBER - names must be similar (percentages) 
  --verbose         verbose mode
  --html            will add html table to stdout
  --maxsize=NUMBER  maximum filesize in KB   
                    Default is 100 (100kb), which eats about 46GB ram and taks 5-8 minutes.
                    Biggrer files may cause OOM/crash
  --minsize=NUMBER  minumum filesize in B. Default 10  
  --maxratio=DOUBLE maximum filesize diff ratio
                    Default is 10. Unless target/n < source < target*N then comparison will be skipped.
                    Processed after comment removal
  --port[=NUMBER]   port to get progress and info
                    if enabled, it will reply pid@host time since lunch/eta; default port is 4812
                    The server will block program from exit.
everything not `-` starting  is considered as dir/file which  the CWD/first file/dir should be compared against