A collection of scripts for common tasks when working with large datasets.
Samples every Nth line of a file.
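The sampling step could look roughly like the following Python sketch. It is not the script itself. The function names `sample_every_nth` and `sample_file` are hypothetical, as is the assumption that sampling starts at the first line.

```python
from itertools import islice
from typing import Iterable, Iterator


def sample_every_nth(lines: Iterable[str], n: int, offset: int = 0) -> Iterator[str]:
    """Return every n-th line, starting with the line at index `offset` (0-based).

    Assumption: sampling starts at the first line. Pass offset=n - 1 to
    start at the N-th line instead.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # islice streams the input, so a large file is never held in memory.
    return islice(lines, offset, None, n)


def sample_file(src: str, dst: str, n: int) -> None:
    """Write every n-th line of `src` to `dst` (both paths are placeholders)."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        fout.writelines(sample_every_nth(fin, n))
```

Because the file is read lazily, this approach should scale to inputs that do not fit in memory.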
Pastes specified text before and after each line in a list of lines. Adds a commit statement every 100 lines.
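A minimal Python sketch of this wrapping behaviour is shown below. The function name `wrap_lines`, the default commit text `COMMIT;`, and the choice to emit the commit after every 100th wrapped line are all assumptions made for illustration. The original description does not say what the commit statement looks like.

```python
from typing import Iterable, Iterator


def wrap_lines(
    lines: Iterable[str],
    prefix: str,
    suffix: str,
    commit: str = "COMMIT;",  # assumed commit text, not taken from the script
    every: int = 100,
) -> Iterator[str]:
    """Yield each line wrapped in prefix/suffix, plus a commit line every `every` lines."""
    for count, line in enumerate(lines, start=1):
        # Strip the trailing newline so the suffix lands on the same line.
        yield prefix + line.rstrip("\n") + suffix
        if count % every == 0:
            yield commit
```

For example, `wrap_lines(names, "INSERT INTO t VALUES ('", "');")` would turn a list of values into insert statements, with the table name `t` being a placeholder.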
Removes from one file every line that is listed in another file.
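One way to sketch this filter in Python is with a set lookup, as below. The name `remove_listed_lines` is hypothetical, and the sketch assumes an exact, case-sensitive match on the whole line (ignoring only the trailing newline).

```python
from typing import Iterable, Iterator


def remove_listed_lines(lines: Iterable[str], to_remove: Iterable[str]) -> Iterator[str]:
    """Yield the lines of `lines` that do not appear in `to_remove`."""
    # A set gives constant-time membership checks, which matters for large inputs.
    blocked = {entry.rstrip("\n") for entry in to_remove}
    for line in lines:
        if line.rstrip("\n") not in blocked:
            yield line
```

Only the removal list is held in memory. The file being filtered is streamed line by line.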
Combines all files with a given extension into a single file. Optionally keeps only the header of the first file (useful for CSV files).
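The combining step might be sketched in Python as follows. The function name `combine_files` and its parameters are hypothetical. The sketch also assumes that the header is exactly one line and that files are merged in alphabetical order.

```python
from pathlib import Path
from typing import List


def combine_files(directory: str, extension: str, output: str,
                  single_header: bool = False) -> List[Path]:
    """Concatenate all files in `directory` ending in `extension` into `output`.

    With single_header=True, the first line of every file after the first
    is skipped (assumes a one-line header).
    """
    out_path = Path(output).resolve()
    # Skip the output file itself in case it lives in the same directory.
    sources = sorted(
        p for p in Path(directory).glob("*" + extension) if p.resolve() != out_path
    )
    with open(out_path, "w", encoding="utf-8") as out:
        for index, path in enumerate(sources):
            with open(path, encoding="utf-8") as src:
                if single_header and index > 0:
                    next(src, None)  # drop the repeated header line
                for line in src:
                    # Guard against a missing final newline gluing two rows together.
                    out.write(line if line.endswith("\n") else line + "\n")
    return sources
```

Calling `combine_files("data", ".csv", "all.csv", single_header=True)` would then merge the CSV files in a hypothetical `data` directory while keeping a single header row.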
Replaces all commas with a dot '.' and all semicolons with a comma ','.
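A possible Python sketch of this substitution uses a single translation table, so the original commas and the commas produced from semicolons cannot be confused. The name `convert_separators` is hypothetical.

```python
# Map both characters in one pass: ',' becomes '.', ';' becomes ','.
SEPARATOR_TABLE = str.maketrans({",": ".", ";": ","})


def convert_separators(text: str) -> str:
    """Replace every comma with a dot and every semicolon with a comma."""
    return text.translate(SEPARATOR_TABLE)
```

With two sequential replacements the order would matter: converting semicolons first would turn every separator into a dot. The single-pass table avoids that.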
Takes a file with a list of words and removes all instances of words which occur more than once.
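The duplicate removal could be sketched in Python along these lines. The name `keep_unique_words` is hypothetical. The sketch assumes one word per line, case-sensitive comparison, and that blank lines are ignored.

```python
from collections import Counter
from typing import Iterable, List


def keep_unique_words(words: Iterable[str]) -> List[str]:
    """Return only the words that occur exactly once, in their original order."""
    # Strip surrounding whitespace and skip blank entries (an assumption).
    cleaned = [w.strip() for w in words if w.strip()]
    counts = Counter(cleaned)
    # Every instance of a repeated word is dropped, including the first one.
    return [w for w in cleaned if counts[w] == 1]
```

This differs from ordinary de-duplication, which would keep one copy of each repeated word.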