MD5 filesystem tools

This toolkit uses MD5 checksums to manage and analyze file integrity across nested directories. It provides recursive checksum file creation, duplicate file detection by checksum comparison, and automated duplicate removal based on those checksums.

generate_checksums_recursively.sh

This script recursively finds every directory under a specified root directory, counts the files in each one (excluding subdirectories and any 'checksums.md5' files), and writes a 'checksums.md5' file containing the MD5 checksum of each file in that directory. It reports its progress and results as it runs.

Usage

The following command processes the 'Photos' directory and shows the progress and results for each subdirectory found:

./generate_checksums_recursively.sh ~/Photos
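
The behavior described above can be approximated with a short bash function. This is a minimal sketch, not the actual generate_checksums_recursively.sh: the function name generate_checksums is hypothetical, GNU md5sum is assumed, and the sketch assumes that entries are written as file names relative to their own directory (so that md5sum -c works from inside it) and that hidden files are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the actual generate_checksums_recursively.sh.

# Write a checksums.md5 file into every directory under the given root.
generate_checksums() {
    local root=$1 dir
    # Visit the root and every nested directory; -print0 keeps unusual names intact.
    find "$root" -type d -print0 | while IFS= read -r -d '' dir; do
        (
            cd "$dir" || exit 1
            files=()
            # Collect regular files only, skipping any existing checksums.md5.
            # Assumption: hidden files are not matched by the * glob and are ignored.
            for name in *; do
                [ -f "$name" ] && [ "$name" != "checksums.md5" ] && files+=("$name")
            done
            count=${#files[@]}
            # Only write a checksum list when the directory holds files.
            if [ "$count" -gt 0 ]; then
                md5sum -- "${files[@]}" > checksums.md5
            fi
            echo "$dir: $count file(s)"
        )
    done
}
```

Calling generate_checksums ~/Photos would print one progress line per directory. The subshell around cd keeps the caller's working directory unchanged.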

find_duplicates.sh

This script identifies and lists duplicate files based on their MD5 checksums. It requires a file named checksums.md5 (generated by generate_checksums_recursively.sh or directly by md5sum) that lists MD5 checksums and their corresponding files. The script reads this file and groups the files by checksum to identify duplicates.

Usage

To check for duplicates using a file checksums.md5 in the 'Photos' directory:

md5sum ~/Photos/* > ~/Photos/checksums.md5
./find_duplicates.sh ~/Photos

Example output:

26ab0db90d72e28ad0ba1e22ee510510:
/home/user/Photos/Toronto2024.jpg
/home/user/Photos/Toronto2024 (1).jpg

b026324c6904b2a9cb4b88d6d61c81d1:
/home/user/Photos/Paris2023.jpg
/home/user/Photos/Paris2023 (1).jpg
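
The grouping step can be sketched with awk. This is an illustration under stated assumptions, not the actual find_duplicates.sh: the function name list_duplicates is hypothetical, and the input is assumed to be standard md5sum output (a 32-character digest, a two-character separator, then the file name) with no escaped file names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the actual find_duplicates.sh.

# Print each checksum that occurs more than once, followed by its files.
list_duplicates() {
    local checksum_file=$1
    awk '
        length($0) < 35 { next }          # skip blank or malformed lines
        {
            hash = substr($0, 1, 32)      # MD5 digest as 32 hex characters
            path = substr($0, 35)         # skip the two-character separator
            if (!(hash in files)) order[++n] = hash
            files[hash] = files[hash] path "\n"
            count[hash]++
        }
        END {
            # Report groups in the order their hash first appeared.
            for (i = 1; i <= n; i++) {
                hash = order[i]
                if (count[hash] > 1) printf "%s:\n%s\n", hash, files[hash]
            }
        }
    ' "$checksum_file"
}
```

Taking the path with substr rather than the second whitespace-separated field keeps file names that contain spaces, such as 'Toronto2024 (1).jpg', intact.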

remove_duplicates.sh

This script identifies and removes duplicate files within a specified directory, keeping only the file with the shortest name in each set of duplicates. It relies on a checksums.md5 file (generated by generate_checksums_recursively.sh or directly by md5sum) that contains the MD5 hashes of the files in the directory.

Usage

To remove duplicates using a file checksums.md5 in the 'Photos' directory:

md5sum ~/Photos/* > ~/Photos/checksums.md5
./remove_duplicates.sh ~/Photos

Example output:

/home/user/Photos/Toronto2024.jpg
/home/user/Photos/Toronto2024 (1).jpg
Removing: /home/user/Photos/Toronto2024 (1).jpg

/home/user/Photos/Paris2023.jpg
/home/user/Photos/Paris2023 (1).jpg
Removing: /home/user/Photos/Paris2023 (1).jpg
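
One way to apply the "keep the shortest name" rule is to sort the entries by hash and then by name length, and remove every entry after the first in each group. The sketch below is not the actual remove_duplicates.sh: the function name remove_duplicates and the dry-run argument are hypothetical additions, and file names are assumed to contain no tab or newline characters.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the actual remove_duplicates.sh.

# For every checksum listed more than once, keep the file with the shortest
# name and delete the rest. Pass "dry-run" as the second argument to only print.
remove_duplicates() {
    local checksum_file=$1 mode=${2:-delete} path
    # Emit "hash<TAB>length<TAB>path", then sort by hash and numeric length
    # so the shortest path of each group comes first.
    awk 'length($0) >= 35 {
            p = substr($0, 35)
            printf "%s\t%d\t%s\n", substr($0, 1, 32), length(p), p
        }' "$checksum_file" |
        sort -t $'\t' -k1,1 -k2,2n |
        # Print every path except the first one seen for each hash.
        awk -F '\t' 'seen[$1]++ { print $3 }' |
        while IFS= read -r path; do
            echo "Removing: $path"
            [ "$mode" = "dry-run" ] || rm -f -- "$path"
        done
}
```

Running remove_duplicates ~/Photos/checksums.md5 dry-run first shows what would be deleted without touching any file.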

find_duplicates_recursively.sh

This script scans the specified directory and its subdirectories for checksums.md5 files (generated by generate_checksums_recursively.sh), which contain pre-computed hashes of the files. It then groups files with identical hashes and lists each group so that users can review the duplicates manually and decide how to handle them.

Usage

To find duplicates in the 'Photos' directory:

./generate_checksums_recursively.sh ~/Photos
./find_duplicates_recursively.sh ~/Photos

Example output:

Processing /home/pavel/Photos/Berlin/checksums.md5
Processing /home/pavel/Photos/Boston/checksums.md5
Processing /home/pavel/Photos/All/checksums.md5

c30f7472766d25af1dc80b3ffc9a58c7
/home/pavel/Photos/Berlin/1.jpg
/home/pavel/Photos/Berlin/11.jpg
/home/pavel/Photos/All/Berlin.jpg

26ab0db90d72e28ad0ba1e22ee510510
/home/pavel/Photos/Boston/2.jpg
/home/pavel/Photos/Boston/22.jpg
/home/pavel/Photos/All/Boston.jpg

remove_duplicates_recursively.sh

This script automates the identification and removal of duplicate files within a specified directory and its subdirectories, using existing checksums.md5 files (generated by generate_checksums_recursively.sh). It identifies duplicates by their hashes, retains the file with the shortest path in each set of duplicates, and removes the others.

Usage

To remove duplicates in the 'Photos' directory:

./generate_checksums_recursively.sh ~/Photos
./remove_duplicates_recursively.sh ~/Photos

Example output:

Processing /home/pavel/Photos/Berlin/checksums.md5
Processing /home/pavel/Photos/Boston/checksums.md5
Processing /home/pavel/Photos/All/checksums.md5
Removing: /home/pavel/Photos/Berlin/11.jpg
Removing: /home/pavel/Photos/All/Berlin.jpg
Removing: /home/pavel/Photos/Boston/22.jpg
Removing: /home/pavel/Photos/All/Boston.jpg
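
The "retain the shortest path" rule can also be written in plain bash with an associative array. The sketch below is not the actual remove_duplicates_recursively.sh: the function name prune_to_shortest_path and the dry-run argument are hypothetical, bash 4 or later is assumed, and the input is assumed to be md5sum-style lines with full paths on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the actual remove_duplicates_recursively.sh.
# Requires bash 4+ for associative arrays.

# Read "hash  path" lines on stdin, keep the shortest path for each hash,
# and print (or delete) every other path. Pass "dry-run" to only print.
prune_to_shortest_path() {
    local mode=${1:-delete} line hash path
    local -A keep=()
    local -a lines=()
    # First pass: remember the shortest path seen for each hash.
    while IFS= read -r line; do
        [ "${#line}" -ge 35 ] || continue
        lines+=("$line")
        hash=${line:0:32}
        path=${line:34}
        if [ -z "${keep[$hash]+set}" ] || [ "${#path}" -lt "${#keep[$hash]}" ]; then
            keep[$hash]=$path
        fi
    done
    # Second pass: every path other than the kept one is a duplicate.
    for line in "${lines[@]}"; do
        hash=${line:0:32}
        path=${line:34}
        [ "$path" = "${keep[$hash]}" ] && continue
        echo "Removing: $path"
        [ "$mode" = "dry-run" ] || rm -f -- "$path"
    done
}
```

When two paths in a group have the same length, this sketch keeps the one that appears first in the input; how the real script breaks such ties is not stated in this README.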