Copy Catcher is a NuGet package that identifies and lists duplicate files within a specified directory. It combines the optimizations below to detect files with identical content efficiently and accurately.
- Buffered Reading: Copy Catcher reads large files in chunks through a buffer, which reduces memory usage and improves performance.
- Asynchronous Operations: The package uses asynchronous, non-blocking I/O, which keeps the application responsive, especially when scanning large directories or files.
- Early Byte Exiting: Before hashing an entire file, Copy Catcher compares the initial bytes of candidate files. If two files differ in those bytes, they are immediately identified as distinct, and the cost of a full hash is avoided.
- Chunk Hashing: Instead of hashing an entire file in one go, Copy Catcher hashes it chunk by chunk. This approach is more memory-efficient and speeds up the identification of large duplicate files.
- Parallelism: The package scans and hashes multiple files concurrently. This takes advantage of multi-core processors and substantially reduces the time needed to find duplicates in large directories.
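The early-exit, buffered, asynchronous, and chunk-hashing ideas above can be sketched with standard .NET APIs. This is an illustrative sketch only, not Copy Catcher's implementation. The class name `ChunkedComparison`, the 4 KB prefix, the 80 KB buffer, the file-length check, and the use of SHA-256 are all assumptions made for the example, because the README does not name the hash algorithm or the chunk sizes.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Hypothetical helper illustrating early byte exiting and chunk hashing.
// It is NOT part of the Copy Catcher API.
public static class ChunkedComparison
{
    // Assumed sizes, chosen for illustration only.
    private const int PrefixLength = 4096;
    private const int BufferSize = 81920;

    // Early byte exit: compare only the first PrefixLength bytes of two files.
    // A mismatch proves the files differ, so no full hash is needed.
    public static bool PrefixesMatch(string pathA, string pathB)
    {
        using var a = File.OpenRead(pathA);
        using var b = File.OpenRead(pathB);

        // Extra check in this sketch: files of different sizes cannot be duplicates.
        if (a.Length != b.Length) return false;

        var prefixA = new byte[PrefixLength];
        var prefixB = new byte[PrefixLength];
        int readA = ReadPrefix(a, prefixA);
        int readB = ReadPrefix(b, prefixB);
        return prefixA.AsSpan(0, readA).SequenceEqual(prefixB.AsSpan(0, readB));
    }

    // Fills the buffer from the stream, stopping early at end of file.
    private static int ReadPrefix(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break; // end of file reached
            total += n;
        }
        return total;
    }

    // Chunk hashing with buffered, asynchronous reads: the file is fed to the
    // hash one buffer at a time, so memory use stays at BufferSize regardless
    // of file size.
    public static async Task<string> HashInChunksAsync(string path)
    {
        // useAsync: true requests non-blocking (overlapped) I/O where the OS supports it.
        await using var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
            FileShare.Read, BufferSize, useAsync: true);
        using var hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

        var buffer = new byte[BufferSize];
        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            hash.AppendData(buffer, 0, read);
        }
        return Convert.ToHexString(hash.GetHashAndReset());
    }
}
```

In this design, the cheap prefix comparison acts as a filter. Only files that pass it need the full chunked hash, which is where most of the I/O cost lies.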
- The .NET SDK installed on your machine.
- A .NET project in which you want to use Copy Catcher.
Install the Copy Catcher NuGet package from the NuGet Package Manager Console:
Install-Package CopyCatcher
Or using the .NET CLI:
dotnet add package CopyCatcher
In your .NET project, add the following using directive:
using CopyCatcher.Shared;
Create an instance of the DuplicateFinderService:
var service = new DuplicateFinderService("path/to/directory");
Call the FindDuplicates method:
var duplicates = service.FindDuplicates();
The FindDuplicates method returns a dictionary whose keys are hash values and whose values are lists of paths of files that share that hash:
{
"abc123def456": ["path/to/duplicate1.txt", "path/to/duplicate2.txt"],
...
}
A simple .NET Console app using Copy Catcher would look like this:
using CopyCatcher;

Console.WriteLine("Enter the directory path:");
var directoryPath = Console.ReadLine();

// Console.ReadLine returns null at end of input, so validate before scanning
if (string.IsNullOrWhiteSpace(directoryPath))
{
    Console.WriteLine("No directory path was provided.");
    return;
}

// Initialize the service and find duplicates
var duplicateFinderService = new DuplicateFinderService(directoryPath);
var duplicates = duplicateFinderService.FindDuplicates();

// Display each group of duplicate files under its shared hash
foreach (var duplicate in duplicates)
{
    Console.WriteLine($"Hash: {duplicate.Key}");
    foreach (var filePath in duplicate.Value)
    {
        Console.WriteLine($" - {filePath}");
    }
}
- FileReader: Reads files from the file system.
- FileHasher: Computes a hash value for each file to determine duplicates.
- DirectoryScanner: Scans the specified directory and retrieves a list of all files. It accesses the file system through DirectoryProvider, which improves testability and separation of concerns.
- DirectoryProvider: Provides direct access to the file system; used by DirectoryScanner.
- DuplicateFinderService: The main service, which ties all components together and provides an easy-to-use interface for finding duplicates.
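The testability benefit of routing DirectoryScanner through DirectoryProvider can be illustrated with a small interface. This is a minimal sketch. The names `IDirectoryProvider`, `FileSystemDirectoryProvider`, `InMemoryDirectoryProvider`, and `SketchDirectoryScanner` are hypothetical, and Copy Catcher's real types and signatures may differ. Recursive enumeration of subdirectories is also an assumption.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical abstraction over file-system access (not Copy Catcher's API).
public interface IDirectoryProvider
{
    IEnumerable<string> GetFiles(string directoryPath);
}

// Production implementation: enumerates files on disk, including subdirectories
// (recursion is an assumption of this sketch).
public sealed class FileSystemDirectoryProvider : IDirectoryProvider
{
    public IEnumerable<string> GetFiles(string directoryPath) =>
        Directory.EnumerateFiles(directoryPath, "*", SearchOption.AllDirectories);
}

// Test double: returns a fixed list and ignores the path,
// so the scanner can be unit-tested without touching the disk.
public sealed class InMemoryDirectoryProvider : IDirectoryProvider
{
    private readonly IReadOnlyList<string> _files;

    public InMemoryDirectoryProvider(IEnumerable<string> files) => _files = files.ToList();

    public IEnumerable<string> GetFiles(string directoryPath) => _files;
}

// The scanner depends only on the interface, never on System.IO directly.
public sealed class SketchDirectoryScanner
{
    private readonly IDirectoryProvider _provider;

    public SketchDirectoryScanner(IDirectoryProvider provider) => _provider = provider;

    public IReadOnlyList<string> Scan(string directoryPath) =>
        _provider.GetFiles(directoryPath).ToList();
}
```

A unit test can construct `SketchDirectoryScanner` with `InMemoryDirectoryProvider`. Production code would pass `FileSystemDirectoryProvider` instead.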
- The user specifies a directory to be scanned.
- DirectoryScanner retrieves a list of all files in the directory.
- FileHasher computes a hash for each file.
- Duplicate files are identified by their hash values and returned in a dictionary.