/czkawka

Multi-functional app to find duplicates, empty folders, similar images, etc.

Czkawka (tch•kav•ka, hiccup) is a simple, fast and free app to remove unnecessary files from your computer.

Features

  • Written in memory-safe Rust
  • Amazingly fast - thanks to more or less advanced algorithms and multithreading
  • Free, Open Source without ads
  • Multiplatform - works on Linux, Windows and macOS
  • Cache support - second and further scans should be a lot faster than the first one
  • CLI frontend - for easy automation
  • GUI frontend - uses modern GTK 3 and looks similar to FSlint
  • Rich search options - allow setting absolute included and excluded directories, a set of allowed file extensions, and excluded items with the * wildcard
  • Multiple tools to use:
    • Duplicates - Finds duplicates based on file name, size, hash, or hash of just the first 1 MB of a file
    • Empty Folders - Finds empty folders with the help of an advanced algorithm
    • Big Files - Finds the provided number of the biggest files in a given location
    • Empty Files - Looks for empty files across the drive
    • Temporary Files - Finds temporary files
    • Similar Images - Finds images which are not exactly the same (different resolution, watermarks)
    • Zeroed Files - Finds files which are filled with zeros (usually corrupted)
    • Same Music - Searches for music with the same artist, album etc.
    • Invalid Symbolic Links - Shows symbolic links which point to non-existent files/directories
    • Broken Files - Finds files with an invalid extension or that are corrupted
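
The Duplicates tool above lists size, full hash, and hash of the first 1 MB as criteria. One common way to combine them is a staged filter: group files by size, split those groups by a cheap hash of the first 1 MB, and hash whole files only for the candidates that remain. The block below is a minimal sketch of that idea and not Czkawka's actual implementation. The names `find_duplicates`, `regroup`, `hash_file` and `PREHASH_BYTES` are hypothetical, and the standard library's `DefaultHasher` is used only to keep the example free of dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::{self, File};
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Prefix length used by the cheap pre-check (1 MB, as in the feature list).
const PREHASH_BYTES: u64 = 1024 * 1024;

/// Recursively collect regular files under `dir`; symlinks are skipped.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            collect_files(&entry.path(), out)?;
        } else if file_type.is_file() {
            out.push(entry.path());
        }
    }
    Ok(())
}

/// Hash at most `limit` bytes of a file (pass `u64::MAX` for the whole file).
/// Assumption: a 64-bit non-cryptographic hash is good enough for a sketch.
fn hash_file(path: &Path, limit: u64) -> io::Result<u64> {
    let mut reader = File::open(path)?.take(limit);
    let mut hasher = DefaultHasher::new();
    let mut buf = vec![0u8; 64 * 1024];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}

/// Split every group into subgroups that share a key and drop singletons.
fn regroup<K, F>(groups: Vec<Vec<PathBuf>>, key: F) -> Vec<Vec<PathBuf>>
where
    K: Eq + std::hash::Hash,
    F: Fn(&Path) -> io::Result<K>,
{
    let mut result = Vec::new();
    for group in groups {
        let mut buckets: HashMap<K, Vec<PathBuf>> = HashMap::new();
        for path in group {
            // Unreadable files are skipped instead of aborting the scan.
            if let Ok(k) = key(path.as_path()) {
                buckets.entry(k).or_default().push(path);
            }
        }
        result.extend(buckets.into_values().filter(|g| g.len() > 1));
    }
    result
}

/// Return groups of files under `root` that are at least `min_size` bytes
/// long and agree on size, prefix hash, and full hash.
pub fn find_duplicates(root: &Path, min_size: u64) -> io::Result<Vec<Vec<PathBuf>>> {
    let mut files = Vec::new();
    collect_files(root, &mut files)?;
    files.retain(|p| fs::metadata(p).map(|m| m.len() >= min_size).unwrap_or(false));

    // Stage 1: equal size (metadata only, no file contents read).
    let by_size = regroup(vec![files], |p| fs::metadata(p).map(|m| m.len()));
    // Stage 2: equal hash of the first 1 MB.
    let by_prefix = regroup(by_size, |p| hash_file(p, PREHASH_BYTES));
    // Stage 3: equal hash of the whole file.
    Ok(regroup(by_prefix, |p| hash_file(p, u64::MAX)))
}
```

Calling `find_duplicates(Path::new("some/dir"), 1024)` from a `main` function would return one `Vec<PathBuf>` per duplicate group, ignoring files smaller than 1 KB. Ordering the stages from cheapest to most expensive means most files are ruled out before any large read happens.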

How do I use it?

You can find the instructions on how to use Czkawka here.

Installation

You can find installation instructions with download links here.

Compilation

If you want to try developing Czkawka or just use the latest available features, you may want to look at the compilation instructions.

Benchmarks

Since Czkawka is written in Rust and aims to be a faster alternative to FSlint and DupeGuru, which are written in Python, it makes sense to compare the speed of these tools.

I tested it on a 256 GB SSD and an i7-4770 CPU.

I prepared a disk that contained 363 215 files occupying 221.8 GB, of which 62 093 were duplicate files in 31 790 groups occupying 4.1 GB. I ran the test without any folder exclusions and with ignoring of hard links disabled.

I set the minimum file size to check to 1 KB in all programs.

App                          | Execution time
-----------------------------|---------------
FSlint 2.4.7 (First Run)     | 86s
FSlint 2.4.7 (Second Run)    | 43s
Czkawka 3.0.0 (First Run)    | 8s
Czkawka 3.0.0 (Second Run)   | 7s
DupeGuru 4.1.1 (First Run)   | 22s
DupeGuru 4.1.1 (Second Run)  | 21s

I used Mprof for checking memory usage of FSlint and DupeGuru, and Heaptrack for Czkawka.

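App             | Idle RAM | Max operational RAM usage | Stabilized after search
----------------|----------|---------------------------|------------------------
FSlint 2.4.7    | 62 MB    | 164 MB                    | 158 MB
DupeGuru 4.1.1  | 90 MB    | 170 MB                    | 166 MB
Czkawka 3.0.0   | 12 MB    | 122 MB                    | 60 MB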

In DupeGuru I enabled checking images with different dimensions to match Czkawka's behavior. Both apps use a caching mechanism, so the second scan is really fast.

Similar images scan that checked 10949 files occupying 6.6 GB

App                          | Scan time
-----------------------------|----------
Czkawka 3.0.0 (First Run)    | 276s
Czkawka 3.0.0 (Second Run)   | 1s
DupeGuru 4.1.1 (First Run)   | 539s
DupeGuru 4.1.1 (Second Run)  | 1s

Similar images scan that checked 349 image files occupying 1.7 GB

App                          | Scan time
-----------------------------|----------
Czkawka 3.0.0 (First Run)    | 54s
Czkawka 3.0.0 (Second Run)   | 1s
DupeGuru 4.1.1 (First Run)   | 55s
DupeGuru 4.1.1 (Second Run)  | 1s
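
The drop to about 1s on the second run is what a result cache would produce: if a file's size and modification time are unchanged, the expensive per-file computation can be skipped and the stored value reused. The sketch below illustrates that idea only. It is not the cache format or logic of Czkawka or DupeGuru, and `ScanCache`, `CacheEntry` and `hash_for` are hypothetical names.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// What the cache remembers about a file from a previous scan.
#[derive(Clone, Debug, PartialEq)]
struct CacheEntry {
    size: u64,
    modified: SystemTime,
    hash: u64,
}

/// Hypothetical in-memory scan cache keyed by file path.
#[derive(Default)]
struct ScanCache {
    entries: HashMap<PathBuf, CacheEntry>,
    hits: usize,
    misses: usize,
}

impl ScanCache {
    /// Return the cached hash when size and modification time are unchanged.
    /// Otherwise run `compute` (the expensive step) and store its result.
    fn hash_for<F>(&mut self, path: &Path, size: u64, modified: SystemTime, compute: F) -> u64
    where
        F: FnOnce(&Path) -> u64,
    {
        let cached = self
            .entries
            .get(path)
            .filter(|e| e.size == size && e.modified == modified)
            .map(|e| e.hash);
        if let Some(hash) = cached {
            self.hits += 1;
            return hash;
        }
        self.misses += 1;
        let hash = compute(path);
        self.entries
            .insert(path.to_path_buf(), CacheEntry { size, modified, hash });
        hash
    }
}
```

In a real tool the map would be saved to disk between runs, and `size` and `modified` would come from `std::fs::metadata`. Here they are passed in explicitly so the lookup logic stays easy to follow.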

Comparison to other tools

Bleachbit is a master at finding and removing temporary files, while Czkawka only finds the most basic ones. So these two apps shouldn't be compared directly or be considered as an alternative to one another.

Czkawka FSlint DupeGuru Bleachbit
Language Rust Python Python/Obj-C Python
OS Lin,Mac,Win Lin Lin,Mac,Win Lin,Mac,Win
Framework GTK 3 PyGTK2 Qt 5 (PyQt)/Cocoa PyGTK3
Duplicate finder
Empty files
Empty folders
Temporary files
Big files
Similar images
Zeroed Files
Music duplicates(tags)
Invalid symlinks
Broken Files
Names conflict
Installed packages
Invalid names
Bad ID
Non stripped binaries
Redundant whitespace
Overwriting files
Multiple languages(po)
Cache support
In active development Yes No Yes Yes

Contributions

Contributions to this repository are welcome.

You can help by creating:

  • Bug reports - memory leaks, unexpected behavior, crashes
  • Feature proposals - proposal to change/add/delete some features
  • Pull Requests - implementing a new feature yourself or fixing bugs. If the change is large, it's a good idea to open a new issue first to discuss it.
  • Documentation - There are instructions which you can improve.

You can also help by doing different things:

Name

Czkawka is a Polish word which means hiccup.

I chose this name because I wanted to hear people speaking other languages pronounce it, so feel free to spell it the way you want.

This name is not as bad as it seems, because I was also thinking about using words like żółć, gżegżółka or żołądź, but I gave up on these ideas because they contained Polish characters, which would cause difficulty in searching for the project.

At the beginning of the program's development, I was prepared to change its name if the response to it was unanimously negative, but the opinions were extremely mixed.

License

The code is distributed under the MIT license.

The icon was created by jannuary and is licensed under CC-BY-4.0.

The Windows dark theme comes from the AdMin repo, which is under the MIT license.

The program is completely free to use.

"Gratis to uczciwa cena" - "Free is a fair price"

Donations

If you are using the app, I would appreciate a donation for its further development, which can be done here.