Czkawka - Duplicate Photo / File Finder App (Cross platform)

DanTheManDRH · 22 September 2024 06:59

I’ve been facing the issue of having entirely too many duplicate photos, caused by backing up from my phone to multiple cloud apps and switching between them often (mostly OneDrive and Dropbox). I kept letting them build up and build up for fear of accidentally deleting unique photos.

Finally, I’ve had enough and found this ‘Czkawka’ app on GitHub via a Reddit thread. It looks very well built. At the bottom of the readme are suggestions for other apps that do the same in different ways.

Czkawka means hiccup in Polish and the developer explains a little bit about why he chose the name in the readme.

I’m currently waiting on the second backup of my pictures folder to finish copying to my external drive. I’ll run Czkawka against my pictures folder as soon as the second backup is safe and sound. I’ll report back.

Funnily enough, I couldn’t get it to run on Windows despite copying the ffmpeg library as instructed. It launches beautifully on Fedora when installed via Flatpak.

Features from the readme:

Written in memory-safe Rust
Amazingly fast - due to using more or less advanced algorithms and multithreading
Free, Open Source without ads
Multiplatform - works on Linux, Windows, macOS, FreeBSD and many more
Cache support - second and further scans should be much faster than the first one
CLI frontend - for easy automation
GUI frontend - uses GTK 4 or Slint frameworks
No spying - Czkawka does not have access to the Internet, nor does it collect any user information or statistics
Multilingual - supports multiple languages like Polish, English or Italian
Multiple tools to use:
    Duplicates - Finds duplicates based on file name, size or hash
    Empty Folders - Finds empty folders with the help of an advanced algorithm
    Big Files - Finds the provided number of the biggest files in a given location
    Empty Files - Looks for empty files across the drive
    Temporary Files - Finds temporary files
    Similar Images - Finds images which are not exactly the same (different resolution, watermarks)
    Similar Videos - Looks for visually similar videos
    Same Music - Searches for similar music by tags or by reading content and comparing it
    Invalid Symbolic Links - Shows symbolic links which point to non-existent files/directories
    Broken Files - Finds files that are invalid or corrupted
    Bad Extensions - Lists files whose content does not match their extension
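Czkawka's own code isn't shown here, but the "size or hash" idea behind the Duplicates tool can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it uses SHA-256 from `hashlib`, skips symbolic links, and the names `file_digest` and `find_duplicates` are hypothetical. Files are grouped by size first, because only files of equal size can be identical, and only those candidates are hashed. It only lists groups and never deletes anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under `root` that have identical content.

    Pass 1 groups regular files by size (cheap, metadata only); pass 2 hashes
    only files that share a size. Returns a sorted list of groups, each a
    sorted list of paths. Nothing is deleted.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so the same file is not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

For example, `for group in find_duplicates("/path/to/Pictures"): print(group)` prints each set of identical files. Empty files all share the same size and hash, so they show up as one group. A matching hash is very strong evidence but not proof; fdupes, for instance, adds a byte-by-byte comparison after the checksum step.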

Does anyone have experience with this app? What is your usual solution to duplicate photos all over the place?

Rosika · 22 September 2024 12:53

Hi @DanTheManDRH ,

funny that you should mention Czkawka, because I was once considering this app too. Haven’t heard about it in a long time.

I even downloaded it at the time as an appimage (linux_czkawka_gui_alternative.AppImage).

But that was such a long time ago that I cannot for the life of me remember whether or not I was satisfied using it. Sorry.

In the end I looked for some alternatives and finally settled on fdupes.
It’s a CLI app, but I opt for terminal programmes whenever possible.

I’m satisfied with it so far, but I have to admit it’s time I put it to good use again.

Many greetings from Rosika

nevj · 22 September 2024 12:55

That does not sound foolproof.
Maybe hash is some kind of checksum… that might work
By all means try it… you want it to list duplicates, not delete them, in case it makes mistakes.

There are other ways

fdupes looks to be what you need.

Rosika · 22 September 2024 13:04

@DanTheManDRH and @nevj :

Yes, indeed.

fdupes (CLI)

Advantages:

Easy to use and works well for identifying duplicate files based on content.
Compares file contents by calculating checksums and listing duplicates.
Allows interactive deletion of duplicates (you can review before any action).
Can exclude directories like system folders by specifying paths (slight workaround needed, I guess)

Some examples:

fdupes -r /home/your_username :

This will search recursively (-r) in your home directory and list duplicates based on file content.

fdupes -r -d /home/your_username :

This will prompt you to choose which file(s) in each set of duplicates to keep before deleting anything, so nothing is deleted automatically.

fdupes -r -m /home/your_username :

This will show a summary of the total space being wasted by duplicate files.
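The wasted-space figure in such a summary is simple arithmetic: if one copy of every set is kept, each set of identical files wastes its file size times the number of extra copies. A minimal Python sketch, assuming the duplicate sets have already been found and are given as lists of `(path, size)` tuples; `wasted_space` and `human` are hypothetical helper names, and fdupes' exact output format is not reproduced here.

```python
def wasted_space(groups):
    """Summarize duplicate sets.

    `groups` is a list of duplicate sets, each a list of (path, size_in_bytes)
    tuples for files with identical content. Keeping one copy per set, each
    set wastes size * (copies - 1).
    Returns (extra_copies, number_of_sets, wasted_bytes).
    """
    extra = sets = wasted = 0
    for group in groups:
        if len(group) < 2:
            continue  # a single file is not a duplicate set
        size = group[0][1]  # all members share the same size
        sets += 1
        extra += len(group) - 1
        wasted += size * (len(group) - 1)
    return extra, sets, wasted


def human(n_bytes):
    """Format a byte count with binary prefixes, e.g. 1536 -> '1.5 KiB'."""
    value = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024 or unit == "TiB":
            return f"{value:.1f} {unit}"
        value /= 1024
```

For instance, three copies of a 2 MB photo plus two copies of a 10-byte file give 3 extra copies in 2 sets, wasting 4,000,010 bytes.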

Some more info:

tldr fdupes

  fdupes

  Finds duplicate files in a set of directories.
  More information: https://github.com/adrianlopezroche/fdupes.

  - Search a single directory:
    fdupes path/to/directory

  - Search multiple directories:
    fdupes directory1 directory2

  - Search a directory recursively:
    fdupes -r path/to/directory

  - Search multiple directories, one recursively:
    fdupes directory1 -R directory2

  - Search recursively, considering hardlinks as duplicates:
    fdupes -rH path/to/directory

  - Search recursively for duplicates and display interactive prompt to pick which ones to keep, deleting the others:
    fdupes -rd path/to/directory

  - Search recursively and delete duplicates without prompting:
    fdupes -rdN path/to/directory
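nevj's advice to list first and delete later fits the `-d`/`-N` behavior above: with `-N`, fdupes keeps the first file in each set and deletes the others without asking. The sketch below only builds that kind of plan so it can be reviewed before anything is removed. `deletion_plan` is a hypothetical helper, not part of fdupes, and it assumes "first" means first in plain path order unless a different `keep_key` is passed; fdupes' own ordering depends on its options.

```python
def deletion_plan(groups, keep_key=None):
    """Decide which files a 'keep one, delete the rest' pass would remove.

    `groups` is a list of duplicate sets (lists of paths). The path that sorts
    first under `keep_key` (plain path order by default, an assumption of this
    sketch) is kept; the rest are returned as deletion candidates.
    Nothing is deleted here.
    """
    plan = []
    for group in groups:
        if len(group) < 2:
            continue  # nothing to remove from a single file
        ordered = sorted(group, key=keep_key)
        plan.append((ordered[0], ordered[1:]))
    return plan


def print_plan(plan):
    """Print the plan in a reviewable keep/delete form."""
    for keep, remove in plan:
        print(f"keep:   {keep}")
        for path in remove:
            print(f"delete: {path}")
        print()
```

Passing `keep_key=os.path.getmtime` would keep the oldest copy of each photo instead of the first path alphabetically.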

Hope it helps.

Cheers from Rosika

DanTheManDRH · 27 September 2024 03:57

Hey guys, I appreciate the suggestions, but I had already settled on using Czkawka for getting rid of my duplicate pictures. I will keep fdupes in mind. It’ll be useful for many things in the future.

The results:

I started with 165.3 GB and 55,238 files in my original pictures folder. Czkawka found 35,130 duplicate photos.

After deleting duplicate photos and videos I was left with 50.6 GB of data and 18,910 unique pictures and videos.

The resulting numbers seem more sane to me as I don’t take a ton of pictures. I still have 3 copies of the original pictures folder on 3 separate devices. I’ll likely tarball it and upload it to the cloud for backup in perpetuity in case anything worthwhile was deleted.

Automatically Sorting Photos

The last part of this was sorting all the pictures into folders by year and month. Over the years I’d sorted my photos in different ways. Back in the early 2000s, I used event-based folders (X birthday party, Y wedding, etc.). Once I started using smartphones as my primary camera and cloud backup, most apps automatically sorted into year > month folders. That’s a pretty good way to sort pictures IMHO.

It was off to Google to find an automated way to do this. I knew EXIF data was embedded in the majority of the pictures, and there had to be something out there to rename files based on that data. Google turned up this excellent tutorial for using exiftool: ExifTool example commands

After running exiftool to auto-sort by year/month into new folders, I was left with about 5,000 unsorted images. My assumption is that these didn’t have EXIF data associated with them, which makes sense. A lot of them were PNGs containing screenshots and memes, which were ripe for deletion anyhow.
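The post doesn't quote the exiftool command that was used. ExifTool's documentation shows moving files into date folders with a command of the form `exiftool "-Directory<DateTimeOriginal" -d "%Y/%m" -r DIR`, and files lacking the `DateTimeOriginal` tag are typically left where they are, which would be consistent with the leftover screenshots and scans. The Python sketch below only models that mapping: it assumes the EXIF date has already been read as a string in EXIF's `YYYY:MM:DD HH:MM:SS` layout, and `target_folder` and the `unsorted` fallback folder are hypothetical names.

```python
from datetime import datetime

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # layout EXIF uses for DateTimeOriginal


def target_folder(exif_date, folder_format="%Y/%m", fallback="unsorted"):
    """Map an EXIF DateTimeOriginal string to a destination folder.

    Returns `fallback` when the date is missing, malformed, or a placeholder
    such as "0000:00:00 00:00:00" that cannot be parsed as a real date.
    """
    if not exif_date:
        return fallback
    try:
        taken = datetime.strptime(exif_date.strip(), EXIF_FORMAT)
    except ValueError:
        return fallback
    return taken.strftime(folder_format)
```

Changing `folder_format` (for example to `"%Y/%Y-%m"`) is the Python counterpart of tweaking exiftool's `-d` date format, which uses the same strftime-style codes.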

There were also quite a few JPEGs that couldn’t be sorted. They were all old pictures (15-20 years old), and the majority of them came from scans or from CDs from the photo developer (remember them?).

Anyway, that’s my experience. I plan on rerunning the exiftool command with minor changes to the output date format. Other than that, I’m happy with how this turned out.

nevj · 27 September 2024 06:44

That is how I store most of my camera photos… in folders by year and month.
I also keep an index of folder contents… update index by hand when I add photos.

eisenfeld · 24 February 2025 14:48

I like Czkawka! I used it to manage all the pictures I downloaded from my Google and Amazon accounts (thousands of pictures).
It has so many smart options and it is really fast (written in Rust). Another good app for filename manipulation is “Szyszka”, also written in Rust, with many nice features as well.

nevj · 24 February 2025 18:20

I had a look here

The example seems to be bulk renaming files into another language. That could be useful… I have never considered that problem.
It needs GTK