I’ve been facing the issue of having entirely too many duplicate photos, caused by backing up my phone to multiple cloud apps and switching between them often (mostly OneDrive and Dropbox). I kept letting them build up for fear of accidentally deleting unique photos.
Finally, I’ve had enough and found this ‘Czkawka’ app on GitHub via a Reddit thread. It looks very well built. At the bottom of the readme are suggestions for other apps that do the same in different ways.
Czkawka means hiccup in Polish and the developer explains a little bit about why he chose the name in the readme.
I’m currently waiting on the second backup of my pictures folder to finish copying to my external drive. I’ll run Czkawka against my pictures folder as soon as the second backup is safe and sound. I’ll report back.
Funnily enough, I couldn’t get it to run on Windows despite copying the ffmpeg library as instructed. It launches beautifully on Fedora when installed via Flatpak.
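In case anyone else wants to try it that way: I believe the Flathub ID is com.github.qarmin.czkawka (worth double-checking with flatpak search czkawka):
flatpak install flathub com.github.qarmin.czkawka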
Features from the readme:
- Written in memory-safe Rust
- Amazingly fast - due to using more or less advanced algorithms and multithreading
- Free, Open Source, without ads
- Multiplatform - works on Linux, Windows, macOS, FreeBSD and many more
- Cache support - second and further scans should be much faster than the first one
- CLI frontend - for easy automation
- GUI frontend - uses the GTK 4 or Slint frameworks
- No spying - Czkawka does not have access to the Internet, nor does it collect any user information or statistics
- Multilingual - supports multiple languages like Polish, English and Italian
Multiple tools to use (a CLI sketch follows this list):
- Duplicates - Finds duplicates based on file name, size or hash
- Empty Folders - Finds empty folders with the help of an advanced algorithm
- Big Files - Finds the provided number of the biggest files in a given location
- Empty Files - Looks for empty files across the drive
- Temporary Files - Finds temporary files
- Similar Images - Finds images which are not exactly the same (different resolution, watermarks)
- Similar Videos - Looks for visually similar videos
- Same Music - Searches for similar music by tags or by reading content and comparing it
- Invalid Symbolic Links - Shows symbolic links which point to non-existent files/directories
- Broken Files - Finds files that are invalid or corrupted
- Bad Extensions - Lists files whose content does not match their extension
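Since there’s a CLI frontend, the scan can be scripted too. A rough sketch of a duplicate scan (subcommand and flag from my reading of the readme; verify with czkawka_cli dup --help):
czkawka_cli dup --directories /home/your_username/Pictures
This only lists the duplicate groups; it doesn’t delete anything, which is exactly what I want for a first pass.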
Does anyone have experience with this app? What is your usual solution to duplicate photos all over the place?
That does not sound foolproof.
Maybe hash is some kind of checksum… that might work
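For example, a quick way to test that idea by hand with plain coreutils (assuming GNU sort/uniq; SHA-256 hex digests are 64 characters wide):
find . -type f -exec sha256sum {} + | sort | uniq -w64 --all-repeated=separate
Files with identical checksums are byte-for-byte identical, so grouping on the hash column surfaces exact duplicates.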
By all means try it… you want it to list duplicates, not delete them, in case it makes mistakes.
fdupes is another option:
- Easy to use and works well for identifying duplicate files based on content.
- Compares file contents by calculating checksums and lists the duplicates.
- Allows interactive deletion of duplicates (you can review before any action).
- Can exclude directories like system folders by specifying only the paths you want scanned (a slight workaround, I guess).
Some examples:
fdupes -r /home/your_username
This will search recursively (-r) in your home directory and list duplicates based on file content.
fdupes -r -d /home/your_username
This will ask for your confirmation before deleting anything, so nothing is done automatically.
fdupes -r -m /home/your_username
This will show a summary of the total space being wasted by duplicate files (-m is the summarize flag).
Some more info:
tldr fdupes
fdupes
Finds duplicate files in a set of directories.
More information: https://github.com/adrianlopezroche/fdupes.
- Search a single directory:
fdupes path/to/directory
- Search multiple directories:
fdupes directory1 directory2
- Search a directory recursively:
fdupes -r path/to/directory
- Search multiple directories, one recursively:
fdupes directory1 -R directory2
- Search recursively, considering hardlinks as duplicates:
fdupes -rH path/to/directory
- Search recursively for duplicates and display interactive prompt to pick which ones to keep, deleting the others:
fdupes -rd path/to/directory
- Search recursively and delete duplicates without prompting:
fdupes -rdN path/to/directory
Hey guys, I appreciate the suggestions, but I already settled on using Czkawka to get rid of my duplicate pictures. I’ll keep fdupes in the back of my mind, though; it’ll be useful for many things in the future.
The resulting numbers seem more sane to me, as I don’t take a ton of pictures. I still have 3 copies of the original pictures folder on 3 separate devices. I’ll likely tarball it and upload it to the cloud for backup in perpetuity, in case anything worthwhile was deleted.
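Something like this is what I have in mind for the tarball (the paths are placeholders):
tar -czf pictures-backup.tar.gz /path/to/Pictures
-c creates the archive, -z compresses it with gzip, and -f names the output file.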
Automatically Sorting Photos
The last part of this is that I wanted to sort the pictures into folders based on year and month. Over the years I’d sorted my photos in different ways. Back in the early 2000s, I used event-based folders (X birthday party, Y wedding, etc.). As I started using smartphones as my primary camera and cloud backup, most apps automatically sort into year > month folders. That’s a pretty good way to sort pictures, IMHO.
It was off to Google to find an automated way to do this. I knew EXIF data was embedded in the majority of the pictures, and there had to be something out there to rename and sort based on that data. Google turned up this excellent tutorial for using exiftool: ExifTool example commands
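For reference, the command was along these lines (a sketch adapted from the exiftool documentation; the source path is a placeholder):
exiftool -r '-Directory<DateTimeOriginal' -d %Y/%m /path/to/pictures
This reads each file’s DateTimeOriginal tag and moves the file into a Year/Month folder; anything without a usable date tag stays put.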
After running exiftool to auto-sort by year/month into new folders, I was left with about 5,000 unsorted images. My assumption is these didn’t have EXIF data associated with them, which makes sense. A lot of them were PNGs containing screenshots and memes, which were ripe for deletion anyhow.
There were also quite a few JPEGs that couldn’t be sorted. They were all old pictures (15-20 years); the majority of them came from scans or from CDs from the photo developer (remember those?)
Anyways, that’s my experience. I plan on rerunning the exiftool command with minor changes to the output date format. Other than that, I’m happy with how this turned out.
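For example (a hypothetical tweak, just to show the idea), changing the -d format string from %Y/%m to %Y/%Y-%m would produce folders like 2008/2008-06 instead of 2008/06:
exiftool -r '-Directory<DateTimeOriginal' -d %Y/%Y-%m /path/to/pictures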
That is how I store most of my camera photos… in folders by year and month.
I also keep an index of folder contents… I update the index by hand when I add photos.
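If it helps, an index like that can be regenerated in one shot (assuming the tree utility is installed; the path is a placeholder):
tree /path/to/Pictures > photo-index.txt
That dumps the whole folder hierarchy to a text file you can refresh whenever you add photos.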