
I have one primary collection, but I also have probably 30TB of old drives lying around with photos scattered all about. Is there something I can do to mount those drives, grab all the image files, and then dedupe/catalog them (at least by date)? Ideally this would work with videos as well.


Since you're using Linux, I'd do something like this:

  1. Gather all the image files with "find", assuming they have known extensions.
  2. Run jdupes on the dump to deduplicate them.
  3. Run exiftool on them to automatically sort them into folders based on any metadata field you like.
  4. Index all of the images with digiKam and organize them further there.
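A rough Python sketch of steps 1 and 2, plus a crude stand-in for step 3, in case you'd rather see the logic than chain the CLI tools. It is not how jdupes works internally, just the same basic idea: bucket files by size first, then hash only the size collisions with SHA-256. The extension list is an assumption, and `date_folder` uses the file's modification time because the standard library can't read EXIF; exiftool's capture date is the better source.

```python
import hashlib
from collections import defaultdict
from datetime import datetime
from pathlib import Path

# Assumed extension list; extend it for whatever cameras and phones you had.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic", ".cr2", ".nef", ".dng",
                    ".mp4", ".mov", ".avi", ".mts"}


def find_media(root: Path):
    """Step 1: yield every file under root with a known photo/video extension."""
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            yield path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks so large videos never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths):
    """Step 2: return lists of byte-identical files (hashing only size collisions)."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[p.stat().st_size].append(p)
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size can't have an exact duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[sha256_of(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups


def date_folder(path: Path) -> str:
    """Step 3 stand-in: a YYYY/MM folder name from the file's modification time.

    mtime is only a rough proxy; it may reflect when the file was last copied.
    """
    ts = datetime.fromtimestamp(path.stat().st_mtime)
    return f"{ts.year:04d}/{ts.month:02d}"
```

For example, `group_duplicates(find_media(Path("/mnt/old_drive_01")))` (the mount point is hypothetical) returns lists of byte-identical files, and you keep one file from each list.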
Another path would be to add all the drives to digiKam as "removable collections" and manage all of them there. digiKam also has fuzzy search, so it can find similar images, not just identical ones, and you can deduplicate those too.
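For the curious, here is a minimal sketch of one common similarity technique, the "average hash": shrink the image to an 8x8 grid, set one bit per cell that is brighter than the mean, and compare two hashes by counting the bits that differ. This is not digiKam's own algorithm, and decoding a file into grayscale pixels is left out (Pillow's `Image.convert("L")` is one way to get there); the function assumes that grid of values is already in hand.

```python
def average_hash(pixels, hash_size=8):
    """pixels: 2-D list of grayscale values (0-255), at least hash_size on each side.

    Block-average the image down to hash_size x hash_size cells, then emit one bit
    per cell: 1 if the cell is brighter than the mean of all cells, else 0.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for by in range(hash_size):
        y0, y1 = by * height // hash_size, (by + 1) * height // hash_size
        for bx in range(hash_size):
            x0, x1 = bx * width // hash_size, (bx + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; small distances mean visually similar images."""
    return bin(a ^ b).count("1")
```

A uniformly brightened copy of a photo hashes identically, while an unrelated image typically lands far away, which is why this kind of hash survives re-saves and small edits that defeat exact-byte comparison.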

Both ways are applicable to videos as well.

I'm currently using the second path, since digiKam has been my primary photo cataloging and management tool for years, and it works wonders.

---

On Mac, Gemini II and Retrobatch would allow for a similar workflow, but I haven't used them as my primary tools. Gemini also has similarity search, so it can deduplicate similar photos.

I haven't used Windows in more than a decade, so I don't know anything on that front.


This looks like a great approach, thanks for putting it together. I'm hoping to dump these disks to images, because the filesystems on them are all over the place (some of them are even IDE drives lol), and then just loopback-mount the images for this process.
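If you want a checksum of each dump while you make it, a minimal Python stand-in for `dd` could look like the sketch below. The device path is hypothetical, reading a raw device needs root, and there is no handling for bad sectors, so for failing old IDE drives GNU ddrescue is the safer choice. Once you have an image, `sudo losetup --find --show --partscan --read-only disk.img` should expose its partitions as loop devices you can mount read-only.

```python
import hashlib


def image_device(src: str, dst: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Copy src (e.g. a hypothetical /dev/sdX) into the image file dst in chunks.

    Returns the SHA-256 of everything read, so the image can be verified later.
    """
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break  # end of device/file
            fout.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def sha256_file(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Re-hash a finished image to confirm it matches what was read."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `image_device(...)` against `sha256_file(dst)` afterwards catches a truncated or corrupted image before you wipe or shelve the original drive.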


I'm glad that it helped. Organizing old files always takes much longer than anticipated. It's a live exercise in entropy :)

Taking images and mounting them sounds reasonable. :)


From the comfort of a terminal[1], a few options come to mind.

git-annex[2] will let you index all, or just some, of those files where they are, and keep track of them if you shuffle them around. The really useful feature in your case is that git-annex keeps tabs even on your disconnected hard drives, flash drives, or cloud storage. It will tell you whether you have redundant copies and how many, or warn you if you're about to trash the last known instance of IMG001.jpg. If you query a file that isn't currently available locally, it will point you to the specific storage media that holds it.
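To make the "don't trash the last copy" idea concrete, here is a toy Python model of a location log. It is not git-annex's API or data format (the real tool answers "where is it?" with `git annex whereis` and refuses `git annex drop` when too few copies would remain); the class, the drive labels, and the `numcopies` default are assumptions for illustration, loosely modeled on git-annex's numcopies setting.

```python
from collections import defaultdict


class LocationLog:
    """Toy model: which storage media are known to hold each piece of content."""

    def __init__(self, numcopies: int = 2):
        self.numcopies = numcopies  # minimum copies that must survive a drop
        self.locations = defaultdict(set)  # content key -> set of drive labels

    def record(self, key: str, drive: str) -> None:
        self.locations[key].add(drive)

    def copies(self, key: str) -> int:
        return len(self.locations.get(key, ()))

    def where(self, key: str) -> list:
        """Like a whereis query: every drive known to hold this content."""
        return sorted(self.locations.get(key, ()))

    def can_drop(self, key: str, drive: str) -> bool:
        """Only allow removing the copy on `drive` if enough copies remain elsewhere."""
        held = self.locations.get(key, set())
        return drive in held and len(held) - 1 >= self.numcopies

    def drop(self, key: str, drive: str) -> None:
        if not self.can_drop(key, drive):
            raise ValueError(
                f"refusing to drop {key} from {drive}: {self.copies(key)} known copies"
            )
        self.locations[key].discard(drive)
```

In git-annex the keys are derived from file content, so the log keeps working even after you rename or move files between drives; plain file names are used here only for readability.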

Note that it's not quite as trivial as I make it out to be; experience with git helps. Some love it.

In your situation, I might try borg[3]. I have no experience with it, but I've heard good things about it and the docs seem OK.

Personally, I always end up using rmlint/fdupes and Unix tools, but that's a secret.

[1] There are GUI implementations of these.
[2] https://git-annex.branchable.com/
[3] https://github.com/borgbackup/borg


Whoa, git-annex looks like it would be useful for a few projects; thanks for the tip. Will take a look at both.


I've been pleased with PhotoSweeper on my Mac. I started with the free Lite version, then went ahead and bought the full version.

I set it to the highest match setting. If you lower it, it starts treating photos as matches even when somebody is looking at the camera in one and away in the other.
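PhotoSweeper's internal matching isn't described here, but the trade-off above is what happens with any similarity threshold. A sketch, assuming each photo already has a 64-bit perceptual hash: link photos whose hashes differ in at most `max_distance` bits and report the connected groups (a small union-find). Raising `max_distance` plays the role of a "lower match setting": it merges more near-duplicates, including shots that differ only in where someone is looking.

```python
def group_similar(hashes, max_distance):
    """hashes: dict of photo name -> integer perceptual hash.

    Returns sorted lists of names connected by pairs whose hashes differ in at
    most max_distance bits. Pairwise comparison is O(n^2), fine for a sketch.
    """
    names = list(hashes)
    parent = {n: n for n in names}

    def find(n):
        # Walk to the group's representative, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if bin(hashes[a] ^ hashes[b]).count("1") <= max_distance:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Because grouping is transitive, a loose threshold can also chain A to B to C even when A and C look quite different, which is another reason to start strict. For hundreds of thousands of photos you'd want an index such as a BK-tree instead of comparing every pair.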

https://apps.apple.com/us/app/photosweeper/id463362050?mt=12 https://apps.apple.com/us/app/photosweeper-lite/id506150103?...

PS: I was going to say that they have a PC version, but it looks like the third result I get on DuckDuckGo is actually spam claiming there is a PC version, which then steers you to programs that are "like PhotoSweeper". I wouldn't download those.


Will check this out, thanks for the suggestion!


I bought a Synology NAS mainly for images, videos, and personal files, and I run its hard drives in a RAID setup. You could also explore other NAS options like QNAP, or an open-source setup.


I've stopped and started shopping for a NAS at least five times now. One of these days I'll pull the trigger.


Synology's Photo Station is slightly jarring: the indexing isn't optimized at all, so if you have something like 500k photos, it's going to be spinning for a while.


There's an ancient-but-maintained image cataloger / asset manager called NeoFinder, available on Windows and Mac. Would love to find an OSS equivalent.

https://cdfinder.de/


Will take a look, thank you!


Same problem. The other issue is having 2 cameras and 2 phones in the family and grouping their photos together by event. It's easier just to save them all and not try.
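Grouping events across devices is mostly a timeline problem. A minimal sketch, assuming you already have each shot's device and capture time (exiftool can extract these): merge everything into one sorted timeline and start a new event whenever there is a long gap between consecutive shots. The 3-hour gap and the per-device clock offsets are assumptions you'd tune; camera clocks are often off by hours or set to the wrong time zone.

```python
from datetime import datetime, timedelta


def group_into_events(shots, gap=timedelta(hours=3), clock_offsets=None):
    """shots: iterable of (device, datetime) pairs from every camera and phone.

    clock_offsets: optional dict of device -> timedelta added to that device's
    timestamps, to correct clocks that were set wrong. Returns a list of events,
    each a chronological list of (corrected_time, device) pairs.
    """
    clock_offsets = clock_offsets or {}
    timeline = sorted(
        (taken + clock_offsets.get(device, timedelta(0)), device)
        for device, taken in shots
    )
    events = []
    for when, device in timeline:
        # Same event if this shot is close enough to the previous shot.
        if events and when - events[-1][-1][0] <= gap:
            events[-1].append((when, device))
        else:
            events.append([(when, device)])
    return events
```

Gap-based splitting is the simplest option; it never needs to know how many events there are, only how long a quiet period has to be before you call it a new one.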


My problem is that I'm paranoid about getting rid of anything now. I just want to take one good sweep through so I can do something with those drives.
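For that one sweep, a conservative check is to ask which files on an old drive have no byte-identical copy anywhere in the primary collection. A minimal sketch follows; the paths in the usage line are hypothetical, and edited or re-encoded versions of a photo will show up as "missing", which errs on the side of keeping things.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_index(root: Path) -> set:
    """SHA-256 of every regular file under root, ignoring names and folders."""
    return {sha256_of(p) for p in root.rglob("*") if p.is_file()}


def missing_from_primary(old_drive: Path, primary: Path) -> list:
    """Files on old_drive whose exact bytes exist nowhere under primary.

    An empty result means every file on the drive already has a copy.
    """
    known = content_index(primary)
    return sorted(
        p for p in old_drive.rglob("*") if p.is_file() and sha256_of(p) not in known
    )
```

Usage would be something like `missing_from_primary(Path("/mnt/old_drive_01"), Path("/srv/photos"))`: copy whatever it lists into the primary collection, rerun it, and only retire the drive once the list comes back empty.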



