Show HN: I made a site to catalogue 10,000 CC0-licensed stock photos

liamca · on Jan 22, 2016

[Full disclosure, I work on a service called Azure Search]

Very nice site! Since your site is so heavily based around search, I thought I would pass on a few suggestions based on what I saw. If you happen to be using a search engine for your content such as Elasticsearch, Solr or maybe Azure Search :-), there are a few simple things you could add to make the experience a little smoother. Suggestions in the search box are a nice way to let people quickly see results as they type. You could even add thumbnails of the images in the typeahead, as you see with the Twitter Typeahead library (http://twitter.github.io/typeahead.js/). I also noticed that your search does not handle spelling mistakes or phonetic search (matching words that sound similar). Finally, through the use of stemming, search engines can often help you find additional relevant content. For example, if the person is looking for mice, but your content has the word mouse in it, this will bring back a match. Since you don't have a lot of content, this can really help people find relevant content.
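To illustrate the spelling-mistake point, here is a minimal sketch using Python's standard-library `difflib`. It is not how any of the engines mentioned handle fuzzy matching; the tag list and the `suggest` helper are made up for the example.

```python
import difflib

# Hypothetical tag vocabulary; a real site would load this from its index.
TAGS = ["mountain", "fountain", "mouse", "bridge", "forest"]

def suggest(query, tags=TAGS, limit=3, cutoff=0.6):
    """Return up to `limit` known tags that look similar to `query`."""
    return difflib.get_close_matches(query.lower(), tags, n=limit, cutoff=cutoff)

# A misspelt query still finds the intended tag as the best match.
print(suggest("moutain"))
```

The `cutoff` value controls how forgiving the match is; 0.6 is simply `difflib`'s default, not a tuned number.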

Hope that helps.

chdir · on Jan 22, 2016

> using the Twitter Typeahead library

Unfortunately, it's no longer maintained [0] and has plenty of unfixed issues. You could try a recent fork [1]:

[0] https://github.com/twitter/typeahead.js/issues/1424

[1] https://github.com/corejavascript/typeahead.js

oliwarner · on Jan 22, 2016

I like to think I am very conscious of copyright. I might not always adhere to it in my personal life (who can claim they do these days?!) but professionally, everything is done strictly legitimately. With that in mind... Am I the only person who is slightly uncomfortable with the phrasing around PD and CC0? With other copyright licenses there is somebody saying they own something.

I'm particularly uncomfortable with Flickr's "no known copyright restrictions". What if people infer PD from that and upload it somewhere else under CC0? Then it gets sucked into this finda.photo? Yuck.

As for finda.photo, why are you truncating the source down to just a domain name?! Many of the sources include proper uploader details so why aren't you copying those over and displaying them?

I know you're not required to, but attribution isn't a bad thing if you can give it. I for one would be much happier using a photo if I knew exactly where it came from.

ghaff · on Jan 22, 2016

One of the challenges I have with attribution generally--and, to be clear, I try to be very careful with attribution on any CC, etc. photos that I use--is that the attribution is usually detached from the photo. (It may be stored in the metadata--or not.) So, even though I make a point of cutting & pasting the flickr links when I'm putting together a presentation, it's very easy for the attribution text and the photo to become separated on subsequent use.

There are potential ways that you could fix this from a technology perspective, e.g. have a process to create a new JPEG with the credit below the original photo. But anything like this is going to be a bit clunky and potentially ugly graphically.

snowwrestler · on Jan 22, 2016

There are fields within the JPEG file itself for this information, called IPTC fields. I know they can be read with photo-specific software like Photo Mechanic or Photoshop, but they seem incredibly under-used by the Internet in general. They're perfect for a use case like persistent attribution, but few image software services seem to know about or expose them.

tombrossman · on Jan 22, 2016

I agree these are under-used but they face the same issues as the other methods like putting the data in text near the image on a web page or presentation slide. These fields, same as all Exif metadata, can be overwritten or removed by anyone with access to the file. The data can be faked by someone intending to deceive, or it could disappear when posted to big sites like Facebook or Twitter who routinely remove Exif data by default (presumably to protect the majority who don't understand how GPS tagging works on mobile phone photos, etc).

Once the photo metadata is gone, it is too easy for others to claim it is an 'orphan work' and avoid liability under copyright law. At the opposite end of the spectrum, people like me who release most images as CC0 are annoyed that that license tag was stripped from the metadata, preventing others from freely reusing them. I use and rely on Exif tags a lot but they are fragile and you cannot rely on them staying embedded with your images once they hit the web.

snowwrestler · on Jan 27, 2016

The data can be faked or deleted within IPTC fields, of course.

What IPTC fields have over the typical ways of handling attribution is that they are not left behind when the image file is copied--so they should be more resistant to accidental removal of attribution metadata.

On most websites, the attribution is a line of text that is displayed next to the image. Anyone copying the image, who wishes to preserve attribution, must also separately copy the attribution text. Then they need a way to store that text, and keep it associated with the image. Not easy, actually!

> Once the photo metadata is gone, it is too easy for others to claim it is an 'orphan work' and avoid liability under copyright law.

You cannot avoid liability this way. Under the law, it is the responsibility of the person using an image to know that they have the right to use it. Just claiming "I thought it was orphaned" does not work if you are being sued by the actual image rightsholder.

> I use and rely on Exif tags a lot but they are fragile and you cannot rely on them staying embedded with your images once they hit the web.

Yes, this is my point! They're fragile because web services don't preserve them--but theoretically they could.

The cynical side of me thinks that a lot of web services don't want to know all the rights data for the media they carry. Ignoring rights gets them more traffic and engagement, and under the relevant law (the DMCA), they are allowed to. All they have to do is remove infringing images when the rights holder requests it.

ghaff · on Jan 22, 2016

Right. In an ideal world you'd be using a bunch of CC content in a presentation, the appropriate IPTC fields would be filled in, and you could press a button and a block of credit text would be generated. (That's primarily about CC-BY I realize.) In practice, it's an incredibly manual process that I'm guessing most people don't follow and, even for those who try to, it probably breaks down more often than not.

Then there are all the issues with the NC and ND license variants and what they even mean exactly. But that's another rant.

EDIT: I'd just add that clearing rights and giving credits have been an issue forever. On more than one occasion, I've gotten a semi-panicky email (and I think once actually a phone call in pre-email days) seeking permission to use one of my photos that was clearly on the verge of going into production. Presumably, someone came along and asked "You do have rights to this, correct?"

mattl · on Jan 22, 2016

We're (Creative Commons) writing something up about this right now.

davidbarker · on Jan 22, 2016

If you look underneath each photo, there's a link back to each original page, as well as a link to the photographer's URL of choice (depending on the source this might be their profile on Unsplash, or their own website).

oliwarner · on Jan 24, 2016

When I posted this comment the source link was just the top level page on (eg) Unsplash, not the profile of the user there.

davidbarker · on Jan 24, 2016

Do you remember which image you're referring to? I haven't changed either the image data or the site code since your comment, so I'm curious to know which one it is and if I can update it.

unicornporn · on Jan 22, 2016

These CC0 silos are quite risky to use in a professional context.

They often collect images en masse from a bunch of sources without further inspection. If someone uploads a copyrighted image to these sources and marks them CC0, they will end up in these CC0 aggregators. And, if your use of this image is discovered, you will be held liable for the damage caused by your action (well, at least here in Sweden).

I would do some research before using these images in a professional context. Look up the photographer and confirm that the image is a work of her/him. If this site included proper uploader details, it would make this work easier.

BrunoJo · on Jan 22, 2016

I always use https://pexels.com. They also have only CC0 images.

kjaer · on Jan 22, 2016

There's also http://librestock.com/, which you can use to search Pexels, but also a few other sites.

imrehg · on Jan 22, 2016

Strange. I was curious what kind of images are there, so I did a search for "Taiwan", and the result is literally 8 pictures with a "Shutterstock" watermark and that's all. Is that supposed to be CC0? Even if it were, would watermarked images like that be useful at all?

yitchelle · on Jan 22, 2016

Just tried the same thing. It looks like the Shutterstock watermarked photos were an advert sponsored by Shutterstock themselves. No actual CC0 photos for the site itself.

brandonheato · on Jan 22, 2016

Why not just use flickr? A search for images with "No known copyright restrictions" returned 663,502 results. https://www.flickr.com/search/?license=7%2C9%2C10&text=&adva...

ryanlol · on Jan 22, 2016

I'm not sure if "No known copyright restrictions" means what you think it does.

vortico · on Jan 22, 2016

I like the domain, the design is usable, and the database is great. This has it all.

m-i-l · on Jan 22, 2016

Looks good. Feedback from a designer I showed this to: it would be useful to search based on aspect ratio (landscape vs portrait at minimum).

davidbarker · on Jan 22, 2016

Thanks! That's actually already possible. There's a list of all the attributes you can search by here: http://finda.photo/search/tips

For example, http://finda.photo/search/?q=--aspectratio+%3C+1 would give you portrait images.
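As a rough sketch of what such a filter boils down to, assuming the `aspectratio` attribute is width divided by height (which is consistent with `< 1` returning portrait images), the classification could look like this. The `orientation` function is hypothetical and not taken from the site's code.

```python
def orientation(width, height):
    """Classify an image by its aspect ratio (width / height)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = width / height
    if ratio < 1:
        return "portrait"   # taller than wide
    if ratio > 1:
        return "landscape"  # wider than tall
    return "square"

print(orientation(600, 800))
```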

lucaspiller · on Jan 22, 2016

Very nice! What are you using to search the photos by colour and feature?

fratlas · on Jan 22, 2016

Probably performed dominant colour analysis on each image (with something like color-thief), then sorted by similarity of the ranked colours using a closeness measure in a colour space like Lab? No idea for the feature though.

elorant · on Jan 22, 2016

Is there an algorithm for what you just described? I'm currently researching for color classification and it seems quite a complicated issue.

kamy22 · on Jan 22, 2016

Hi, if you want you can contact me. I created a powerful algorithm to detect dominant colors in an image using K-means clustering and the Lab color space.

Examples: https://twitter.com/kamy22/status/479040852028051456 https://twitter.com/kamy22/status/472517258418606080

Have a good day!
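For readers wondering what K-means on colours looks like, here is a minimal pure-Python sketch. It is not kamy22's algorithm: it clusters whatever 3-component tuples it is given (plain RGB in the example, for brevity), and the function name, parameters and seeding strategy are all assumptions made for illustration.

```python
import random

def dominant_colors(pixels, k=3, iterations=10, seed=0):
    """Cluster 3-component colour tuples with plain k-means.

    Returns (centroid, pixel_count) pairs, largest cluster first.
    Requires at least k distinct colours in `pixels`.
    """
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(pixels)), k)  # k distinct starting colours

    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            # Assign each pixel to its nearest centroid.
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(channel) / len(cluster) for channel in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]

    sizes = [len(cluster) for cluster in clusters]
    return sorted(zip(centroids, sizes), key=lambda pair: pair[1], reverse=True)

# Six red pixels and three blue ones: red should come out as the dominant colour.
print(dominant_colors([(255, 0, 0)] * 6 + [(0, 0, 255)] * 3, k=2))
```

On a real photo you would normally downsample first, since this loop visits every pixel on every iteration.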

criddell · on Jan 22, 2016

What about feature? If I want a dog, how do you determine which images contain a dog?

kamy22 · on Jan 22, 2016

You have to do a lot of research to solve these types of problems. I think that neural networks and machine learning are the best way... but it's a complex problem.

Here you can find awesome publications (http://rodrigob.github.io/are_we_there_yet/build/classificat...). It's something like a bible of neural networks :P

criddell · on Jan 22, 2016

Considering that companies like Google are so good at this, why build your own photo site? Why not upload all the CC0 images to a public Google Photos library?

kamy22 · on Jan 22, 2016

I like your question. Ok, you can use 500px, Flickr, Google and other sites... But... In my personal opinion, a developer should be curious. I'm a dev and, for this reason, I like to give myself a challenge. It's a good way to learn a lot, to discover new solutions, to meet new people, to improve my skills, to create something new. So... you can use a Google Photos library or you can consider creating something different (because your solution will definitely be different from the others). It's a choice ;)

fratlas · on Jan 22, 2016

I'm currently trying to do this for my own side project. My current plan is to analyse each photo using one of the many color analyser algorithms out there (color-thief seems to be popular/efficient), but the problem is that color is a very complicated thing to compare. I think Euclidean distance between two Lab colors gives the best complexity-to-result ratio (and RGB to Lab is an easy, well-solved problem).
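A minimal sketch of that idea, assuming 8-bit sRGB input and the D65 white point: convert both colours to CIELAB and take the Euclidean distance (the CIE76 colour difference). The function names are made up, and newer formulas such as CIEDE2000 are more perceptually accurate but considerably longer.

```python
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    def linear(c):
        # Undo the sRGB gamma curve.
        c = c / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(c) for c in rgb)
    # Linear RGB to CIE XYZ.
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalise by the D65 reference white, then apply the Lab transform.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e(rgb1, rgb2):
    """Euclidean distance between two sRGB colours in Lab space (CIE76)."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))

# Red should be closer to orange than to blue.
print(delta_e((255, 0, 0), (255, 128, 0)) < delta_e((255, 0, 0), (0, 0, 255)))
```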

elorant · on Jan 22, 2016

I've come across this: http://developers.lyst.com/2014/02/22/color-detection/

Seems very good but I'm still researching. Hope it might help you.

pbhjpbhj · on Jan 22, 2016

FWIW digikam has a colour search - could check their code and see what they're doing.

0bit · on Jan 22, 2016

something like: http://freecode.com/projects/mactorii should work

petecooper · on Jan 22, 2016

Adding Alana to the list of CC0-only stock photos.

[1] http://alana.io/

[2] http://alana.io/about-us/

scope · on Jan 22, 2016

Adding to the list: Pexels

[3] https://www.pexels.com

[4] https://www.pexels.com/photo-license/

They currently have over 5000 photos (~600 new images are added every month)

Flimm · on Jan 22, 2016

One of the about pages says that the photos are on a GitHub repo, which sounds really cool, until you follow the link and the repo hasn't been shared yet. Hopefully it's just a matter of time before it is shared.

http://finda.photo/search/tips#contributing

j_lev · on Jan 22, 2016

Hi - for some reason the search bar keeps changing my search terms eg Australia --> Australium

davidbarker · on Jan 22, 2016

Sorry — that'll be the search trying to get a singular term from a plural.

I've tagged the images with singular terms to make them easier to search, so it will change terms like "bridges" to "bridge", or "men" to "man", unless I override each term. If anyone can suggest a better way, I'd be very grateful.

I'll add "Australia" as an override, for now. Thanks for pointing it out.
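For illustration only, a naive rule-based singulariser with an override list might look like the sketch below. This is not the site's actual code: the rules, the `OVERRIDES` set and the `IRREGULAR` table are assumptions, and real English pluralisation has many more exceptions.

```python
# Hypothetical override list: terms that must be left untouched.
OVERRIDES = {"australia", "news"}

# Hypothetical table of irregular plurals.
IRREGULAR = {"men": "man", "mice": "mouse", "wolves": "wolf"}

def singularize(term):
    """Naively reduce a search term to a singular form, honouring overrides."""
    word = term.lower()
    if word in OVERRIDES:
        return word
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"            # cities -> city
    if word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]                  # boxes -> box
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                  # bridges -> bridge
    return word

print(singularize("bridges"), singularize("men"), singularize("Australia"))
```

The override check comes first, so a protected term is never passed through the suffix rules.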

mintplant · on Jan 22, 2016

Have you looked into using stemming?

https://en.wikipedia.org/wiki/Stemming

Most major languages should have a library available to handle this for you.

awesan · on Jan 22, 2016

Instead of changing the search term directly, you can silently search for both.

Alternatively, you can use the Levenshtein distance to find words close to the search term (like its singular form).
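A minimal sketch of the Levenshtein suggestion, using the standard dynamic-programming formulation. The `nearby_tags` helper and the distance threshold of 2 are assumptions for the example, not a recommendation.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]

def nearby_tags(query, tags, max_distance=2):
    """Return tags within `max_distance` edits of the query, closest first."""
    scored = sorted((levenshtein(query.lower(), tag), tag) for tag in tags)
    return [tag for distance, tag in scored if distance <= max_distance]

# The plural query still reaches the singular tag without rewriting what was typed.
print(nearby_tags("bridges", ["bridge", "fridge", "forest"]))
```

Note that a loose threshold also pulls in unrelated words ("fridge" here), so the results would probably need ranking below exact matches.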

bryanrasmussen · on Jan 22, 2016

Stemming will be baked into a search engine such as Elasticsearch or Solr, depending on the language analyzer you use and how you map your fields.

frantzmiccoli · on Jan 22, 2016

It seems that you do have a valid SSL certificate but https://finda.photo/search?q=test is not working properly.

franciscop · on Jan 22, 2016

Check also http://pixabay.com/ for Public Domain pictures, I've found many awesome gems there

trtmrt · on Jan 22, 2016

Firstly it is slow... Secondly I typed "wolf" and I got: 3 foxes, 1 lion, 2 monkeys, 1 snow house and 2 wolves that do not look like wolves !?

davidbarker · on Jan 22, 2016

Apologies for the slowness. I suspect it was struggling under the traffic. It seems to be running pretty quickly now, though.

Most of the images are automatically tagged, and can sometimes be incorrectly labelled. I'm aiming to work through and manually check them all. In the meantime, I might add the ability to flag incorrect keywords.

chrxr · on Jan 22, 2016

http://finda.photo/image/14847 - Tags are weird. This is not a dog, mouse, canine or feline. It's not sitting. It has 'eyes' but I think that might be irrelevant. Although I would agree that ferrets (not an included tag) are cute, I'm not sure I'd describe them as domestic. Otherwise, great!

chrxr · on Jan 22, 2016

I stand corrected: "The ferret (Mustela putorius furo) is the domesticated form of the European polecat" https://en.wikipedia.org/wiki/Ferret

pbhjpbhj · on Jan 22, 2016

Ferrets are relatively common domestic pets in the UK. Perhaps because the sport of ferreting was historically more prevalent here?

chrxr · on Jan 22, 2016

Yes, I had to check as after posting the initial message I remembered seeing a child walking a ferret on a lead in Penryn. I've never seen one wandering the streets of Oxford though...

They don't appear to make it into the top 10 pets of 2014: http://www.pfma.org.uk/pet-population-2014

Perhaps a ferret adoption drive is necessary! http://ferretshelters.org/

hantusk · on Jan 22, 2016

An idea: You could use this pretrained machine learning library to classify your images/improve search even more: https://www.reddit.com/r/MachineLearning/comments/3yt4o5/dee...

andreash · on Jan 22, 2016

what is the diff between this and pixabay or pexels.com? wish there was one meta-search engine to cover them all :)

rogeryu · on Jan 22, 2016

Even if there is no "diff", this is an extra backup. Any of those sites can go down I guess, then where are all those pictures?

davegri · on Jan 23, 2016

Say no more, http://librestock.com

uvesten · on Jan 22, 2016

I really like both the selection and the color chooser! Did you do any manual selection of the photos?

davidbarker · on Jan 22, 2016

Thanks!

The selection is mostly based on the source sites I chose, like Unsplash, which all have only good-quality photos. The aim was to show all of the images from those sites.

The actual download and analysis of the images is done on my local machine, and each image has its own JSON file. These are then used to populate/modify the database, so I can track any changes to each image's data (if I add/remove keywords, for example) using Git.
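A rough sketch of the one-JSON-file-per-image idea follows. The directory layout, field names and helper functions are all hypothetical, since the actual schema is not described here. Writing with sorted keys and fixed indentation keeps the Git diffs small when a keyword is added or removed.

```python
import json
from pathlib import Path

def save_image_record(directory, image_id, record):
    """Write one record with stable formatting so Git diffs stay small."""
    path = Path(directory) / f"{image_id}.json"
    text = json.dumps(record, indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")
    return path

def load_image_records(directory):
    """Read every *.json file in `directory` into a dict keyed by file stem."""
    records = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records[path.stem] = json.load(handle)
    return records
```

The loaded dict could then be compared against the database to work out which rows to insert or update.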

pigscantfly · on Jan 22, 2016

Have you thought about automating keyword discovery or do you plan to stick with a manually-auditable dataset?

http://arxiv.org/pdf/1412.2306.pdf

quaffapint · on Jan 22, 2016

Might be a good place for an infinite scroll when going through pages of image results - one less click they have to do.

_spoonman · on Jan 22, 2016

Just a really great job on this. Love it.

shark1 · on Jan 22, 2016

Just out of curiosity, where does the owner find all these photos to fill up the database?

fareesh · on Jan 22, 2016

I ran into a Laravel error on the homepage due to the server running out of memory.

jlis · on Jan 22, 2016

nice one!

j3th9n · on Jan 22, 2016

My first search on "new york" returned nothing...

thecodemonkey · on Jan 22, 2016

If you're still a little concerned with licensing and copyrights, I would recommend taking a look at www.graphicstock.com - you just pay a flat monthly or yearly fee and you can download as much as you want.

Disclaimer: I work for the company behind GraphicStock. Oh, and we're hiring!