[Full disclosure, I work on a service called Azure Search]
Very nice site! Since your site is so heavily based around search, I thought I would pass on a few suggestions based on what I saw. If you happen to be using a search engine for your content such as ElasticSearch, SOLR or maybe Azure Search :-), there are a few simple things you could add to make the experience a little smoother. Suggestions in the search box are a nice way to let people quickly see results as they type. You could even add thumbnails of the images in the type-ahead, as you can with the Twitter Typeahead library (http://twitter.github.io/typeahead.js/). I also noticed that your search does not handle spelling mistakes or phonetic search (matching words that sound similar). Finally, through the use of stemming, search engines can often help you find additional relevant content. For example, if the person is looking for mice, but your content has the word mouse in it, this will bring back a match. Since you don't have a lot of content, this can really help people find relevant content.
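To make the spelling-mistake point concrete, here is a minimal sketch using Python's standard `difflib` module. It is only an illustration of typo-tolerant matching, not how ElasticSearch, SOLR or Azure Search implement it; the tag list and the `suggest` helper are made up for the example.

```python
import difflib

# Hypothetical tag vocabulary; a real site would load its own keywords.
KNOWN_TAGS = ["bridge", "mouse", "ferret", "mountain"]

def suggest(query, vocabulary=KNOWN_TAGS, cutoff=0.6):
    """Return the known tag closest to a possibly misspelled query, or None."""
    matches = difflib.get_close_matches(query.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

if __name__ == "__main__":
    for typed in ["brigde", "montain", "xyz"]:
        print(typed, "->", suggest(typed))
```

A dedicated search engine would normally do this with edit-distance queries against its index instead of scanning a list, so treat this purely as a picture of the behaviour.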
I like to think I am very conscious of copyright. I might not always adhere to it in my personal life (who can claim they do these days?!) but professionally, everything is done strictly legitimately. With that in mind... Am I the only person who is slightly uncomfortable with the phrasing around PD and CC0? With other copyright licenses there is somebody saying they own something.
I'm particularly uncomfortable with Flickr's "no known copyright restrictions". What if people infer PD from that and upload it somewhere else under CC0? Then it gets sucked into this finda.photo? Yuck.
As for finda.photo, why are you truncating the source down to just a domain name?! Many of the sources include proper uploader details so why aren't you copying those over and displaying them?
I know you're not required to, but attribution isn't a bad thing if you can give it. I for one would be much happier using a photo if I knew exactly where it came from.
One of the challenges I have with attribution generally--and, to be clear, I try to be very careful with attribution on any CC, etc. photos that I use--is that the attribution is usually detached from the photo. (It may be stored in the metadata--or not.) So, even though I make a point of cutting & pasting the flickr links when I'm putting together a presentation, it's very easy for the attribution text and the photo to become separated on subsequent use.
There are potential ways that you could fix this from a technology perspective, e.g. have a process to create a new JPEG with the credit below the original photo. But anything like this is going to be a bit clunky and potentially ugly graphically.
There are fields within the JPEG file itself for this information, called IPTC fields. I know they can be read with photo-specific software like Photo Mechanic or Photoshop, but they seem incredibly under-used by the Internet in general. They're perfect for a use case like persistent attribution, but few image software services seem to know about or expose them.
I agree these are under-used but they face the same issues as the other methods like putting the data in text near the image on a web page or presentation slide. These fields, same as all Exif metadata, can be overwritten or removed by anyone with access to the file. The data can be faked by someone intending to deceive, or it could disappear when posted to big sites like Facebook or Twitter who routinely remove Exif data by default (presumably to protect the majority who don't understand how GPS tagging works on mobile phone photos, etc).
Once the photo metadata is gone, it is too easy for others to claim it is an 'orphan work' and avoid liability under copyright law. At the opposite end of the spectrum, people like me who release most images as CC0 are annoyed that that license tag was stripped from the metadata, preventing others from freely reusing them. I use and rely on Exif tags a lot but they are fragile and you cannot rely on them staying embedded with your images once they hit the web.
The data can be faked or deleted within IPTC fields, of course.
What IPTC fields have over the typical ways of handling attribution is that they are not left behind when the image file is copied--so they should be more resistant to accidental removal of attribution metadata.
On most websites, the attribution is a line of text that is displayed next to the image. Anyone copying the image, who wishes to preserve attribution, must also separately copy the attribution text. Then they need a way to store that text, and keep it associated with the image. Not easy, actually!
> Once the photo metadata is gone, it is too easy for others to claim it is an 'orphan work' and avoid liability under copyright law.
You cannot avoid liability this way. Under the law, it is the responsibility of the person using an image to know that they have the right to use it. Just claiming "I thought it was orphaned" does not work if you are being sued by the actual image rightsholder.
> I use and rely on Exif tags a lot but they are fragile and you cannot rely on them staying embedded with your images once they hit the web.
Yes, this is my point! They're fragile because web services don't preserve them--but theoretically they could.
The cynical side of me thinks that a lot of web services don't want to know all the rights data for the media they carry. Ignoring rights gets them more traffic and engagement, and under the relevant law (the DMCA), they are allowed to. All they have to do is remove infringing images when the rights holder requests it.
Right. In an ideal world you'd be using a bunch of CC content in a presentation, the appropriate IPTC fields would be filled in, and you could press a button and a block of credit text would be generated. (That's primarily about CC-BY I realize.) In practice, it's an incredibly manual process that I'm guessing most people don't follow and, even for those who try to, it probably breaks down more often than not.
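As a rough sketch of that "press a button" idea: assuming the attribution fields have already been read out of each image into a plain dict (the field names `title`, `creator`, `license` and `source_url` are hypothetical, not actual IPTC property names), generating the credit block is the easy part.

```python
def credit_line(photo):
    """Build one attribution line from a dict of (hypothetical) metadata fields."""
    title = photo.get("title") or "Untitled"
    creator = photo.get("creator") or "Unknown creator"
    parts = [f'"{title}" by {creator}']
    # License and source are optional; include them only when present.
    if photo.get("license"):
        parts.append(photo["license"])
    if photo.get("source_url"):
        parts.append(photo["source_url"])
    return ", ".join(parts)

def credit_block(photos):
    """Join one credit line per photo into a block for a credits slide."""
    return "\n".join(credit_line(p) for p in photos)

if __name__ == "__main__":
    # Placeholder data for illustration only.
    slides = [
        {"title": "Harbour", "creator": "A. Example", "license": "CC BY 2.0",
         "source_url": "https://example.com/1"},
        {"creator": "B. Example", "license": "CC0"},
    ]
    print(credit_block(slides))
```

The hard part, as noted above, is getting the fields filled in and preserved in the first place.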
Then there are all the issues with the NC and ND license variants and what they even mean exactly. But that's another rant.
EDIT: I'd just add that clearing rights and giving credits have been an issue forever. On more than one occasion, I've gotten a semi-panicky email (and I think once actually a phone call in pre-email days) seeking permission to use one of my photos that was clearly on the verge of going into production. Presumably, someone came along and asked "You do have rights to this, correct?"
If you look underneath each photo, there's a link back to each original page, as well as a link to the photographer's URL of choice (depending on the source this might be their profile on Unsplash, or their own website).
Do you remember which image you're referring to? I haven't changed either the image data or the site code since your comment, so I'm curious to know which one it is and if I can update it.
These CC0 silos are quite risky to use in a professional context.
They often collect images en masse from a bunch of sources without further inspection. If someone uploads a copyrighted image to one of these sources and marks it CC0, it will end up in these CC0 aggregators. And, if your use of this image is discovered, you will be held liable for the damage caused by your action (well, at least here in Sweden).
I would do some research before using these images in a professional context. Look up the photographer and confirm that the image is their own work. If this site included proper uploader details, it would make this work easier.
Strange. I was curious what kind of images are there, did a search for "Taiwan", and the result is literally 8 pictures with a "Shutterstock" watermark and that's all. Is that supposed to be CC0? Even if it were, would it be useful at all to have watermarked images like that?
Just tried the same thing. It looks like the Shutterstock watermarked photos were an advert sponsored by Shutterstock themselves. No actual CC0 photos for the site itself.
Probably performed dominant colour analysis on each image (with something like color-thief), then sorted by similarly ranked colours using a perceptual colour space like Lab? No idea for the feature though.
You have to do a lot of research to solve these types of problems. I think that neural networks and machine learning are the best way... but it's a complex problem.
Considering that companies like Google are so good at this, why build your own photo site? Why not upload all the CC0 images to a public Google Photos library?
I like your question. Ok, you can use 500px, Flickr, Google and other sites... But... In my personal opinion, a developer should be curious. I'm a dev and, for this reason, I like to give myself a challenge. It's a good way to learn a lot, to discover new solutions, to meet new people, to improve my skills, to create something new. So... you can use a Google Photos library or you can consider creating something different (because your solution will definitely be different from the others). It's a choice ;)
I'm currently trying to do this for my own side project. My current plan was to analyse each photo using one of the many color analyser algorithms out there (color-thief seems to be popular/efficient), but the problem is that color is a very complicated thing to compare. I think Euclidean distance between two Lab colors gives the best results for the complexity involved (and RGB to Lab is an easy/well-solved problem).
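For anyone wanting to try it, here is a minimal sketch of that idea: convert 8-bit sRGB to CIE L\*a\*b\* (assuming a D65 white point) and take the plain Euclidean distance, which is the CIE76 colour difference. The function names are mine, and extracting the dominant colour (e.g. with color-thief) is assumed to have happened already.

```python
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 reference white)."""
    def to_linear(c):
        # Undo the sRGB gamma curve.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)
    # Linear RGB -> CIE XYZ using the standard sRGB/D65 matrix.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        # Piecewise cube-root function from the CIE Lab definition.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def color_distance(rgb1, rgb2):
    """Euclidean distance between two sRGB colours in Lab space (CIE76)."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))

if __name__ == "__main__":
    red, dark_red, blue = (255, 0, 0), (230, 0, 0), (0, 0, 255)
    print(color_distance(red, dark_red), color_distance(red, blue))
```

CIE76 is the simplest of the Lab difference formulas; later ones such as CIEDE2000 track perception more closely at the cost of a lot more arithmetic.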
One of the about pages says that the photos are in a GitHub repo, which sounds really cool, until you follow the link and find the repo hasn't been shared yet. Hopefully it's just a matter of time before it is shared.
Sorry — that'll be the search trying to get a singular term from a plural.
I've tagged the images with singular terms to make them easier to search, so it will change terms like "bridges" to "bridge", or "men" to "man", unless I override each term. If anyone can suggest a better way, I'd be very grateful.
I'll add "Australia" as an override, for now. Thanks for pointing it out.
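A minimal sketch of the plural-to-singular step with an override list, in case it helps the discussion. The rules, the irregular-word table and the override set below are hypothetical and are not the site's actual code; they only show the shape of "singularise unless overridden".

```python
# Terms that must never be altered (hypothetical override list).
OVERRIDES = {"australia", "taiwan", "glass", "bus"}

# Irregular plurals that no suffix rule can handle.
IRREGULAR = {"men": "man", "women": "woman", "mice": "mouse", "children": "child"}

def singularize(term):
    """Reduce a search term to a singular form using a few naive English rules."""
    t = term.lower()
    if t in OVERRIDES:
        return t
    if t in IRREGULAR:
        return IRREGULAR[t]
    if t.endswith("ies") and len(t) > 4:
        return t[:-3] + "y"          # cities -> city
    if t.endswith(("ches", "shes", "sses", "xes")):
        return t[:-2]                # boxes -> box
    if t.endswith("s") and not t.endswith(("ss", "us", "is")):
        return t[:-1]                # bridges -> bridge
    return t

if __name__ == "__main__":
    for word in ["bridges", "men", "Australia", "boxes"]:
        print(word, "->", singularize(word))
```

Hand-written rules like these will always have gaps, which is part of why a search engine's built-in stemmer or a maintained inflection library is usually the more robust option.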
Stemming will be baked into a search engine such as Elasticsearch or Solr, depending on the language analyzer you use and how you map your fields.
Apologies for the slowness. I suspect it was struggling under the traffic. It seems to be running pretty quickly now, though.
Most of the images are automatically tagged, and can sometimes be incorrectly labelled. I'm aiming to work through and manually check them all. In the meantime, I might add the ability to flag incorrect keywords.
http://finda.photo/image/14847 - Tags are weird. This is not a dog, mouse, canine or feline. It's not sitting. It has 'eyes' but I think that might be irrelevant. Although I would agree that ferrets (not an included tag) are cute, I'm not sure I'd describe them as domestic. Otherwise, great!
Yes, I had to check as after posting the initial message I remembered seeing a child walking a ferret on a lead in Penryn. I've never seen one wandering the streets of Oxford though...
The selection is mostly based on the source sites I chose, like Unsplash, which all have only good-quality photos. The aim was to show all of the images from those sites.
The actual download and analysis of the images is done on my local machine, and each image has its own JSON file. These are then used to populate/modify the database, so I can track any changes to each image's data (if I add/remove keywords, for example) using Git.
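For anyone curious what a JSON-file-per-image workflow like this could look like, here is a minimal sketch. It is an assumption-heavy illustration, not the site's code: the SQLite schema, the `sync_images` name and the JSON keys (`id`, `source`, `keywords`) are all invented for the example.

```python
import json
import sqlite3
from pathlib import Path

def sync_images(json_dir, conn):
    """Load every *.json file in json_dir into an images table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images "
        "(id INTEGER PRIMARY KEY, source TEXT, keywords TEXT)"
    )
    count = 0
    for path in sorted(Path(json_dir).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        # INSERT OR REPLACE makes the sync repeatable: edited files overwrite old rows.
        conn.execute(
            "INSERT OR REPLACE INTO images (id, source, keywords) VALUES (?, ?, ?)",
            (record["id"], record["source"], ",".join(sorted(record["keywords"]))),
        )
        count += 1
    conn.commit()
    return count
```

Because the JSON files are the source of truth, the database can be rebuilt from the Git history at any point, and a keyword change shows up as an ordinary diff.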
If you're still a little concerned about licensing and copyright, I would recommend taking a look at www.graphicstock.com - you just pay a flat monthly or yearly fee and you can download as much as you want.
Disclaimer: I work for the company behind GraphicStock. Oh, and we're hiring!
Hope that helps.