If you know what 'the Laplacian of the Gaussian' is, definitely go for OpenCV. If you'd like to see something based on high school math, try this wiki.
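For a sense of what that looks like in practice, here's a minimal sketch using OpenCV's Python bindings; the filename and filter sizes are placeholders, not anything canonical:

    import cv2

    # Load an image in grayscale (path is a placeholder).
    img = cv2.imread("example.png", cv2.IMREAD_GRAYSCALE)

    # Laplacian of Gaussian: smooth with a Gaussian first to suppress noise,
    # then apply the Laplacian to highlight edges and blobs.
    blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)
    log = cv2.Laplacian(blurred, cv2.CV_64F, ksize=3)

    # Rescale to 8-bit so it can be saved and inspected.
    cv2.imwrite("log_response.png", cv2.convertScaleAbs(log))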
Here's some background and a few pointers. I'm not a computer vision guy, but these might get you started.
Computer vision is really hard. The computational resources we can spend on the problem are orders of magnitude smaller than what we're trying to emulate in nature. I think (IANACVS!) something like 35% of the human brain's processing capacity deals with vision, for example. So the discipline is both highly mathematical (to root out shortcuts and shrink the problem) and computationally intense, and it often has real-time requirements on top of that.
Don't let me discourage you, though. I don't know much about the field, so I can't speak to its successes in any detail, but CV researchers have certainly produced useful systems.
1. Image transform methods: looking up the 2D Fourier transform might be a good start (there's a small sketch after this list).
2. Variational methods. These are hard to explain as they relate to vision, but if you read the first few chapters of Sussman & Wisdom's Structure & Interpretation of Classical Mechanics, you'll get an idea of what they're talking about.
3. Anything that involves sensors also involves noise. At my school, the first stage of the graduate CV track is a course based on [Papoulis & Pillai]'s Probability, Random Variables, and Stochastic Processes.
4. One of my classmates was involved in object modeling with manifolds. A manifold is basically an n-dimensional shape that looks Euclidean locally, like a sphere, torus, or pretzel. I have no idea what to recommend here, because my curiosity about manifolds and repeated frustration in my efforts to find out more about them ultimately landed me in grad school. Watch out for that.
5. Obviously I know more about software than hardware, but you could definitely do worse than reading up on commonly-used sensors. A midlevel text on CCD cameras would be the way to go here.
6. You can read up on Visual Perception in humans. When I took a class on this as an undergrad, it was a senior-level psych elective. A related field is Psychophysics, which uses methods from physics to study sensory stimuli (e.g. acoustic waves) as they relate to perceived experience. See #s 1 and 3 above.
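If you want to play with item 1, here's a minimal 2D Fourier transform sketch using numpy; the toy grating image and its size are placeholders for real data:

    import numpy as np

    # A toy 2D "image": a vertical sine grating (placeholder for real data).
    x = np.linspace(0, 2 * np.pi, 128)
    img = np.sin(8 * x)[np.newaxis, :] * np.ones((128, 1))

    # 2D FFT; fftshift moves the zero-frequency component to the center,
    # which is the conventional way to view image spectra.
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    magnitude = np.log1p(np.abs(spectrum))

    # The grating's energy should show up as a pair of peaks on either
    # side of the center column; print where the strongest one lands.
    peak_row, peak_col = np.unravel_index(np.argmax(magnitude), magnitude.shape)
    print(peak_row, peak_col)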
The reason there isn't a definitive text on vision is that vision doesn't work. Algorithms work, and so to have a definitive text, you just collect all the notable ones that work and write about them in a book. When you have a totally unsolved problem like vision (outside of certain highly constrained domains, like robot soccer, where the colors are guaranteed, or industrial tasks, where basically the whole scene is guaranteed), you basically have to choose which of the methods that currently don't work might have a chance of eventually leading to something that does. There are many possible sets of choices for that, which is why you can't really have a definitive text.
What do you mean 'work'? There are many highly nontrivial successes in computer vision. Take a look at some of the examples here: http://yann.lecun.com/exdb/lenet/index.html
Mathematics was hardly cut and dried before Bertrand Russell wrote Principia Mathematica, but that book still has tremendous value (if only as a beacon!) There was plenty more to discover in geometry when Euclid wrote the Elements.
A good computer vision book, even one that simply cataloged important successes and failures, would be quite valuable. It's not that it hasn't been written because it can't be, it's that it hasn't been written because no one has really gotten around to it. Good textbooks require a massive amount of effort to get right.
It doesn't work in the sense everybody most wants it to work, which is to slap a cheap CCD camera or three on top of a mobile robot and have it tell the robot everything about its surroundings, like human vision can. Yeah, if you point a camera at a very restricted specific set of objects, all lit the same way, and all in generally the same basic configuration, you can do some vaguely useful statistics to get some information out of it, but that's kind of a painfully limiting thing.
Put it this way: At Anybots, we have a robot with 16 cameras on its head. We pump all those cameras back to an operator station, stitch them all together in a vaguely fitting-together way, and then project them onto a big bank of monitors. Plop a human operator in front of the monitors, and he knows exactly the orientation of the robot, the configuration of its limbs, and has basically a 3d map of the entire scene and a model of the lighting, minus occlusions. I'm pretty sure nobody knows how to write a computer vision system to do that. I'm not even sure that anybody knows how to use the overlap in the cameras to figure out how to make a coherent nice projection of the cameras onto the monitors.
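(For what it's worth, the textbook answer to the overlap question is feature matching plus homography estimation. Here's a rough sketch with OpenCV's Python bindings; the camera filenames are placeholders, SIFT is just one choice of feature, and this ignores the genuinely hard parts: exposure differences, parallax, and keeping 16 views mutually consistent.)

    import cv2
    import numpy as np

    # Two overlapping camera frames (placeholder filenames).
    img_a = cv2.imread("cam0.png", cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread("cam1.png", cv2.IMREAD_GRAYSCALE)

    # Detect and describe local features in each frame.
    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)

    # Match descriptors across the two frames, keeping the best ones.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = sorted(matcher.match(desc_a, desc_b), key=lambda m: m.distance)[:200]

    # Estimate a homography mapping frame B into frame A's coordinates,
    # using RANSAC to reject bad matches.
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(pts_b, pts_a, cv2.RANSAC, 5.0)

    # Warp B onto A's image plane; blending and seam handling are left out.
    warped_b = cv2.warpPerspective(img_b, H, (img_a.shape[1] * 2, img_a.shape[0]))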
It's interesting -- I started writing this exact definition of "works" in my post above. Anthropomorphic bias, right?
Even with this understanding, though, I can't wrap my brain around what you're saying above. Computer vision doesn't "work" (for our definition of "work") because it's not feasible to get enough processing power together:
O( 3e10 ) neurons * 100 Hz/neuron = more cycles than you can rent on an NSF grant.
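Spelling out that back-of-the-envelope arithmetic (the one-operation-per-spike assumption is mine, and a huge simplification in both directions):

    neurons = 3e10          # rough neuron count from the line above
    firing_rate_hz = 100    # spikes per neuron per second, as above

    # Naive lower bound: one "operation" per spike.
    ops_per_second = neurons * firing_rate_hz
    print(f"{ops_per_second:.1e} ops/s")  # 3.0e+12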
We've definitely got some exponential growing to do before we can even start meaningful experimentation. Does that mean no work should be done in the meantime?
I don't think we're doing that much processing to do vision. It's never the case that even close to all the cells in the visual cortex are being used. It's always a tiny fraction of that. Yeah, we have a bunch of cells that neuroscientists think are somehow related to doing vision, but there's no evidence that they're doing some kind of horribly computationally intensive thing all day long. If they were, we'd see it on MRIs as soon as we put a picture in front of somebody.
No, I think we just haven't figured out what to do, not that we're limited by computational resources. I think there are a bunch of pretty strong assumptions hardwired into the brain, so we can see physical objects, but then you can fool us by showing us images that mess with the assumptions (aka optical illusions).
(Not that this is a particularly useful idea, mind you. I'm not about to solve computer vision. It's just a random philosophical idea.)
It's never the case that even close to all the cells in the visual cortex are being used. It's always a tiny fraction of that.
This is also true of any individual transistor in a processor. An ALU contains hardware to perform many operations, but performs at most one per cycle. You can increase utilization through parallelism and pipelining, but you pay for it with more synchronization and control hardware that only switches on to resolve conflicts. And nowadays, half of all the transistors in a CPU are part of a cache (every cycle, you need a word or two, but fetch the whole block). You can't conclude that they're not contributing to the output from the fact that they're often inactive.
No, I think we just haven't figured out what to do, not that we're limited by computational resources.
Based on what evidence? We know that the brain is hugely complex. We know it can sort through hugely complex problem spaces at least some of the time. Doesn't it seem like wishful thinking to assume both that the brain is really inefficient and that most of the problems it solves actually have simple heuristic solutions?
I'll step up and take ownership of the end result of this argument: If what you say is true and understanding is all we lack, why have these problems proved so stubborn?
Because understanding is almost always easier than data collection and tool-building.
Pure lack of understanding -- having all the data and having no idea what it means -- can keep a problem open for a generation or two. Einstein is famous because he solved a problem in EM physics that was at most 30 years old.
Lack of data or meaningful investigative techniques can keep a subject crippled for centuries. Physics went from rolling marbles to detecting radio waves in the two hundred and fifty years after Galileo, while biology was pretty much stagnant. Without organic chemistry, even scientists were willing to believe that living organisms were made of some special God-stuff. Contrast that to the last 20 years. The human genome has been sequenced by biologists on their way to designing new organisms, while physicists have been stuck throwing models around since the 1970s. Petri dishes are cheaper than supercolliders.
Now let's look at AI and machine vision. Our primary tools, brain imaging and computer simulation, are getting better every year. Maybe new ideas in algorithms or parallel architecture will bring "real" computer vision closer. In fact, they probably will. But even if they don't, this approach will definitely succeed eventually.
I don't have as many examples, but here is one. Galileo discovered his principle by conducting very simple experiments, without special tools or much data collection. How do you find the relevant data or build relevant tools if you don't have a clue about what you are supposed to discover?
What computer vision needs is mathematics, in my opinion. And math has nothing to do with "data collection and tool-building". It is entirely about understanding.
The link provides a good example of how people try to solve problems without any understanding. They don't know how the brain operates, yet they build its "model". And then they expect this model to solve the problem for them...
How can you say Galileo didn't use tools? He invented the telescope fer God's sake. He timed the descent of falling objects by hiring musicians to keep a beat (no watches). When he couldn't time things, he built ramps and arches to compare things side by side. Much of his machinery was didactic because he often needed to demonstrate his ideas to the courtiers who supported him. We credit him with the invention of the experimental method today because he described his experiments in language reminiscent of geometric proofs.
I agree with you about the CV thing, though. We should talk sometime...
Yeah, Galileo is a particularly bad example. I believe he had a professional instrument maker working for him full-time. Newton would illustrate the point better.
Galileo didn't invent the telescope, though. What he did do first was use telescopes for astronomy.
Saying that "vision doesn't work" is like saying "audio doesn't work". Outside a particular problem, you can't generalize to that level.
Automated surveillance from a fixed camera works extremely well (VSAM). Certain object class recognition like face or pedestrian detection works extremely well (Viola & Jones). Stereo depth extraction, some systems with amazing & cheap custom ASICs, works extremely well (Tyzx). 2D object instance recognition works extremely well (SIFT). This can power SLAM when combined with stereo imagery. Object Tracking works extremely well (Collins). Traversable road classification, car detection, road boundary & lane detection, and Stereo/SFM will power the first semi-autonomous cars. All that tech works really well today -- the task is system integration.
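To make the "works extremely well" claim concrete: a Viola & Jones detector is a few lines with the Haar cascades that ship with OpenCV (the image path here is just a placeholder):

    import cv2

    # Haar cascade trained with the Viola-Jones framework, bundled with OpenCV.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)

    img = cv2.imread("people.jpg")  # placeholder image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Multi-scale sliding-window detection; these parameters are typical defaults.
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imwrite("faces.jpg", img)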
Also, vision isn't just visible light cameras. ASC is making a flash lidar that would make you drool. The "Swiss Ranger" is also good. Both can provide snap-shot volumetric 3D data combined with color and texture information, with no moving parts. They'll scale much better than scanning lidars and even multi-laser systems like Velodyne.
Add 3D processing like Spin Images to the mix, and integrated vision systems will get very, very powerful, very soon.
I like Anybots' approach, but I think it is incorrect to assume "AI" won't be good enough and tele-operation will be king. If I were a pure-software robotics startup, I would focus on vision & navigation software, large/networked system integration/configuration/scalability utilities (like I'm sure you're building), and behavior systems.
For a text, I would recommend reading the winning papers from major vision/machine learning/robotics conferences, and the papers they reference. Grad students at CMU's Robotics Institute use Forsyth & Ponce's "Computer Vision: A Modern Approach".
http://www.amazon.com/Computer-Vision-Approach-David-Forsyth...
I would also recommend Mitchell's "Machine Learning", Strang's "Linear Algebra and Its Applications", and throw in Thrun's new "Probabilistic Robotics" for fun.
This (http://www.amazon.com/Introductory-Techniques-3-D-Computer-V...) is the best book on Computer Vision I've come across. I first encountered it when trawling Stanford courses for good problem sets and research projects. Sebastian Thrun used to teach a class with this book (http://robots.stanford.edu/cs223b04/). It is fairly expensive and fairly mathematical, but it is worth every cent you pay for it (imho).
It's _far_ more powerful than Pixcavator, well supported, liberally licensed, and provides some really awesome high-level APIs.
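As a taste of those high-level APIs, here's a rough sketch of the kind of object counting Pixcavator is sold for, done with a threshold plus contour extraction; the image path and the size filter are placeholders:

    import cv2

    img = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE)  # placeholder image

    # Otsu's method picks a global threshold automatically.
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Extract connected blobs as contours (OpenCV 4.x return signature).
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Report how many blobs pass a size filter, and their areas.
    areas = [cv2.contourArea(c) for c in contours if cv2.contourArea(c) > 50]
    print(len(areas), "objects; areas:", areas)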