
Fascinating concept. What if humans wrote the tests, and a genetic system like this used a specialized language (machine code or JVM bytecode, for example) to write a solution? What level of sophistication could a system like this attain?


That's been an idea since the 60s in various forms, but tractability tends to be a big problem. The classic approach is to randomly generate Lisp code trees, guided by various heuristics or randomized-optimization algorithms (e.g. genetic programming). There have been some successes, but typically smallish algorithms for well-defined problems, rather than "normal" full-sized programs.
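A toy sketch of that classic approach, in Python (the tree encoding, operator set, and fitness function here are invented purely for illustration, not taken from any particular system):

    # Minimal genetic-programming sketch: evolve arithmetic expression trees
    # (nested tuples) toward a target behavior given only input/output "tests".
    import random

    OPS = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b}
    TERMINALS = ['x', 1.0, 2.0]

    def random_tree(depth=3):
        # Grow a random expression tree up to the given depth.
        if depth <= 0 or random.random() < 0.3:
            return random.choice(TERMINALS)
        op = random.choice(list(OPS))
        return (op, random_tree(depth - 1), random_tree(depth - 1))

    def evaluate(tree, x):
        if tree == 'x':
            return x
        if isinstance(tree, float):
            return tree
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))

    def fitness(tree, cases):
        # Sum of squared errors over the test cases (lower is better).
        return sum((evaluate(tree, x) - y) ** 2 for x, y in cases)

    def mutate(tree, depth=3):
        # Replace a random subtree with a freshly grown one.
        if not isinstance(tree, tuple) or random.random() < 0.2:
            return random_tree(depth)
        op, left, right = tree
        if random.random() < 0.5:
            return (op, mutate(left, depth - 1), right)
        return (op, left, mutate(right, depth - 1))

    def target(x):
        return x * x + 2 * x + 1      # behavior specified only via examples

    cases = [(x, target(x)) for x in range(-5, 6)]   # the human-written "tests"

    population = [random_tree() for _ in range(200)]
    for generation in range(50):
        population.sort(key=lambda t: fitness(t, cases))
        best = population[0]
        if fitness(best, cases) < 1e-9:
            break
        # Keep the fittest quarter; refill with mutated copies of survivors.
        survivors = population[:50]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(150)]

    print("best tree:", best, "error:", fitness(best, cases))

Even on a toy problem like this the search space blows up quickly, which is part of why the successes tend to be small, well-specified algorithms.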

Some other attempts include: inductive logic programming (automatically infer logic programs from desired example output), partial programming (use statistical machine learning to fill in incompletely specified behavior in a program), and renewed attempts to use exhaustive or heuristic search where the search space is constrained by strong modern type systems (e.g. MagicHaskeller).


I assume it would be like finding the interpolating polynomial of a bunch of points in a row. (Like this: . . . . . .) The resulting polynomial would go through all of the points, but have very large peaks/valleys in between.

Essentially, you'd get a system that passed all of your tests, but produced garbage for anything not covered by the tests.
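As a rough sketch of that analogy (assuming NumPy; the numbers are made up): fit a degree-(n-1) polynomial exactly through n roughly collinear points, then look at what it does between them:

    # A polynomial that passes exactly through every "test point" can still
    # swing wildly in between them.
    import numpy as np

    n = 15
    xs = np.linspace(-1, 1, n)               # the test inputs, "in a row"
    ys = xs + 0.3 * (-1) ** np.arange(n)     # roughly a line, with small wiggles

    coeffs = np.polyfit(xs, ys, deg=n - 1)   # exact fit: one coefficient per point

    print(np.abs(np.polyval(coeffs, xs) - ys).max())   # ~0: every "test" passes
    grid = np.linspace(-1, 1, 1000)
    print(np.abs(np.polyval(coeffs, grid) - grid).max())
    # ...but between the points the curve departs from the underlying line by a
    # large margin: passing the tests says little about behavior elsewhere.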


Your post got me excited because it sounds like you're describing Runge's Phenomenon ( http://en.wikipedia.org/wiki/Runge%27s_phenomenon ).

The Wikipedia article doesn't have the best illustration, but I think the underlying idea is a wise one: in trying too hard to meet your initial constraints, you can end up with a solution that's only useful at those constraint points.


I'd never heard of Runge's phenomenon, but it does indeed sound like I was describing it. Thanks for giving me a name for it.


Interesting, I thought that was exactly what you were talking about :)

I learned about it in numerical analysis. It illustrates the downside of trying to be too precise: you can always construct a polynomial that passes through an arbitrary number of data points.

As the number of points increases, your function will look less like a line and more like a magnitude-9 earthquake on a seismograph. The function will pass through all the points used to define it, but it'll be useless for predicting the original data's behavior, because it changes too quickly for small changes in input.

Instead, mathematicians find more useful functions by relaxing the conditions so that the model function only has to come 'near' the data points (a least-squares fit, for example).
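A sketch of that contrast (assuming NumPy, and the same kind of noisy, roughly linear data as in the earlier interpolation example): force an exact fit, then relax to a low-degree least-squares fit that only comes near each point:

    # Exact interpolation vs. a "relaxed" least-squares fit of the same data.
    import numpy as np

    n = 15
    xs = np.linspace(-1, 1, n)
    ys = xs + 0.3 * (-1) ** np.arange(n)     # roughly linear data with small noise

    exact = np.polyfit(xs, ys, deg=n - 1)    # passes through every point
    relaxed = np.polyfit(xs, ys, deg=1)      # straight line, only "near" the points

    grid = np.linspace(-1, 1, 1000)
    print(np.abs(np.polyval(exact, grid) - grid).max())    # large excursions between points
    print(np.abs(np.polyval(relaxed, grid) - grid).max())  # stays close to the true line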


That's true, but I think this problem will eventually be tractable. We already have tools like QuickCheck that generate randomized test cases and check that certain properties hold. That probably isn't enough on its own today, but I can imagine it becoming a good starting point for this kind of technology.
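As a sketch of the idea in Python (QuickCheck itself is a Haskell library), using the hypothesis package, a property-based testing tool in the same spirit, with a made-up my_sort as the code under test:

    # Property-based test: hypothesis generates many random inputs and checks
    # that the stated properties hold (assumes `pip install hypothesis`).
    from collections import Counter
    from hypothesis import given, strategies as st

    def my_sort(xs):
        return sorted(xs)          # stand-in for the code under test

    @given(st.lists(st.integers()))
    def test_sort_properties(xs):
        out = my_sort(xs)
        assert all(a <= b for a, b in zip(out, out[1:]))   # result is ordered
        assert Counter(out) == Counter(xs)                 # same elements, same counts

    if __name__ == "__main__":
        test_sort_properties()     # runs the property over ~100 generated cases
        print("all generated cases passed")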




