This is a fascinating concept: choosing the optimal training set for a "learner" when the true model is known. While the paper focuses on using the system to teach humans, I see the value in other cases where the training set would be expensive to acquire. Examples would include long-running simulations, such as protein folding, or settings where each example has a significant materials cost, like chemical testing. Clearly, generating 60,000 observations (as in MNIST) would quickly become cost-prohibitive compared to carefully selecting training examples to optimize learning.
In these situations "active learning" is used: you generate multiple models, ideally via Bayesian inference, and then search for the example that causes the most disagreement among them.
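As a toy illustration of that disagreement-based selection (my own sketch, not from the paper): assume a 1D threshold learner, and use a handful of models sampled evenly from the version space as a cheap stand-in for Bayesian posterior samples. The function names here are mine.

```python
# Query-by-committee sketch for a toy 1D threshold classifier.
# The "committee" is several thresholds that all fit the labeled data;
# we query the pool point whose committee vote is closest to a tie.

def sample_committee(labeled, n_members=5):
    """Every threshold between the largest negative x and the smallest
    positive x classifies the labeled data correctly; sample n evenly."""
    lo = max(x for x, y in labeled if y == 0)
    hi = min(x for x, y in labeled if y == 1)
    return [lo + i * (hi - lo) / (n_members + 1) for i in range(1, n_members + 1)]

def most_disagreement(committee, pool):
    """Pick the unlabeled point the committee disagrees on most."""
    def disagreement(x):
        votes = sum(x >= t for t in committee)      # members voting "positive"
        return min(votes, len(committee) - votes)   # closeness to a tie
    return max(pool, key=disagreement)

labeled = [(0.0, 0), (1.0, 0), (8.0, 1), (9.0, 1)]
committee = sample_committee(labeled)
print(most_disagreement(committee, [2.0, 4.5, 8.5]))  # -> 4.5, in the undecided region
```

The point near the middle of the unlabeled gap wins because the committee splits on it, while the committee is unanimous on points far from the boundary.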
Hmm, for those cases though, you would have to parameterize the training set to be able to optimally select the next training example, in which case you're back to active learning.
> The key difference is that the teacher knows theta-star upfront and doesn’t need to explore.
Based on this quote from page 1 and Wikipedia for "active learning" [1], I see the distinction in my examples as this:
1. In machine teaching, the teacher chooses the new observations to be presented. In active learning, the learner chooses new observations to be labeled.
2. In machine teaching, the availability of observations itself is controlled by the teacher. In active learning, observations may already be available but unlabeled, e.g., a pool from which the learner votes on which observation to have labeled next.
This is a nice little paper that provides a great introduction to machine teaching. I think the Socratic dialogue format was an excellent choice as it makes it very easy to follow.
The big problem with machine teaching in many practical applications is what the paper calls the "glaring flaw": you often don't know what the learning algorithm looks like (e.g., in the paper's nefarious example of trying to defeat a spam filter). In fact, the learning algorithm could be arbitrarily complex.
In the case where you do know the learning algorithm exactly (e.g., the learner is a robot whose precise specifications you have), the problem is the deterministic optimization problem described in this paper. But when the learning algorithm is unknown, the problem becomes stochastic, and you face all the traditional problems of optimization in a probabilistic space (e.g., overfitting, robustness issues). That's not to say it's strictly impossible to apply machine teaching in such a case, just that finding a near-optimal training set becomes a much harder problem.
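To make the known-learner case concrete, here is a minimal sketch (my own construction, in the spirit of the paper's 1D examples): assume the learner's rule is exactly known, namely that it sets its decision boundary at the midpoint of the closest opposite-labeled pair. Since the teacher knows both theta* and this rule, it can construct an optimal two-example training set directly, with no exploration.

```python
# Machine teaching with a fully known learner: the teacher computes the
# optimal training set in closed form rather than searching for it.

def learner(training_set):
    """The learner's exact, known update rule: the boundary is the midpoint
    between the largest negative x and the smallest positive x."""
    neg = max(x for x, y in training_set if y == 0)
    pos = min(x for x, y in training_set if y == 1)
    return (neg + pos) / 2

def teach(theta_star, eps=1e-6):
    """Teacher knows theta* and the learner's rule, so two examples
    straddling theta* are enough to place the boundary exactly."""
    return [(theta_star - eps, 0), (theta_star + eps, 1)]

theta_star = 3.7
D = teach(theta_star)
assert abs(learner(D) - theta_star) < 1e-9  # learner lands on theta* with |D| = 2
```

An active learner facing the same problem would need on the order of log(1/eps) queries to binary-search the boundary; the teacher gets there with two examples because it never has to explore.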
I really enjoyed learning about the techniques used in machine learning to select an ideal data set when there is no target model (compared to the paper, where the model is known).
A common practice is k-fold cross-validation: you train on k-1 subsets of D, then validate on the k-th subset, rotating k times so that every subset serves as the validation set once. The results are then combined into a performance estimate that is usually much more reliable than simply setting aside one fixed part of D for training and another for validation.
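The rotation scheme above can be sketched in a few lines of plain Python (illustrative only; the `train` and `score` callables here are toy stand-ins, not any particular library's API):

```python
# Minimal k-fold cross-validation: each fold is held out exactly once.

def k_fold_scores(D, k, train, score):
    """Train on k-1 folds, validate on the held-out fold, rotate k times,
    and average the k validation scores."""
    folds = [D[i::k] for i in range(k)]  # k roughly equal interleaved folds
    scores = []
    for i in range(k):
        held_out = folds[i]
        training = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train(training)
        scores.append(score(model, held_out))
    return sum(scores) / k

# Toy usage: the "model" is just the mean of y, scored by negative MSE.
data = [(x, 2.0 * x) for x in range(10)]
train = lambda d: sum(y for _, y in d) / len(d)
score = lambda m, d: -sum((y - m) ** 2 for _, y in d) / len(d)
print(k_fold_scores(data, k=5, train=train, score=score))
```

Note that the usual output is the averaged score, used to compare models or hyperparameters; the final model is then typically retrained on all of D.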
Like machine teaching, where the goal is to minimize D, this is based on the assumption that some data points are much better for training the model than others.
We devoted about a year to developing machine teaching techniques for a project. Our goal was to optimize not just the learning rate but also retention. When the subject matter is a good fit for machine teaching it can work very well.
One problem is that the human effectively limits the rate at which the fitness function can be evaluated, so the "teaching solution" converges very slowly. This is where a priori rules can make it more dynamic, and they can make the difference between usable and unusable.
> I draw the reader’s attention to machine teaching, the problem of finding an optimal training set given a machine learning algorithm and a target model. In addition to generating fascinating mathematical questions for computer scientists to ponder, machine teaching holds the promise of enhancing education and personnel training.