Yeah, I agree. When the child is shown the labeled example of an elephant and infers the traits that make an elephant an elephant, her previous visual experiences provide background knowledge that restricts the space of hypotheses she considers. After all, there's an infinite set of logically consistent hypotheses.
Nevertheless, if you provide your machine system with the video of all the child's visual input, it still won't generalize well from single examples, the way children do effortlessly.
This reminds me of the saying: It took me ten years to become an overnight success.
Humans generalize well from few examples because, well, they've already processed billions of examples. A toddler may never have seen an elephant before, but it has probably seen cars, trucks, birds, dogs, people, trees, skies, buildings, etc., giving it concepts for bigness, smallness, aliveness, humanness and much else. With all these concepts in place, then yes, it becomes easy to see what makes an elephant distinct from a dog or a person. And it would be easy for an artificial neural network too.
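To make that concrete, here's a toy sketch of one-shot learning on top of concepts that are already in place. Everything in it is my own assumption for illustration: the four "concept" features, their hand-picked values, and the helper names `add_one_shot` and `classify` are hypothetical, and a nearest-prototype rule is just the simplest stand-in for whatever a real network would do.

```python
import math

# Hypothetical hand-picked "concept" features, each in [0, 1]:
# (size, aliveness, humanness, has_trunk). Illustrative values, not measured data.
KNOWN_PROTOTYPES = {
    "dog":    (0.2, 1.0, 0.1, 0.0),
    "person": (0.3, 1.0, 1.0, 0.0),
    "truck":  (0.8, 0.0, 0.0, 0.0),
}

def add_one_shot(prototypes, label, example):
    """Store a single labeled example as the prototype for a new class."""
    updated = dict(prototypes)  # copy so the original mapping is untouched
    updated[label] = example
    return updated

def classify(prototypes, features):
    """Return the label whose prototype is nearest in concept space."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], features))

# One labeled elephant is enough, because the concept axes already exist.
prototypes = add_one_shot(KNOWN_PROTOTYPES, "elephant", (0.9, 1.0, 0.0, 1.0))

# A second, slightly different elephant the system has never seen.
new_animal = (0.85, 1.0, 0.05, 0.9)
print(classify(prototypes, new_animal))
```

The point of the sketch is that the hard part is not the final comparison, it's having the concept axes in the first place, which is exactly what the billions of earlier examples buy you.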
An interesting fact is that newborns have very few concepts to begin with. It takes them some months, for instance, to learn to differentiate between living and inanimate things (the family cat vs a teddy bear).