Deep Learning’s Impact on Image Processing, Mathematics, and Humanity (siam.org)
213 points by alfaiate on May 7, 2017 | 149 comments


I question this meme that deep learning has less mathematical elegance or interpretability than other machine learning models. It's simply not true. At this point it reads like mindless copypasta, possibly with the political intention of reducing AI funding and use, such as the EU regulation which requires all algorithms to "explain" their reasoning. Either that, or the author is just more comfortable with familiar old-school methods, which makes them think those methods are more elegant. This push towards elegance is unscientific and counterproductive, much like the mistaken push Einstein and co. made for deterministic interpretations of quantum mechanics. We can now see that its motivation was simply a biased notion of elegance born of familiarity with classical mechanics.

Examples of elegance in deep learning: The gating mechanisms of LSTMs (also adds to interpretability); the structure of convolutions, i.e. weight sharing and translation invariance; the policy gradient theorem (Sutton 1999); dropout and its relationship to biology and variational inference; Generative Adversarial Networks and their connections to game theory; split-brain experiments; the reparametrization trick; the log-derivative trick; connectionist temporal classification. These examples only scratch the surface. Depending on your specialization within deep learning, you'd find many more.
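To make the first example concrete, here is the entire LSTM gating mechanism as a rough NumPy sketch (the parameter names are mine, purely for illustration): the gates are plain sigmoids that decide how much old memory to keep, how much new information to write, and how much to expose, and because the gate activations are just numbers between 0 and 1, you can read them off directly at every timestep, which is where the interpretability comes in.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        # W, U, b: dicts of weight matrices and biases for the four gates (illustrative names).
        f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate: how much old memory to keep
        i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # input gate: how much new content to write
        o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # output gate: how much memory to expose
        g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate memory content
        c = f * c_prev + i * g      # new cell state
        h = o * np.tanh(c)          # new hidden state
        return h, c                 # f, i, o can be logged and inspected at every timestep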

Examples of interpretability: Nguyen et al.'s synthesizing of preferred inputs to hidden neurons, Zeiler et al.'s convnet visualizations, Guillaume Alain's linear probes of hidden layers, attention readouts in attentional models for machine translation or speech synthesis, and many more. Ultimately, deep learning methods are probabilistic, and a decent deep learning engineer can tell you why a model is doing what it's doing by printing probabilities and activation statistics, much like with any other probabilistic machine learning model.
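To give one concrete flavour of this, a minimal sketch of the linear-probe idea (the function and variable names here are made up): train a simple linear classifier on a layer's activations and see how easily a concept can be read out of that layer.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def linear_probe(layer_activations, concept_labels):
        # If a plain linear model can decode the concept from this layer's activations,
        # the layer has made that concept (nearly) linearly separable.
        probe = LogisticRegression(max_iter=1000)
        probe.fit(layer_activations, concept_labels)
        return probe.score(layer_activations, concept_labels)  # in practice, score on held-out data

    # usage (shapes are hypothetical): acc = linear_probe(acts_of_layer3, labels)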


>> This push towards elegance is unscientific

I think you are right on that one. The Standard Model, for example, is absolutely not elegant; it is a huge convoluted mess of parameters and constants (just google "Standard Model Lagrangian"). Yet it is the best we have come up with for explaining the dynamics of the fields and particles[0] that surround us. Correct answers don't have to be elegant per se; it's nice when they are, but it's not a prerequisite.

But consider this: you can train a neural network to predict the distance of an object thrown with velocity v_0 at an angle of α, under the influence of gravitational acceleration g. After a few hundred rounds of training, a suitable NN can reasonably predict the outcome of said experiment given the inputs v_0, α and g. And, as you pointed out, a researcher can explain to you why this is the case based on activations, feedback loops, the learning algorithm and other parameters. But neither the NN nor the researcher will be able to give you a rule like: "F=(G m_1 m_2)/(r^2)" to explain the underlying reasons for the object-trajectory dynamics.
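To make that concrete, here is a rough sketch of exactly this experiment (ignoring air resistance, so the ground truth is d = v_0^2 sin(2α)/g; the network size and settings are arbitrary):

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n = 5000
    v0 = rng.uniform(1, 20, n)          # launch speed
    alpha = rng.uniform(0.1, 1.4, n)    # launch angle (radians)
    g = rng.uniform(5, 15, n)           # gravitational acceleration
    d = v0**2 * np.sin(2 * alpha) / g   # ground-truth range, no air resistance

    X = np.column_stack([v0, alpha, g])
    net = make_pipeline(StandardScaler(),
                        MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0))
    net.fit(X, d)

    # The net predicts the distance quite well...
    print(net.predict([[15.0, 0.7, 9.81]]), 15.0**2 * np.sin(1.4) / 9.81)
    # ...but nothing inside it reads like d = v0**2 * sin(2*alpha) / g.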

A Neural Network can give you answers and predictions, yes, but you are not able to incorporate them into a wider theory, since the output is always numerical in nature. It is also always tied to one specific fact you are interested in and cannot give you a generalization.

[0]: In the QFT picture, particles are also fields, just of a different kind.


But it will be much more accurate. The neural network will learn to take into account air resistance and wind, for instance. Depending on how much data you give it and the resolution, it might even learn to model air turbulence, spinning and wobbling of thrown objects, etc. It will be a lot more accurate than the simplistic physics model.

And sure, it doesn't output a nice simple equation. But in most real problems, there aren't nice simple equations to output. Physics is sort of a fluke, where the fundamental laws of nature happen to be so simple we can do that. In something like biology, you are never going to find a simple equation that explains a biological system. They are complicated messes of millions of interacting pieces. Even if we knew how everything worked, we don't have the computing resources to accurately simulate it anyway.

And we are actually seeing neural networks be useful in these areas. They are able to predict what chemicals are more likely to be useful drugs. If you restricted yourself to using only simple mathematical models, you would never get these results.

This very article is about the very difficult domain of image processing. Why should we expect the statistical distribution of photographs to be simple and explainable by simple math? Finding an equation that accurately models the shape of a human face is pointless and misguided. Before NNs came along, they were doing crazy stuff like handcoding complicated heuristics that detected dark spots that might be eye sockets. There's nothing elegant about that. Physicists are really blessed to have a domain where they can ignore stuff like air resistance. Where the data really does fit simple equations.

And as the parent comment says, you can interpret neural networks. You can observe the activation behaviors, and discover what features it's learned to detect eye sockets, or chemicals with certain properties. You can see how the output changes as the inputs change, and fit a simpler model to that.
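That last step can be as simple as the following sketch (the helper name is made up; it is the local-surrogate idea behind tools like LIME): perturb an input, query the black box, and fit a linear model to its responses to get human-readable local feature weights.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def local_surrogate(predict_fn, x0, n_samples=500, scale=0.1, seed=0):
        # Fit a linear approximation to a black-box model in a neighbourhood of x0.
        rng = np.random.default_rng(seed)
        X = x0 + scale * rng.normal(size=(n_samples, x0.shape[0]))  # perturbed inputs
        y = predict_fn(X)                                           # black-box answers
        return LinearRegression().fit(X, y).coef_                   # local, readable feature weights

    # usage (model is hypothetical): weights = local_surrogate(model.predict, np.asarray([1.0, 2.0, 3.0]))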


But that's precisely the problem, isn't it? It's hard to know when it's picking up additional details and when it's overfitting to noise.


Why is that hard? Expose it to data it hasn't seen yet and see if the results check out.


Expose it to which data exactly? Say it's picking up on some obscure phenomenon that is relatively rare, yet very important. How would you, completely unaware of this, be able to collect more relevant data when the majority of the data you collect would not contain this element?


But that's a problem with the data, not with the functioning of the net. The whole thing hinges on the premise that you will be able to tell 'sense' from 'nonsense' when you see it, even if you don't know exactly why it labeled 'sense' as 'sense' and vice versa.

If you are simply amassing data without purpose, then what you could do is try to visualize the layers of the network to see if any surprising features turn up. But that will only work if the features are obvious enough to stand out, and if that were the case I suspect we would not be having this discussion in the first place.

Computers excel at: remembering stuff forever and speed.

So any kind of improvement that a neural net or any other solution would bring to the table over a human would likely fall into one of those categories: either the computer is faster at solving the problem, or its ability to remember and apply a vast amount of data to the problem gives it a (slight) edge over what a human could do, or maybe it is just enough to reach parity.

It will not tell you what is and what is not significant in the input, though. With enough samples, you, with your slower but much superior intellect, might be able to draw new and far-reaching conclusions once you are exposed to the data in sufficient quantity yourself that a pattern suggests itself.

That's in a way a very nice collaboration between man and machine, each doing what they are best at.

A key sign of something like this being at play would be the computer consistently being at odds with experts in the field, yet being right more often than not about those cases. That would be an excellent opportunity to wake up to the possibility that there is something that should be obvious and noticeable but that still got missed.


> Computers excel at: remembering stuff forever and speed.

> So any kind of improvement that a neural net or any other solution would bring to the table over a human would likely fall into one of those categories: either the computer is faster at solving the problem, or its ability to remember and apply a vast amount of data to the problem gives it a (slight) edge over what a human could do, or maybe it is just enough to reach parity.

> It will not tell you what is and what is not significant in the input, though. With enough samples, you, with your slower but much superior intellect, might be able to draw new and far-reaching conclusions once you are exposed to the data in sufficient quantity yourself that a pattern suggests itself.

I think that it's too early to say how AI will be deployed in practice -- whether it will augment or replace human roles. The ambition is certainly to produce enough "intelligence" to supersede a large fraction of human decision making.

As we seek to make computers more powerful, robust, and versatile ("AI"), it seems like we are pushed towards more organic computational structures. If the trend continues, it would imply that the closer we get to AI, the less their strengths and weaknesses would resemble computers of yesteryear. The interesting possibility is that one might be able to have an interpolation between the strengths of humans and computers.


That may have something to do with the fact that even the biggest neural networks have a "cortex size" one or two orders of magnitude smaller than a cat's, and "exist" (number of impulses processed * amount of time that would take in an animal) for only a few weeks.

That means one should expect them to perform tasks about as well as a mouse brain can do them, between a third and halfway to adulthood.

And yes, that level of cortical processing does not seem to support coming up with symbolic systems and attempting to use them to describe the world around them. Absolutely.

Mice don't do that either.


Am I out of date? I thought that current big deep networks had hundreds of millions of parameters, whereas mammalian brains are on the order of tens of billions of neurons with perhaps 1000 times more connections, giving tens of trillions of parameters (if you imagine a mammalian neuron to work like that).


One parameter maps to one synapse, not to one neuron.


A neural network will give you a relation which is an approximation of that relation, not just a numeric output. Also, you should remember that while it looks elegant, the Newtonian model is also an approximation, and it fails noticeably in cases where relativistic effects are significant.


>But neither the NN nor the researcher will be able to give you a rule like: "F=(G m_1 m_2)/(r^2)" to explain the underlying reasons for the object-trajectory dynamics.

It is not impossible. There is a combination of a special neural network architecture and strong sparsity-inducing regularization that makes it possible to learn equations from a dynamics dataset: https://openreview.net/pdf?id=BkgRp0FYe


This is a very interesting paper. Is there an implementation example in any major framework?


It seems to me that you could treat a sufficiently accurate model (be it a deep net or otherwise) as the solution to an unknown, governing physical law, and attempt to identify the law from the model, post-hoc.

I'm not sure if there's been work done in the domain of pattern recognition, but here's an overview where in an example, Navier-Stokes is identified from data:

https://sinews.siam.org/Details-Page/data-driven-discovery-o...
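The core trick described there is, roughly, sparse regression over a library of candidate terms. A toy version (my own simplification, not the authors' code) that recovers dx/dt = -2x from noisy samples:

    import numpy as np

    # Toy system: dx/dt = -2x. Pretend we only have noisy measurements of x and dx/dt.
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 400)
    dxdt = -2 * x + 0.01 * rng.normal(size=x.size)

    # Library of candidate right-hand-side terms.
    library = np.column_stack([np.ones_like(x), x, x**2, x**3])
    names = ["1", "x", "x^2", "x^3"]

    # Sequential thresholded least squares -- the heart of the method, heavily simplified.
    xi = np.linalg.lstsq(library, dxdt, rcond=None)[0]
    for _ in range(10):
        small = np.abs(xi) < 0.1
        xi[small] = 0.0
        xi[~small] = np.linalg.lstsq(library[:, ~small], dxdt, rcond=None)[0]

    print({n: round(c, 3) for n, c in zip(names, xi)})  # should come out roughly {"x": -2.0}, rest 0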


That's the same as my own example of explaining to people that dogs' brains can do some pretty complex math. I throw a stick that's tumbling as it flies and the dog catches it every time, but it can't explain or communicate the formula that went into that.

But this is really an optimization problem: representing your formula as a bunch of weights that are then used to drive a cascade of multiply-add rules to derive some output is an imprecise but roughly accurate way to model a problem. Seeing that there is a much more direct and analytical way of modeling that problem requires intelligence of a different order.

But a program like Mathematica gets awfully close to that, so this is not a problem that is a good fit for a neural network, even if it would work.

I'd prefer to use a neural network for problems that have a less well defined solution.


A beautiful example, and well illustrated by the fact that humans could play catch before the development of trigonometry.


> But neither the NN nor the researcher will be able to give you a rule like: "F=(G m_1 m_2)/(r^2)" to explain the underlying reasons for the object-trajectory dynamics.

Come on, you can just fit the NN's numerical outputs with Excel and come up with tentative explanations for both the underlying elegant rule and the science. The point of NN-based science is that you are not left guessing much; instead you retrofit the NN's numerical output and build a coherent narrative with other accepted results.


Without that equation, you cannot build the rocket that goes to the moon, however.


Feed the retrofitted "blurred" equation to mathematicians and let them come up with tentative exact models, while being almost there already?


Anything that fits within the error bars is viable if you ask me. Whatever works works.


A lot of investors agree with you. Often they get very rich, especially if they head for the beach before the fit breaks down.

On the other hand...


> cannot give you a generalization

Maybe a second NN could find that for you. You might end up with factors like the "speed of an unladen swallow", but perhaps all that is needed is some nudge away from that local maximum to keep searching the space.


>Correct answers don't have to be elegant per se; it's nice when they are, but it's not a prerequisite

If a powerful idea is not elegant, we have the option of creating new mathematics in which it _can_ be elegantly expressed.


requires all algorithms to "explain" their reasoning

It's about the application. If you are detecting cat images on a social network, then a few false positives or negatives are no big deal at all. If your algo is making investment decisions or is being used as evidence in court, then it is entirely reasonable to expect it to have a rock-solid causal chain easily comprehensible by a layperson. The example in the article is noise reduction, and the method was "we tweaked it until we got the results we wanted". In many fields that's called "overfitting" (or "fitting up") and is a big red flag.


All machine learning models are capable of overfitting. Deep learning does not have a special relationship to overfitting, and in fact, there is some evidence that the structural priors in complex deep networks actually inhibit overfitting and improve generalization [1].

Citing certain examples of misuse of deep learning by folks who "tweaked it till it worked" doesn't say anything about deep learning at large. The same team would do the same with a much less powerful model.

Of course, being able to explain is a very valuable trait. But there's just no evidence that deep learning methods are any less amenable to interpretation than SVMs or decision trees. The latter models' "interpretation" is mostly stuff like "feature 543 and 632 were on", while deep learning methods can not only do that, but can also synthesize examples of the characteristics that the model looked for [2].

[1]: https://arxiv.org/abs/1611.03530
[2]: https://arxiv.org/abs/1605.09304


Deep learning does not have a special relationship to overfitting

Yes and no. Let's say you have a time series, you know there is some pattern in it, and you are interested in predicting its value at time t. A Fourier analysis would give you a formula to plug numbers into that anyone can understand. An NN or DL model might give you as good a prediction, or, accounting for noise, even a slightly better one when backtested, but now you need to explain "why did it do this" - to a financial regulator who is looking at insider trades, or a commission looking into why a piece of equipment failed and killed the user (say, a self-driving car), or a newspaper that's noticed more people of race X are being turned down for mortgages. That's why I say it is dependent on the application of the algo.
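For what it's worth, the "formula anyone can understand" half of that is easy to show on a toy signal (made-up numbers): the model you hand to a regulator is literally a couple of cosine terms.

    import numpy as np

    rng = np.random.default_rng(1)
    t = np.arange(512)
    series = 3.0 * np.sin(2 * np.pi * t / 32) + 1.5 * np.sin(2 * np.pi * t / 7) + 0.3 * rng.normal(size=t.size)

    spectrum = np.fft.rfft(series)
    freqs = np.fft.rfftfreq(t.size, d=1.0)
    top = np.argsort(np.abs(spectrum))[-2:]   # the two dominant frequencies

    for k in top:
        amp = 2 * np.abs(spectrum[k]) / t.size
        phase = np.angle(spectrum[k])
        print(f"{amp:.2f} * cos(2*pi*{freqs[k]:.4f}*t + {phase:.2f})")
    # The prediction at any future t is just the sum of these terms -- a formula a
    # regulator can check by hand, unlike a pile of network weights.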


>> [SVM and Decision Trees'] "interpretation" is mostly stuff like "feature 543 and 632 were on"

SVMs are well-known black-box optimisers of numerical parameters, so they are not much more or less interpretable than neural nets, but decision tree learners are a different matter. The model they build is symbolic: a propositional logic theory that can be output in a format that is directly inspectable and interpretable by a human being with an understanding of propositional logic.

It's true that once decision trees grow beyond a certain size they're hard to interpret, but even so they can make a lot more sense than a heap of numbers in a large graph.

Neural nets used for image processing are special in that they can output their activations as images that are directly interpretable by humans, but that's about the only application where we can really tell what a deep network is learning.

Edit:

>> All machine learning models are capable of overfitting.

Aye, but deep nets by definition build extremely complex models with millions of parameters and so are especially prone to overfitting. Bias-vs-variance and all that.


Then requiring algorithms to explain their reasoning should give neural networks an edge over other methods, if they are really easier to interpret.


But what do you expect to hear from the algorithm? As in, how "deep" should the explanation go, and what do you do if you don't like the explanation? How do you verify that the explanation is correct?

Edit: I should add that I worked on a similar problem. I can't reveal all the details, but it was about scoring pairs of objects to say how much one object is like another. At first we used an off-the-shelf system that only spat out abstract values, so you could say "A is like B with a value of 5, but it's like C with a value of 5.17, so A is more like C than like B". The problem was those scores didn't correspond to any explainable metrics, so we switched to a custom system that could spit out a normalized value (0 to 1) and tell you exactly which qualities were alike and how the score was formed. People were NEVER happy with the values. Never. Everyone "felt" that these two should be more alike, or that there should always be at least one object with at least 0.8 likeness to another, etc. Plus, the values were now asymmetric, i.e. A can be 0.9 like B, but B only 0.2 like A. After about a year of tweaking we switched back to the non-explainable system.

Of course, the "non-explainable" system is perfectly explainable: it's a computer program, and it isn't even deep learning, it's just long equations, so the explanation is just "this long equation results in this value". The semantics of that equation, though, are very hard, if not impossible, to understand. Plus, I don't think we can have a semantic explanation of a deep learning system, precisely because we use deep learning when we can't semantically specify an algorithmic solution to the problem.


But what do you expect to hear from the algorithm? As in, how "deep" should the explanation go, and what do you do if you don't like the explanation? How do you verify that the explanation is correct?

It depends who you are explaining it to, of course; different audiences will have different standards, and will have, or have access to, different kinds of experts to validate the explanation.

If the question is "why did your self-driving car hit that pedestrian?" for example, the standard would be pretty high. "It works most of the time but we don't know why it failed this time and we don't know when it might fail again" isn't going to fly. If the question is "we've just busted this trader for insider dealing, and you made the same trades at the same time, why?" then "the computer just told me to for no particular reason" isn't going to go down well either. You would need to show how data that you legitimately had access to reproducibly leads to exactly the same decision being made.

What do you do if you don't like the explanation? That's easy. Someone's going to jail. Don't let that be you :-)


Consider the following case.

Let's say we ban development of self-driving cars since the algos can't be explained. Then we will never know the benefit we might have reaped from adopting them. We know humans make mistakes. But with self-driving cars, the error rates in the future might be very low compared to humans'. We as a human society would never experience this future because we had this silly idealism that all algos must explain themselves in human terms.


Let's say we ban development of self-driving cars since the algos can't be explained.

That's a false dilemma since it's by no means proven that the algos CAN'T be explained - just that we don't know how to extract the explanation from the model yet. It's entirely reasonable to suspend real-world use until the maths catches up.


But we would never reap the benefits in the meantime. That was my main point. And this becomes crucial if the number of accident deaths avoided is substantial compared to the deaths that happen due to AI errors.


> why did your self-driving car hit that pedestrian?

It didn't see him/her. Which is the extent of an explanation a human can give you. Try to picture wanting a person to explain why they didn't see a person coming from the side. We don't know what's happening inside our own heads.

That's my point - we're recreating an inherently non-explainable system in software, because we can't explain how it works. The algorithm is so complicated that we can't code it explicitly, so we let the machine figure out the semantics-free equations. All is not bad, though: you could always dry-run the algorithm at home, i.e. ask your car "how visible am I with these clothes on", and just not walk outside in low-visibility, low-recognition clothes. Some countries already require pedestrians to wear reflective clothing in the evening to make themselves more visible. I suspect we'll have similar rules for self-driving vehicles: you must wear recognisable clothing when walking through traffic.

What I mean by "what if you don't like the explanation" is something else. Let's say you use deep-learning software to determine sentences. The software spits out "5 years, because this is a black male". Do you accept that? It has been proven effective in many previous cases; can you accept that it has independently discovered a correlation between the person being black and the length of the sentence? Let's say you entirely remove the person's colour and sex from the dataset and re-train; then it will give you shorter sentences than you expect, because every judge previously had subconsciously taken all these variables into account. Or it will discover a correlation between wearing e.g. adidas and alcoholism, because Slavic people have a higher rate of alcoholism and wear adidas more often than others.

Which comes back to my core argument: If you know the acceptable metrics and algorithms, why are you using deep learning? If you don't know the acceptable algorithm, how can you verify the explanation?


It didn't see him/her. Which is the extent of an explanation a human can give you. Try to picture wanting a person to explain why they didn't see a person coming from the side. We don't know what's happening inside our own heads.

There is a world of difference between "this individual driver made a human error" and "software that can't reliably see people is driving a million cars now". The developers of the software would need to demonstrate they understood why the software failed in this situation and how they are going to fix it.

I suspect we'll have similar rules for self-driving vehicles: you must wear recognisable clothing when walking through traffic

Bear in mind that we can't even make cyclists now wear hi-vis or fit lights - making the entire population wear certain things for the benefit of job-destroying technology isn't going to get many votes!

If you know the acceptable metrics and algorithms, why are you using deep learning?

I disagree, because it's a matter of scale and usefulness. An algorithm that detects pictures of cats can answer the question "why do you think that's a cat?" with "because it looks like a cat", and that's a perfectly reasonable explanation that a human might give[0]. But the stakes are raised when you start to deal with real people in the real world. "Why did you turn down that loan application", "why did you add that name to the no-fly list" or "why did you reject that job candidate" need a bit more exposition.

That's a self-correcting problem because no-one sane will risk their business on software that "inexplicably" just happens to reject everyone in a protected class because there happens to be some vague correlation between members of that class and some undesirable activity or outcome.

[0] Except http://www.bbc.co.uk/news/technology-33347866 of course


> you must wear recognisable clothing when walking through traffic.

Which would make driverless cars a non-starter in the real world.

Dear God, I can't imagine the poor coder who has to dictate fashion so that car firms can ensure they don't hit people.

That addition reneges on the basic expectations people had of driverless cars in the first place, so why would they have any motivation to let them happen?


As Cathy O'Neil points out, black-box algos are deciding things that have a huge impact on individuals' lives: which teachers get fired, which prisoners don't get parole, and which loan applicants get approved and denied. In the latter two cases, the algos are tougher on minorities.

It's troubling when we don't know why an algo is racist. We need ways of checking that these influential algos aren't reinforcing trends we would rather diminish.


No one cares that human judges decide parole or that humans decide which teachers to fire. Yet humans are vastly worse. Attractive people are much more likely to get hired or to get shorter prison sentences. Judges tend to give much harsher sentences just before lunch, when they are hungry. Their predictions about which people are likely to reoffend are often worse than chance.

Additionally, the study about the "racist algorithms" was fraudulent. Their results were "almost statistically significant", i.e. not statistically significant. Compare that to human judges, who have well-studied biases by race. There's nothing remotely fair about human judgement, and you should always prefer an algorithm. In almost every domain they have looked at, researchers have found that even simple linear regression beats the predictions of human experts.

And this has huge effects on our society. Humans making biased hiring decisions leads to mass discrimination against certain groups and to very suboptimal employees. Having humans make loan decisions means much higher interest rates, more people going bankrupt, and an economy that grows much slower.

The EU banning the use of algorithms is just absurd.


Yes, but humans can go to jail.

You assume that your opponent is reality.

Your opponent is other human beings. It takes precious little for a motivated person to learn how to hide malfeasance behind a black box.

Faith in uncorrupted black boxes should be considered the same way as faith in an incorruptible internet.


No one has any incentive to corrupt the black box. There are many, many cases where systems have terrible incentives that ruin everything, but I don't see how this is one of them. E.g. the bank has every incentive to make its loan system as accurate as possible.


This is a good point, and to her credit, Cathy O'Neil is careful to say that humans are worse. She's arguing for better algos, ones that are carefully thought out to avoid the worst of these pitfalls.


I think he was complaining about the algos' decisions being hard to understand, not them being unfair. From your examples, it seems like we more or less can understand human decisions.


No-fly lists as well.

reinforcing trends we would rather diminish

Well, of course they are; at the end of the day, all any predictive algo is doing is extrapolating a trend. It usually takes serious regulatory intervention to change that - men were being charged more for car insurance because "the computers said so" until legislation was passed barring sex as an input to the model.


The black-box nature of the regression can make things difficult, though. If you have a couple of other parameters which together indicate 'black male' with high probability, you might find layer one reconstructing the notion of gender and layer two making decisions based on it.


Much better to include the feature in the training, and exclude it from the inference. That actually cancels out the effect, rather than just incentivizing the model to reconstruct it.


For your court example, if you have human oversight it doesn't matter. For example, the machine finds evidence, a human looks at the found evidence, verifies and explains it, and then there is no problem. Also, I'd like to point out that much "evidence" in court is very far from rock solid. I'm always curious why we hold machines to higher standards than humans.


Elad is pretty clear in laying out how pre-deep-learning denoising research is differently elegant compared to CNNs, etc. You should also consider the context: SIAM, the applied math community, which has seen a lot of work on deep mathematical connections between PDEs, denoising, etc. The math is certainly more advanced than what is required for understanding deep learning and applying it to achieve better results.


Thanks for this comment, even if you're just stating the obvious. It's amazing the lack of humility some posters display on HN when it comes to discussing the work of professionals with experience in the field.

(I'm not Elad or affiliated with him or his institution, btw).


I may not have the accolades Elad has, but I'm not a random HN commenter. I speak from own experience in machine learning.


It doesn't matter who you are or what you've done. Some respect for what others have achieved is always a good thing.

Lots of people have "experience of machine learning" these days. And there's so much information out there anyway that you really don't need to know what you're talking about to leave a quick comment on a discussion board.


I think it comes down to politics, which is ultimately about convincing the public. The same public that has caused measles outbreaks to rise again because enough of it believed a YouTube video.

One example is in child services. We've got enough digital historical data from child services that we're able to build models that let us predict whether or not a child will have to be removed from abusive parents 5-10 years from now.

This information is relatively useless though, because there is no way to explain how it works in a sense that the general public will accept.

The public is probably going to adjust eventually, but until that happens, you'll see an increasing push for documentation on how a result was reached.

Ironically, an algorithm that explains how it ended up with the results that it did isn't any more moral than an algorithm that doesn't. I mean, you're losing your autonomy and freedom when the government knows what you'll do before you do it, either way.


>An algorithm that explains ... isn't any more moral

Not sure what "moral" means, but two things:

1. If black boxes make unexplainable decisions, that gives too much power to the person who reads the box's answer,

2. If the child services algorithm explained its prediction involved e.g. alcoholism, then the parents could try to quit.


Unfortunately, you can't quit alcoholism. Also troubling: what if the child service algo happens to predict that brown families are more abusive?


While I have nothing against black-box models and believe that interpretability is overrated, deep learning is less interpretable than other models. Compare something like a simple linear model with a deep model: the linear model is much easier to interpret, especially for someone without a mathematical background. You don't have to cite papers to give examples of how to interpret a linear model. A single decision tree is also much easier to understand and explain than a deep model. Where deep learning and tree ensemble methods excel is in accuracy and ease of use.


I take issue with precisely the notion that deep learning models are "black-box". They're pretty transparent, and the fact that people haven't gained adeptness with them yet says more about their cutting-edge-ness than about their interpretability.

The average programmer at a tech company won't be able to tell us how a particular complex piece of code works, but that doesn't stop us from building complex software.

Deep learning methods are also not off-the-shelf type algorithms. Using them does require knowledge of the domain. This doesn't fit with the "black-box" narrative.

In fact, SVMs and DTs are black-boxes due to their off-the-shelf nature. (jk lol)


I think the black-box nature, or at least the black-box argument, of deep learning lies in its parameters, not its architecture. Sure, the operations of deep learning and what they do are defined ahead of time, but we still miss the point of why it works under some scenarios and fails in others.

Take the recent discussion on reddit/ml for example: people are still debating whether it should be conv-bn-relu or conv-relu-bn. This is a pretty widely used building block, if not the most widely used one; however, people still don't understand why the latter could work, or even outperform the former, in a lot of applications, since it filters out all negative values and thus destroys/skews the underlying distribution seen by BN. And for BN alone there are a lot of questions to ask; the running statistics, for example, feel like a hack, yet they work very well in reality.

So I take no issue with calling deep learning, as it stands today, a black box. We are far, very far, from understanding why this monster does so well in solving so many problems. That is why it is interesting. Some researchers' attitude is confusing to me, because apparently there is a big juicy problem out there waiting to be cracked, yet they are distancing themselves from it. I cannot help thinking it is out of contrarianism, or the fear that what they have worked on for so long may not be useful after all. But true researchers should feel excited for the opportunity to participate while the theory is still vanilla and to contribute to it.


Your example about the debate of BN usage demonstrates that it is possible to look inside a deep network and debate. That we don't know the answers doesn't mean the answers don't exist, or are impossible to find, which is what the term "black-box" suggests.

Of course, more research into tools for model interpretation would be awesome - my own lab has done a lot towards it - and this remains an important topic. More is desired, but what we have right now is pretty good too, and is not at all inferior to old-school methods, especially considering the performance.


I'd argue that a neural net is "black-box" in the sense that nobody really can give a coherent answer to "what happens if I perturb/double/negate this parameter" where the parameter might be deep in some weight matrix. Maybe this isn't a useful question because of the distributed representations within neural nets, but it is at least an answerable question for other models.

Do you know of any work on interpreting neural nets that are being used for non-image tasks?


The deep learning buzzword applies to models which learn feature interactions in a fairly complicated manner. While it is possible to explain what such a model is doing, the model is a black box in the sense that without a computer you could not develop it. The model is making a lot of decisions, and explaining why all those choices are made is not really feasible. In contrast, I could create a linear model by hand easily; it just won't be as good at complicated predictions. However, I think the stigma against black boxes is undeserved. The human brain is a black-box and no one argues against using it.


The human brain is a black-box and no one argues against using it.

Well, nobody's against using the brain, but the current trend in most subject domains is to avoid over-relying on intuition, and to check and support conclusions using structured thinking methods, i.e. logic.

That is, deep learning is an intuition, and you have to have a high-level explanatory/verification mechanism that would support or reject the answer.


I've been thinking lately that the reasoning faculties of our minds are simply a reflection of physical reality, but only one version of it, the one most useful for surviving at the scales at which we exist and given the "reality problems" we've evolved to solve. The idea that our perceptions indicate some truth about reality becomes easier to discount.

We may not quite be there yet, but in terms of the reasoning faculties of our minds, we seem to have understood nearly all that can be understood about the universe. All the quantum stuff is simply not understandable using that machinery, but seems to require that we run a virtual-machine-like thing, grafting reason onto weird probabilistic models. Given all this, I don't immediately assume the most "elegant" solution is really the best anymore for most of the big problems we're now trying to solve. At least in the domain of science.


The article meant elegant in the sense of "NN training provably converges in subexponential time and doesn't overfit the data", e.g. like the theory for convex SVMs.

Mathematically, NN training is an unproven algorithm; it just happens to work surprisingly well empirically. It's analogous to the simplex method, which worked well for years without theoretical justification.


>> Examples of elegance in deep learning: The gating mechanisms of LSTMs (also adds to interpretability);

LSTMs amaze and dumbfound me in equal measure, and I admire anyone who can understand them at all.

Could you please indulge my cluelessness and explain to me what is mathematically elegant about the gating mechanism in LSTMs and how it adds to interpretability?


You might like this article[0]. It definitely helped me develop some intuition for LSTMs.

0: http://r2rt.com/written-memories-understanding-deriving-and-...


Thanks, but I wanted to hear why the OP thinks LSTM gates are elegant and add to interpretability.

The OP made a big to-do about how the article above is misguided and how the person who wrote it has a poor understanding of deep learning, so I'm assuming he or she has a very good understanding of the subject.

I mean, I'd hate to believe the OP is just playing deep learning bingo for HN karma, you know?


>I question this meme that deep learning has less mathematical elegance or interpretability than other machine learning models. It's simply not true.

I know, right? Like, take this model I trained this morning. Here are the parameters it learned: [0.230948, 0.00000000014134, 0.1039402934, 0.000023001323, 0.00000000000005]

I mean, what's "black-box" about that, really? You can instantly:

(a) See exactly what the model is a representation of.

(b) Figure out what data was used to train it.

(c) Understand the connection between the training data and the learned model.

It's not like the model has reduced a bunch of unfathomably complex numbers to another, equally unfathomable bunch. You can tell exactly what it's doing - and, with some visualisation, it gets even better. Because then it's a curve. Everyone groks curves, right? Right, you guys?

/s obviously.


I don't know what model you trained but if it only needs 5 parameters it's quite possible that you can figure out the corresponding features, and remove your sarcasm tags.


It's perfectly possible to train a deep neural model with millions of features that outputs a vector of a handful of values, or even just a scalar. That's actually one of their strengths.

[Full disclosure: that comment was originally mine from a different thread; I'm not affiliated with the OP in any way.]


Yes indeed. Your sarcastic comment was perfectly succinct and to the point. And it left a mark on me as the best way to get across this "magic math" stuff encroaching into computer science. It does appear to be a very powerful tool, but with absolutely no introspection or any way to validate or invalidate its content. The best one can hope is that the dataset has no inherent biases ~ but even that's no guarantee the algo won't create a layer that reconstructs the biases and then choose directly on those biases.

I probably should have cited it (was on mobile, but oh well :/ ). Well, I'll do it now: https://news.ycombinator.com/item?id=14219450

If anything, please consider this copy/edit as a sign of respect, and not of mockery :)


No worries. It's all good :)


Are you confusing parameters with outputs?


Well, the numbers you get in the output are function parameters.


I'm starting to think that human resistance to change is a form of regularization.


Can you recommend some good books/papers that explain what exactly the hypothesis space of DNNs looks like? By the no-free-lunch theorem, there has to be one.


> mistaken push Einstein and co. made for deterministic interpretations of quantum mechanics

But really. Secretly we all expect one to exist below it all.



If the internet had been around when microwave ovens were invented, there would have been much the same outcry. Deep learning is exceedingly simple and elegant; I can explain it in person in under 5 minutes to anyone who has done high school algebra.
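For anyone who wants the 5-minute version in code rather than in person, here is essentially the whole idea as a toy (a one-hidden-layer net learning XOR by gradient descent; nothing beyond multiply, add, squash, and nudge the weights downhill):

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
    sigmoid = lambda z: 1 / (1 + np.exp(-z))

    for step in range(5000):
        # forward pass: two rounds of "multiply, add, squash"
        h = sigmoid(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # backward pass: the chain rule applied to the squared error
        dp = (p - y) * p * (1 - p)
        dh = (dp @ W2.T) * h * (1 - h)
        # nudge every weight a little way downhill
        W2 -= 0.5 * h.T @ dp;  b2 -= 0.5 * dp.sum(axis=0)
        W1 -= 0.5 * X.T @ dh;  b1 -= 0.5 * dh.sum(axis=0)

    print(p.round(2))   # usually ends up close to [[0], [1], [1], [0]]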


Great. Explain why training converges in subexponential time despite the nonconvex cost function and does not overfit the data.


As you get closer to the right answer, the changes you need to make are smaller and harder to discover.


You only need to know that if you are studying the field. A good explanation of deep learning is no more surprising to the general populace than general relativity is (spacetime isn't actually the surface of a balloon).


That's not correct though; there are plenty of reasons why, on an intuitive level, neural networks should not work. Also, genetic algorithms are intuitively appealing but do not work well.


Genetic algorithms are used in the design of antenna arrays.


Because in high-dimensional spaces the troublesome critical points are mostly saddle points, not local minima. Intuitively obvious, and shown theoretically in the last 12 months.


Are you referencing https://arxiv.org/abs/1412.0233?


Shown theoretically under fairly artificial assumptions.


You forgot attention mechanisms, that's a huge one


I disagree that his points are due to mere familiarity with "old-school methods". I view elegance as something that is simple in hindsight, but only with an appropriate background. These are simple movements that belie the expertise it took to get there.

Older methods were proof-heavy and required more mathematical sophistication, but once you understood them, you could easily implement the idea. A lot of deep learning is comparatively mathematically simpler, but there is often much incidental detail and folk knowledge that ends up being important yet goes unmentioned. It is in this sense that one can call DL inelegant. A lot of that is because so many are racing to publish results; there's not much time to contrast with prior art or to do much more than justify ideas with often very expensive experiments and hastily scrawled descriptions. Lots of ideas generated without much context means many of them are quickly dropped and forgotten, sometimes without sufficient justification.

> Examples of elegance in deep learning

GANs, VAEs, gating, convolutions and weight sharing are indeed great ideas. IIRC Hinton's early papers did inspire Friston's influential work in theoretical neuroscience. However, translation invariance is double counting the advantages of convolutions. Policy gradients, the log-derivative and reparametrization tricks are independent of deep learning. Variational inference is more due to how far-reaching that idea is if you want efficient generative models. Game theory... well, lots of ideas are connected to it, including evolution and the dead-simple weighted majority algorithm. Split-brain is really reaching, especially since it's now looking to be one of those ideas that will need a decent amount of revision.

That said, I agree that too much emphasis is placed on interpretability. If a function is too complex, then a human will simply be unable to fit it in their working memory. Nonetheless, the ability to introspect on some of a neural net's decisions is vital. As you say, visualizations and mappings which project to a simpler function space but preserve most of the detail are just as possible with neural nets as with any other method.

In this post I might come across as disparaging of deep learning, but this is certainly not my intention. There are lots of excellent papers which elegantly tackle difficult questions. They ask: what invariances and structures do we seek? How do we learn good representations we can sample from? How do we keep learning stable and achieve good gradient flow? How can we capture longer range correlations? These also offer insights that are not limited to deep learning. You just don't hear as much about them because they are not as shiny.

And if you're looking for mathematical elegance that cuts to the heart of the matter, especially with the rising importance of generative models, you could hardly do better than to start from anything written by Shun-Ichi Amari.

People always want to jump right to the shiny stuff and skip the basics; this is understandable but suboptimal in the long term, in any field. A good compromise on this matter is the excellent free deep learning book by Goodfellow, Bengio and Courville.


This is an interesting article. It made me think of the socio-political consequences of the move towards neural networks.

Many of us, children of the personal computer revolution, were attracted to computers for their empowering and democratizing effects. With just a computer you could build anything! You were as powerful as any of "them"! The proprietary software model was developed to monetize end consumers by enchaining them, but the free software model and the fertile communities of the early Internet showed that the model of personal computation could not simply be displaced by legalese.

But if we take the author's position on the impact of deep networks, the model of computation is changing again.

If teaching deep networks is the new way to write useful programs, our brains and personal computers are not enough... they are obsolete. We now need clusters and, most importantly, massive access to data! This gives extraordinary leverage to the hoarders of data and servers of the Internet, the googles and the facebooks.

I'd like to think that maybe we are just at the beginning, in the early "mainframe neural networks" era. That we just have to wait until there is enough technology, and enough newly discovered markets, to build the "personal neural network". That consensual, open and distributed ways of sharing data will emerge, and that new massively parallel computers will become affordable to the masses. That the models behind neural networks will become widely disseminated and simplified, and kids will be able to program them with their "Neural Basic"...

But at the moment the prospects don't look good. The Internet is becoming more and more centralized, personal computers are getting harder and harder to program, and there is a general "war on general computation" [1]. Even universities seem to be displaced by the Internet Lords in driving neural network development...

Maybe it is already a good time to start thinking about what "Libre Neural Networks" look like. And how can we get there.

[1] https://www.youtube.com/watch?v=HUEvRyemKSg


There seems to be a lot of angst in the responses about how deep nets et al "are too elegant; are too interpretable!".

It's important to remember that the intended product of science is not tools, it is understanding. Deep nets may produce very elegant and interpretable tools, from a very elegant and interpretable theory, but that is not what the scientists are looking for. They are generally creating and assessing theories in their subject domains, which are then evaluated by the tools they have built.

Deep learning does do a great job of providing a baseline ("your theory must be this accurate to matter"), but it seems to do a much less great job at extracting new understanding.


Machine learning departed from the venture of understanding-based science well before deep learning [1][2]. Thus, singling out deep learning reeks of motivated reasoning, or of a simple lack of awareness of the history of machine learning.

Having said that, there still exists the lunatic fringe (it's a compliment) inside deep learning who continue to work on the task of generative modeling in the hope of "understanding the world" rather than "solving a task". Yoshua Bengio and Yann LeCun don't miss an opportunity to impress on the world how important unsupervised learning is.

Physics too has abandoned understanding for predictive power. If you take philosophy of science seriously, you either have to pick "science is whatever the elite society believes in (Kuhn)" or "science gives us tools for prediction (Post-Kuhn)". Claiming that science helps in finding truth or understanding, is going to put you in a very indefensible slippery slope.

"Understanding" is a subjective human concept, and not worth pursuing. Humans evolved to run from tigers, and struggle with harder tasks like understanding quantum mechanics or understanding the brain. The directionality of scientific progress as evidenced by QM is not "understanding", but tooling and predictive power. I hope Deep Learning is going to follow the path of predictive power, rather than regress into the pseudoscientific mess of elegance and understanding.

[1]: http://www2.math.uu.se/~thulin/mm/breiman.pdf

[2]: http://norvig.com/chomsky.html


"Understanding" is a subjective human concept, and not worth pursuing."

Isn't human understanding of how our environment works precisely how you're able to sit in your chair and write comments on HN for others to read?


Sure, but I might ask: if I were to create a chair by generative methods, would I need to understand how it was created for it to be useful?


If you have ever crafted a chair, a table, or anything else that is built by a craftsperson, you will soon realise why such a profession exists: because it's very difficult.

You might slap a few pieces of wood together and, by brute force, create something like a chair, and that might do, but it's very unlikely to be a good, safe, attractive, comfortable, long-lasting chair. Building quality furniture is complex and difficult work which takes a lot of skill, experience and understanding.

I understand your point; I never said there is anything wrong with experimenting with NNs, I just don't agree with the OP's sentiment.


No. Tools enabled it, not understanding.


Give one specific example of a widespread modern invention that did not require any understanding to create. You personally can use a tool without understanding how it's made, but you seem to claim the tools are invented without anyone's understanding.


[flagged]


No one understands quantum mechanics, but you will tell this scientistic community what quantum mechanics is, because you know what it is even if no one understands it.

Sorry you're being downvoted but you're the one who's just throwing buzzwords around that you poorly understand. That's the definition of scientism, right there.


I don't understand QM, and no self-respecting physicist claims to understand it either. The consensus is pretty strong on this. QM is strange, the Standard Model is strange, but it works. I'm kinda surprised people like you are not aware of this simple fact. We don't live in the 1700s anymore, and we aren't in the world of high-school physics. Things are strange now, and data-backed equations have replaced understanding. I can understand that people want science to explain things, but that's simply not the reality any more.


OK, seriously. How are you in a position to know what the consensus is among quantum physicists, or anything much about quantum mechanics? Are you a quantum physicist? A software engineer at CERN?

And what exactly do you mean "people like me"? Please don't do that.


I listen to public talks by people like Max Tegmark, Lawrence Krauss, etc.


That's not how you figure out what the consensus is. At best, you may get an idea of what the person speaking believes (or wants you to believe) the consensus is, but researchers always have their own opinions and their own agenda so you can't really rely on talks and articles in the lay press.

To be able to say that you know what the consensus is in any given field, the gold standard must be what scientists in that field do: read about 250 papers each year for a few years, speak to as many other researchers as possible, and go to a few conferences and workshops, until you really understand where things are and how they are moving along.

I appreciate that's a high standard, but that's the general idea. If you can't do that, then at least be aware of the fact that the opinion you have formed is not terribly well informed and you're not any kind of authority on the matter.


>> Physics too has abandoned understanding for predictive power. If you take philosophy of science seriously, you either have to pick "science is whatever the elite society believes in (Kuhn)" or "science gives us tools for prediction (Post-Kuhn)". Claiming that science helps in finding truth or understanding, is going to put you in a very indefensible slippery slope.

This may be threadomancy, but I kept thinking about your comments and I realise now that there is a huge confusion between the different ways that the word "prediction" is used in the sciences.

What you mean when you say that you can use quantum theory to "make predictions" is that you can plug in some values to the formulae and come up with other values- the position or speed of particles and so on. This is a "prediction" in terms of stochastic determination of the state of the world within a probabilistic framework: you "predict" the values that some random variables will take.

On the other hand, what is more commonly meant by "prediction" in the sciences is the ability to anticipate novel observations, phenomena that have not been observed yet. For a famous example, physicists and mathematicians predicted the existence of black holes before those were observed by astronomers.

This ability to foresee as-yet unseen phenomena is the true power of scientific theories, the context in which "prediction" is most often used in the sciences, and something that cannot be achieved without a thorough understanding of said theories. You can't take a black-box model of pictures of dog breeds and make a guess about what other kinds of dog breeds might exist, or might be created, beyond the ones in the original training set. The model may even be capable of doing that - but you, the person training it, are none the wiser. You can't use the model to expand your knowledge about the world. It's a closed loop.

So it's not really "predicting" anything in the way you say it, the way Kuhn meant it or anyone else means it.


It is fascinating to see in real life how science-fiction-style AI is becoming a reality, where nobody understands how the machines produce their answers. Give it a few more decades and I'm sure the Asimov short story "The Feeling of Power" will become culturally relevant.


I echo the feelings of the author of this piece, and I'm very wary of a world where predictive power always trumps deeper understanding.

(Edit: I knew I'd expressed this feeling before: https://twitter.com/copingbear/status/825098385548009472)


I have no doubt that some day (soon?) neural networks will be able to explain themselves (using language). But the explanation they give will be less than satisfactory to our aesthetic criteria. Occam's razor is a great aesthetic rule, but it doesn't guarantee beauty. We will most likely find out that the computational vision of the past had managed to find some rules covering some cases, but it missed many more, most of which will be higher-order and not that elegant. And it may turn out denoising was a rather messy problem, as was probably speech recognition.

Drugs work, but we don't know how a surprisingly large number of them work. Yet chemists don't complain about them. Theoreticians should not complain either. Deep nets give them a huge subject matter that probably hides many interesting insights in it. I mean, don't the learned convolutional filters look like Gabor filters?
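
(For reference, a Gabor filter is just a sinusoid under a Gaussian envelope. A minimal numpy sketch, with made-up parameter values and a hypothetical helper name:)

    import numpy as np

    def gabor(size=31, wavelength=8.0, theta=0.0, sigma=4.0):
        # A 2-D Gabor filter: a sinusoidal grating windowed by a Gaussian,
        # qualitatively similar to many learned first-layer convolution filters.
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        x_t = x * np.cos(theta) + y * np.sin(theta)
        y_t = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-(x_t**2 + y_t**2) / (2 * sigma**2))
        return envelope * np.cos(2 * np.pi * x_t / wavelength)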

The question of what can be understood from them can go deep, to the limits of the ability of language/math to express ideas.


I don't see a problem in splitting all these domains into an "applied" school, which focuses research on evolving solutions that currently seem effective at approximating answers to certain engineering problems, and a more theoretical school that has the potential to carve new revolutionary paths while pursuing some (abstract?) ideal such as mathematical elegance. One shouldn't exclude the other, and apparently there are enough scientists around who are attracted to either of the two approaches.


Deep learning/neural networks -- like a number of other "machine learning" methods -- amounts to fitting a mathematical model with a huge number of tunable parameters and component functions to data. It has been known for a long time -- arguably back to the discovery of the Taylor series expansion in the early days of calculus -- that any sufficiently well-behaved function can be approximated arbitrarily well by composing an arbitrarily large number of simpler functions.

If the task is interpolation between the data points this can be highly accurate. If the task is extrapolation, such as prediction or designing a truly new machine or system in an engineering application, the approximation will often fail. The roughly one percent error rate in predictions of planetary motions from epicycles is one of the earliest cases of this problem.

The simple example is approximating data with a polynomial with an arbitrary number of terms such as in the Taylor series expansion. With enough terms a polynomial model can always approximate any data set arbitrarily well. Polynomial models can interpolate very well unless the data has some generally unusual behavior -- changing unpredictably at successively finer scales for example. However, finite polynomial models almost always extrapolate grossly incorrectly. As you move away from the data set in the space of independent variables such as X, the largest power N in X^N dominates and the polynomial approximation function blows up to either plus or minus infinity which is rarely physical.
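
To make the extrapolation failure concrete, here's a rough numpy sketch (made-up data; the exact numbers will vary, and polyfit may warn about conditioning at this degree):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 3.0, 15)
    y = np.sin(x) + 0.05 * rng.standard_normal(x.size)   # noisy samples of sin(x)

    coeffs = np.polyfit(x, y, deg=9)                      # high-degree polynomial fit

    # Interpolation inside [0, 3] is accurate...
    print(abs(np.polyval(coeffs, 1.5) - np.sin(1.5)))     # small error
    # ...but extrapolation blows up as the x^9 term takes over
    print(abs(np.polyval(coeffs, 6.0) - np.sin(6.0)))     # usually far off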

What we think of as "understanding" corresponds conceptually in part to the ability to make accurate predictions. "Understanding" or "explanation" corresponds mathematically not to some arbitrary super-complex function with large numbers of arbitrary parameters, but rather to a mathematical object such as a system of differential equations (e.g. Maxwell's Equations for electromagnetism or the General Theory of Relativity) that expresses interrelationships among the variables and data points.


Machine learning researchers are well aware of the problem of out-of-sample predictions; in fact most of the work goes into making systems generalize appropriately, by making architectures robust through dropout or regularization. The resulting systems might not correspond to our intuitive definition of "understanding", but they look more similar to what we know about the human brain than to an elegant function with a small number of interpretable parameters.
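
(The mechanics of inverted dropout, as commonly described, fit in a few lines; this is a sketch with a hypothetical helper, not any particular library's API:)

    import numpy as np

    def dropout(activations, p_drop=0.5, training=True):
        # Inverted dropout: randomly zero units during training and rescale the
        # survivors so the expected activation matches what is seen at test time.
        if not training or p_drop == 0.0:
            return activations
        keep = np.random.random(activations.shape) >= p_drop
        return activations * keep / (1.0 - p_drop)

    h = np.ones((4, 8))              # pretend hidden-layer activations
    print(dropout(h, 0.5).mean())    # roughly 1.0 on average, with about half the units zeroed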


This is the first comment I make on HN, but I feel compelled to say something. This post reeks of the author harping on about the "good old days" of mathematical elegance.

Firstly, mathematical elegance is not an objective metric by any means. Many mathematicians disagree on what results are most elegant, though there are many commonalities as well.

Secondly, all of machine learning is quite elegant. I'm by no means an expert since I only took two courses in my fourth year that were surveys of many methods (quite mathematically rigorous btw) but I've taken enough math to be able to judge what is elegant or not.

As the author himself says, he has slightly modified his research methods. These days you'd be a fool to ignore deep learning, whether you have a deep understanding of it or not.


I felt the article was quite fair in acknowledging the intriguing results of deep learning, and decidedly arguing against ignoring them.

"Elegance" can indeed not be defined, but I believe the misunderstanding here is more about the point of view: Deep neural networks are plenty elegant when looked at from an ML POV. But, if you're an expert in, for example, image processing, using a DNN as a tool, the solution it may give you won't be "elegant" by anybody's definition. It is, after all, a repetitive formula with X million arbitrary floats.

Previous ML methods usually resulted in models, or formulas, that were small enough to "grasp" intuitively. The "beauty", or "elegance" was that often, you could find connections from terms in your model to the real world.


> Previous ML methods usually resulted in models, or formulas, that were small enough to "grasp" intuitively.

Which is not true in practice. An SVM is really an instance-based method, with thousands to tens of thousands of high-dimensional support vectors as its 'parameters'. Random forests, famous for their ease of interpretability, are no easy meal either when you end up with hundreds of trees with sprawling branches. Not to mention that real-world models are often embarrassingly complex ensembles of smaller but still complex models.
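
Easy to check with scikit-learn; a quick sketch on synthetic data (the exact count depends on the data and kernel):

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
    model = SVC(kernel="rbf").fit(X, y)

    # The 'parameters' of the fitted model are these retained training points
    print(model.support_vectors_.shape)   # often a sizeable fraction of the 5000 samples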


Deep learning is a perfectly useful engineering solution.

As a tool for basic research, particularly in the biomedical realm, it's awful. You get a system that performs pretty well most of the time, but tells you nothing of interest.

That's fine, as far as it goes, but "performs pretty well" is so much more useful in industry than "we know how X biological system works" that entire fields interested in the mechanics of how perceptual biological systems work, and how they may be fixed, are getting eaten alive. To our collective detriment, I think.


The author mentions style transfer as a product of NN research.

Yet this paper [1] from 2014 works extremely well, no neural networks. Here's one from 2008 [2] that worked extremely well too, better than most early NN techniques.

Although, deep learning is far from the purely intuitive approach it was 10 years ago: it's clearer now how to reason about models, layers, activation functions, etc. And as for the lack of a mathematical foundation, I don't believe having one helps that much in the case of SVMs or anything else; there still has to be experimentation, approximation and proper testing.

[1]: https://dspace.mit.edu/handle/1721.1/100018 [2]: http://link.springer.com/chapter/10.1007%2F978-3-540-88690-7...


It's important to note that we know the limitations of SVMs because of the theory. That's usually what a theory provides.

There are also a few methods, like boosting, that are purely the result of theory.


To claim that those techniques are even in the same ballpark as deep learning based style transfer is just silly. They are extremely limited in their application, as you can clearly see by looking at the examples. And the approaches themselves are overfit to the particular applications that they show, so they can't be readily extended.

These are actually excellent examples of how deep learning has totally changed our expectations of what can be done in computer vision.


The deep learning based style transfer that went viral a year or two ago was worse than this 2008 result, especially for the application to portraits.

Yet the hype was nowhere near the same, simply because there was no deepness involved.

I admit that this ( https://arxiv.org/abs/1705.01088 ) beats everything and is extremely powerful and simple, but the hype for deep things seems a bit too strong.


> To put it bluntly, your grandchild is likely to have a robot spouse.

I'll take the other side of that bet.


I was already on the fence about the article being standard deep learning click bait when I saw that :)


That your grandchild is likely to be a robot spouse? ;)


You don't think that follows easily, then?


In my opinion, this push towards "mathematical elegance" and "interpretability" is hurting us and is unscientific. On interpretability, I'll paraphrase Yann LeCun here: you don't really care too much about whether or not your taxi driver is interpretable; he's a black box. Not only that, but there are a variety of newer methods, such as attention mechanisms, gradient masks and auxiliary explainer networks, that give really good interpretability. And it's funny how this criticism never comes up for SVMs...
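
A gradient mask (a vanilla saliency map) is only a few lines, for example with PyTorch; a sketch where `model`, `image` and `target_class` are placeholders:

    import torch

    def saliency(model, image, target_class):
        # Gradient of the target logit w.r.t. the input pixels: large magnitudes
        # mark the pixels the prediction is most sensitive to.
        image = image.clone().requires_grad_(True)
        logits = model(image.unsqueeze(0))         # add a batch dimension
        logits[0, target_class].backward()
        return image.grad.abs().max(dim=0).values  # collapse colour channels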

I also think it's unscientific to expect mathematical elegance from everything. Math is ultimately a logical descriptor tool for AI, a means, not an end in itself. Besides, there's actually a host of involved mathematics underlying the seemingly simple SGD of deep nets. For example, tropical geometry has been used to analyze the loss surfaces of ReLU networks and random matrix theory has been employed to analyze the loss surfaces with respect to the quality of local minima.

Tl;dr: deep nets are far more interpretable than they've been given credit for, and they're also more mathematical than some (such as this author) would have you believe.


Good Computerphile episode on understanding what is going on inside a neural net: https://www.youtube.com/watch?v=BFdMrDOx_CM . Looking at high-information neurons to see what they pick up can yield a lot of insight into the problem.


But this is true of neural nets as well, so I'm not sure why the author was trying to contrast them. Essentially it is a question of: do you want a system that works well but that no one understands, or one that works less well but is understandable? But if you meta it, it's the same.


I don't think even this dichotomy is real. Deep learning methods are both performant and elegant.


They're elegant only if the sentence 'consider a model with a billion features' doesn't make you feel disgusted, I'm afraid.


elegant, adjective: (of a scientific theory or solution to a problem) pleasingly ingenious and simple.

One easily-described approach that is state-of-the-art in a massive number of domains? That's pretty f'ing elegant.


I sure hope you're not similarly disgusted by "consider a computer with a billion transistors".

Computer Science is full of abstraction and complexity. Denying it to machine learning seems conservative and naive at best.


Many methods in machine learning and statistics benefit from two types of theory:

1) Optimization theory - which says, if I repeat this iterative method N times I will find a (nearly) globally optimal solution

2) Statistical theory - which says, if I observe this process N times I can accurately estimate a population quantity with high probability

Deep learning does not benefit from the same theoretical guarantees.

For the most part the response from the community is "but it works really well!" which is a fair and valid response especially since what most practitioners care about is predictive accuracy.

Personally, I find applying neural networks extremely annoying at times due to the amount of twiddling of hyperparameters, slow convergence, etc.


" The facts speak loudly for themselves; in most cases, deep learning-based solutions lack mathematical elegance and offer very little interpretability of the found solution or understanding of the underlying phenomena."

I don't agree with this statement; a quick look at the cs231n lecture series will show you how much math is involved. A lot of articles/people claim it's a black box, but once you write a small network architecture yourself you realise it's not. Concepts like stride, padding, activations, learning rate, optimisation and dropout each give you an "aha" moment that is followed by a mathematical explanation. One should study the topic thoroughly before criticising it.
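
For example, the kind of "aha" you get from working out how stride and padding determine the output size of a convolution (a sketch; the helper is hypothetical):

    def conv_output_size(n, kernel, stride=1, padding=0):
        # Standard formula for one spatial dimension of a convolution
        return (n + 2 * padding - kernel) // stride + 1

    print(conv_output_size(224, kernel=7, stride=2, padding=3))  # 112: a 7x7 stride-2 conv halves a 224-pixel input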


Having a lot of math involved does not mean that it is mathematically elegant. As a mathematician, I ask myself several questions. First of all, what really is a neural network? Is it an approximating function? Is it a geometric separation of a space, as with SVMs? Is it a manifold classifier? (see http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/, which is very interesting)

Also, what are we approximating? Continuous functions? Non-continuous functions? Are they even functions and not probability measures? Are those functions arbitrary or do they represent something like a manifold?

And most important: how well are we approximating whatever it is we want to approximate? The universal approximation theorem gives uniform approximation of continuous functions on compact sets (and approximation in measure for merely measurable ones), but it does not specify at what rate, or how the error depends on the parameters. It is a strong theorem but not that surprising from the mathematical standpoint, where you already know that you can approximate such functions by continuous, compactly supported functions.
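
For reference, the one-hidden-layer version says roughly: for any continuous f on a compact set K and any eps > 0 there exist N, v_i, w_i, b_i such that

    sup_{x in K} | f(x) - sum_{i=1}^{N} v_i * sigma(w_i . x + b_i) | < eps

and it says nothing about how large N has to be.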

Finally, how do you mathematically define the problems that arise in neural networks? What is overfitting? How does the learning algorithm affect the results?

The fact that some techniques are justified by mathematical explanations does not mean that it is mathematically elegant. For it to be mathematically elegant you should have at least clear definitions of the objects of study and the problems you want to solve. I don't think this is the case in neural networks.


"Having a lot of math involved does not mean that it is mathematically elegant" I agree to your point, but at the same time how much effort is being made to understand the intricacies of deep learning is my question. In my opinion, those who dont understand these techniques bluntly say its a black box but its not entirely a black box. I am confident if more reearchers start to peel it off layer by layer , a lot more insights will be generated given the field is relatively new .


"Interpretability", in this case, is concerned with the problem domain, not ML itself.

A hypothetical example: A trained decision tree algorithm for a medical decision may place the gender, or the age, of the patient at the root. That lends itself to a quick interpretation as to the relevance of factors for treatment, whereas with a Neural Net, you'll get millions of arbitrary floats that do not impart any meaning just by looking at them.
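
For instance, with scikit-learn you can read the root split straight off a fitted tree; a sketch using a stand-in dataset rather than the hypothetical medical one:

    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier

    data = load_breast_cancer()
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

    # The feature chosen at the root node is, in this sense, the most relevant one
    root = tree.tree_.feature[0]
    print(data.feature_names[root], "split at", tree.tree_.threshold[0])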

That's not to say NNs don't sometimes give tantalising insights, as the article points out. I've seen a few visualisations of generative character models that were fascinating, such as finding individual neurons tracking sentence length or nesting depth. Same for some interesting patterns emerging in the intermediate layers of object recognition networks: oh, I never knew ears were so important for face recognition.


There is no understanding of why NN training converges in reasonable time and does not over- or under-fit (unlike for, say, convex SVMs).


Not sure why I am downvoted for putting my view upfront.


> deep learning-based solutions lack mathematical elegance and offer very little interpretability of the found solution or understanding of the underlying phenomena

I disagree. Elegance and interpretability come with one big flaw: assumptions, lots of them, while deep learning based methods assume few or none. If the elegant methods are so elegant, why do they underperform? Probably because the problem we are trying to solve doesn't satisfy the assumptions, like convexity, etc.

In that sense, one can say deep learning methods are even more elegant, because they can work end-2-end. An NN can be a blessing: now that we have the answer, we don't need to assume anything, we just need to DECODE it.


There is a no free lunch theorem in learning, which says that without limiting your hypothesis space, you cannot learn. That means, you need to have some assumptions.

No doubt DNNs are subject to the same theorem; the only question is, what does their hypothesis space look like? Does anyone have an idea about this? I suspect we don't really know what the DNNs' assumptions are.


Agree with you. What I should make clearer is that a DNN puts few assumptions on the data; it usually consumes it as raw pixels, in the case of images.

The architecture of the NN itself is the biggest assumption here, as it determines how the data will be processed. But those operators are usually generic; a lot of them are just umbrella operators for matrix multiplications with non-linearities in between.


I wrote a bit about the theoretical reasons DNNs work here: http://lesswrong.com/lw/m9p/approximating_solomonoff_inducti... There's some more interesting discussion in the comments there.


The DNN's assumptions are "The true model can be fit by this network architecture". If you use convolutions, then an additional assumption is translation invariance.


> The DNN's assumptions are "The true model can be fit by this network architecture"

Yeah, but that's tautological. What I mean is: can somebody sit down and, for a given DNN architecture, write down (at least approximately) the set of functions that it can learn? Or more importantly, what functions it cannot learn? Or at least, how many bits are assumed and how many bits have to be learned?

I think that is what bothers people about DNNs. I personally think they are sometimes even inefficient: we are giving them many more parameters (bits) to learn than the actual hypothesis space requires.


> What I mean is: can somebody sit down and, for a given DNN architecture, write down (at least approximately) the set of functions that it can learn?

For a two-layer architecture with ReLU activations and n units in the hidden layer, this is (for a one-dimensional input) roughly the set of continuous piecewise-linear functions with at most n kinks.
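
Easy to see in one dimension; a small numpy sketch with arbitrary weights:

    import numpy as np

    # f(x) = sum_i v_i * max(0, w_i * x + b_i): each hidden unit contributes one
    # potential kink at x = -b_i / w_i, so f is piecewise linear with at most n kinks.
    w = np.array([1.0, -2.0, 0.5])
    b = np.array([0.0, 1.0, -1.5])
    v = np.array([1.0, 0.3, -2.0])

    def f(x):
        return np.maximum(0.0, np.outer(x, w) + b) @ v

    print(f(np.linspace(-4, 4, 9)))   # linear between the kinks at x = 0, 0.5 and 3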


How many of those do you need to tell the difference between cats and dogs in random internet pictures?


Well here's an argument for interpretability: how do you know when your model will fail?

One tradeoff might be that with deep learning the model is unlikely to fail, but you won't know when it will. With conventional methods you may not get performance that is quite as good, but you have a better sense of the domain of applicability, because you have a better understanding of the feature space (for instance, when new data arrives you can measure how similar it is, in that feature space, to the data your model was trained on).


You can do the same with deep learning: your model will fail when the training data and test data are systematically different. This isn't about understanding the model; it is about understanding the data, which is much more important than interpreting the model. Systematic uncertainty is a problem for all models.


Yes, but how do you know when your test data and training data differ in a way the model cares about? With conventional methods, you can assess similarity by distances in the feature space of the model, or you have a physical understanding of why the model works, so you have better intuition about which differences in the data matter.
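
One crude way to operationalise that, for any fixed feature representation (a sketch; the helper name is made up):

    import numpy as np

    def novelty_score(train_features, new_feature, k=5):
        # Mean distance to the k nearest training points in feature space:
        # a large value suggests the new input is unlike anything seen in training.
        dists = np.linalg.norm(train_features - new_feature, axis=1)
        return np.sort(dists)[:k].mean()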


> because they can work end-2-end

Agree. Not just end-2-end, but the same model can be applied to many different tasks. DeepMind's recent papers make progress in image generation, audio generation, language modeling and machine translation using the same basic construct of masked, dilated, shifted convolutions.
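
Roughly, the shared construct is a convolution that is shifted/masked so each output only sees past inputs, with dilation to widen the receptive field. A sketch of the 1-D causal version, assuming PyTorch (not any specific paper's code):

    import torch.nn as nn

    class CausalDilatedConv1d(nn.Module):
        # Pad on the left only, so output t depends on inputs <= t (the shift/mask);
        # stacking layers with growing dilation widens the receptive field exponentially.
        def __init__(self, channels, kernel_size=2, dilation=1):
            super().__init__()
            self.pad = (kernel_size - 1) * dilation
            self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

        def forward(self, x):  # x: (batch, channels, time)
            return self.conv(nn.functional.pad(x, (self.pad, 0)))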


> In its feed-forward architecture, layers of perceptrons—also referred to as neurons—first perform weighted averaging of their inputs, followed by nonlinearities such as a sigmoid or rectified-linear curves.

I have a really hard time continuing to read after this passage.
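
Presumably the objection is that a layer computes weighted sums plus a bias, not an average; something like this sketch (hypothetical helper, numpy):

    import numpy as np

    def layer(x, W, b):
        # One feed-forward layer: weighted sum plus bias, then an elementwise
        # nonlinearity (here ReLU; a sigmoid would be 1 / (1 + np.exp(-z))).
        z = W @ x + b
        return np.maximum(0.0, z)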


You can do a regression:

Exam score = cat_a + cont_b + err

but how do you get a neural net to say something like "comparing male and female, there is a 10-point score advantage, and the impact of family income is ..."?

Can Galileo invent physics by doing an NN?
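
With ordinary least squares, that kind of statement falls straight out of the fitted coefficients. A sketch with made-up data (assuming statsmodels is available; all names are invented):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500
    male = rng.integers(0, 2, n)                   # categorical predictor
    income = rng.normal(50, 15, n)                 # continuous predictor
    score = 60 + 10 * male + 0.2 * income + rng.normal(0, 5, n)

    X = sm.add_constant(np.column_stack([male, income]))
    fit = sm.OLS(score, X).fit()
    print(fit.params)   # roughly [60, 10, 0.2]: a 10-point gap and 0.2 points per unit of income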


> "But what about us scientists? What is the true objective behind the vast effort that we invested in the image denoising problem?"

This is the telegraph operator lamenting all the time spent learning Morse Code when the telephone was invented.

I think scientists can be motivated by looking for new ways to solve real problems. Don't weep over the time spent understanding the classic models. Rejoice that we have a much better tool to further mankind.



