> If you're new to learning about regression and want to use it, I would prioritize the statistical issues.
There is absolutely no need to take the statistical route in least squares regression. In fact, it needlessly complicates things. Interpreting least squares regression as minimizing the sum of squared residuals is a straightforward way to understand the technique.
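To make that concrete, here's a minimal sketch of that view (numpy, with made-up numbers): pick the line whose parameters minimize the sum of squared residuals, nothing more.

```python
import numpy as np

# Sample points we want to fit a line to (invented data).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Least squares: the parameters that minimize ||y - X b||^2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print("intercept, slope:", b)
print("sum of squared residuals:", np.sum((y - X @ b) ** 2))
```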
I'm not sure what you mean, but I think we're talking past each other.
Regression is a statistical term. If it's going to be used for an application involving data and real decisions, not understanding the statistical issues is irresponsible. That's not at odds with viewing it as a procedure to minimize square errors.
To clarify: There are probabilistic models (involving Gaussians and stuff) where some interpretations suggest least-squares estimation. That wasn't the distinction I was trying to make. Statisticians who hate probabilistic models will still say that it's important to think about what you put in X and y before you do X\y, and to be careful about what you can conclude from the result.
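For what it's worth, a small sketch of that point (numpy's lstsq standing in for MATLAB's backslash; the data is invented): the same y with two different choices of X gives two quite different "optimal" answers, which is exactly the think-before-you-solve issue.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 4.1, 4.9, 6.2, 7.0])

# "X \ y" with no intercept column: forces the line through the origin.
X_no_int = x[:, None]
b1, *_ = np.linalg.lstsq(X_no_int, y, rcond=None)

# Same data, same solver, but with an intercept column in X.
X_int = np.column_stack([np.ones_like(x), x])
b2, *_ = np.linalg.lstsq(X_int, y, rcond=None)

print("slope through origin:", b1)  # roughly 2: badly steepened
print("intercept and slope:", b2)   # roughly [3, 1]: fits the offset
```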
Statisticians who hate probabilistic models might want to find another line of work. Like, say, philosophy.
You use the phrase "involving Gaussians and stuff" to "clarify," and you ask others for rigor? Physician, heal thyself! And please don't speak for statisticians, they are a peculiar bunch. Ask the same question of 2 statisticians and you may well end up with 3 answers.
> Regression is a statistical term. If it's going to be used for an application involving data and real decisions, not understanding the statistical issues is irresponsible.
No, it isn't.
It's curve fitting. The goal is to fit a curve to the data.
In any engineering application, no one cares what the statistical interpretation is. They just want to fit a curve to the data, and to do it in an optimal way.
Hence, you take the sum of squares of the differences between the curve and the sample points (i.e., the residuals) and determine the parameters that minimize that sum.
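Spelled out for the linear case (standard linear algebra, nothing specific to this thread), that minimization has a closed form:

```latex
\min_{\beta}\ \lVert y - X\beta \rVert_2^2
\quad\Longrightarrow\quad
X^\top X\,\beta = X^\top y
\quad\Longrightarrow\quad
\hat{\beta} = (X^\top X)^{-1} X^\top y
\ \ \text{(when $X^\top X$ is invertible)}.
```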
That's a particularly incompetent and irresponsible view of engineering.
The key to modeling anything correctly is to be aware of exactly what assumptions your choice of model corresponds to and how well they match the real-world processes underlying your data.
So you've fitted some curve, but why that curve? What are the implications of assuming linearity between your input and output domains? What have you assumed about the distribution of noise over your predictions? How about over your input features? How would you expect the bias and variance of your predictions to degrade if any of these assumptions no longer held?
As an engineer, if you don't understand the statistics behind your models well enough to answer such questions, their real-world applications aren't going to go far beyond wishful garbage-in-garbage-out number crunching.
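To illustrate one of those failure modes (a toy sketch with invented numbers): squared loss implicitly treats large errors as nearly impossible under roughly Gaussian noise, so a single heavy-tailed error drags the whole fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

X = np.column_stack([np.ones_like(x), x])
clean_fit, *_ = np.linalg.lstsq(X, y, rcond=None)

# One gross outlier -- the kind a Gaussian noise model says is near-impossible.
y_out = y.copy()
y_out[25] += 50.0
outlier_fit, *_ = np.linalg.lstsq(X, y_out, rcond=None)

print("fit on clean data:   ", clean_fit)    # close to [1, 2]
print("fit with one outlier:", outlier_fit)  # visibly dragged
```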
> So you've fitted some curve, but why that curve?
Because it's the curve obtained from that particular minimization criterion: minimizing the L² norm of the residual.
If some other criterion were used, or some other approximation function, the result would be just as valid.
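For example (a sketch assuming numpy and scipy are available): swapping the L² criterion for L¹ is perfectly well-defined and gives a different, equally "valid" curve.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 2.9, 4.2, 20.0])  # last point is an outlier
X = np.column_stack([np.ones_like(x), x])

# L2 criterion: closed form via least squares.
b_l2, *_ = np.linalg.lstsq(X, y, rcond=None)

# L1 criterion: no closed form, so minimize the loss numerically.
l1_loss = lambda b: np.sum(np.abs(y - X @ b))
b_l1 = minimize(l1_loss, x0=b_l2, method="Nelder-Mead").x

print("L2 fit:", b_l2)  # pulled strongly toward the outlier
print("L1 fit:", b_l1)  # stays near the bulk of the points
```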
It appears that you are unaware that essentially all engineering in general, and the whole field of computational mechanics in particular, is founded on what can be described as curve fitting. Whether it's plain old least squares approximation (in particular, moving least squares) or techniques focused on minimizing some other norm (Galerkin-type methods, for instance), the basis is the same.
> What are the implications of assuming linearity between your input and output domains?
In short: because analytic functions and Taylor's theorem exist. Any smooth function is locally well approximated by a linear one.
Is it that hard?
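Spelled out (standard calculus, nothing thread-specific):

```latex
f(x) = f(a) + f'(a)\,(x - a) + R_1(x),
\qquad
R_1(x) = \tfrac{1}{2} f''(\xi)\,(x - a)^2
\ \ \text{for some $\xi$ between $a$ and $x$},
```

so near a the linear term dominates, and the error of a linear model shrinks quadratically with the size of the window.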
> As an engineer, if you don't understand the statistics behind your models enough to answer to such questions, their real-world applications aren't going to go far beyond wishful garbage-in-garbage-out number crunching.
You know nothing about engineering, and somehow you're assuming that everything can only be valid if it's interpreted as a statistical problem.
This isn't true, and it ignores entire fields of knowledge in physics and mathematics.
I agree: if you have data X and outputs y, and you must find a linear function of X that minimizes square error on y and only y, then yes, we have defined our task.
I think it's a lucky engineer whose end-goal is defined as such a crisp mathematical task. Defining what counts as an optimal fit for the surrounding application usually requires thought. Often the reasonableness of the curve does matter, for example if it will be evaluated at locations other than the training data. There may also be some choice in how to set up the problem: maybe we're allowed to transform the inputs with some fixed non-linear functions to create extra features. Adding these extra features to the design matrix will improve the fit, so an engineer might consider it, but they'll need to be careful about over-fitting if they care about the surrounding application, as the sketch below shows.
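A toy sketch of that trade-off (numpy only, invented data): higher-degree polynomial features always reduce training error, but error away from the training points can blow up.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 9):
    # Fixed non-linear transforms of the input: powers of x as extra features.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")

# Degree 9 with 10 points interpolates the training data exactly
# (train MSE ~ 0), yet its test error is far worse than degree 3's.
```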