You may "disagree" with this definition, but it's one that is commonly used by computer scientists. So it's a definition that is useful to know if you plan on doing or studying any research in this field. For example, see section 4.3 of this paper by Peter van Roy (author of the well-known Concepts, Techniques, and Models of Computer Programming):
"Concurrency should not be confused with parallelism. Concurrency is a language concept and parallelism is a hardware concept. Two parts are parallel if they execute simultaneously on multiple processors. Concurrency and parallelism are orthogonal: it is possible to run concurrent programs on a single processor (using preemptive scheduling and time slices) and to run sequential programs on multiple processors (by parallelizing the calculations)."
And no, languages with side-effects cannot be automatically parallelized without changing the semantics and introducing observable nondeterminism - not nearly to the same extent as pure languages. The examples I've seen essentially work for side-effect-free subsets, which kind of misses the point. Your C compiler may be able to vectorize a loop of arithmetic operations, but as soon as your code includes an external function call, the compiler has no way of knowing whether that function performs I/O or modifies global variables, so it cannot safely make parallel calls to that function.
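To make that concrete, here is a minimal C sketch (an illustration, not code from the post; log_sample is a hypothetical external function). The first loop is pure arithmetic over non-aliasing arrays, so a compiler like gcc or clang will typically vectorize it when optimizing; the second contains an opaque call, so the compiler has to keep the iterations in program order:

    /* compile with: cc -O3 -c sketch.c */
    void scale(float *restrict dst, const float *restrict src, float k, int n)
    {
        /* Independent iterations over non-aliasing arrays: safe to
           vectorize or otherwise parallelize. */
        for (int i = 0; i < n; i++)
            dst[i] = k * src[i];
    }

    /* Defined in some other translation unit; for all the compiler knows
       it writes a log file or updates a global counter. */
    void log_sample(float x);

    void scale_and_log(float *restrict dst, const float *restrict src, float k, int n)
    {
        for (int i = 0; i < n; i++) {
            dst[i] = k * src[i];
            log_sample(dst[i]);   /* opaque call: must stay in order */
        }
    }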
"languages with side-effects cannot be automatically parallelized without changing the semantics and introducing observable nondeterminism"
Considering I spent years working on automatically parallelizing compilers for Fortran and C++, I'd say it is possible. I don't know why you consider the heavily computational kernels of C programs less interesting just because they don't have side effects or make I/O calls.
For instance, I worked on algorithms to parallelize linked-list traversals, something you would normally think is serial by definition.
I didn't say that side-effect-free program kernels are "less interesting" - I said they were beside the point of this post. (But I do think they're very cool and important in their own right!)
The GHC developers' point was that restricted side effects are necessary for automatic parallelization. Your answer seems to be, "I can parallelize languages with unrestricted side-effects... in sections where you restrict yourself to a side-effect-free subset."
Yes, I'm happy when a C++ compiler automatically optimizes my inner loop. But I'd rather be able to use "par" at the highest level of my program, and parallelize anything from XML parsing to physics simulation, without worrying that some library call ten functions down the stack will turn out to be non-reentrant.
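To make that worry concrete, here is a small hypothetical C sketch (not from the post): count_words looks like a harmless helper, but it calls strtok, which keeps its cursor in hidden static storage, so two threads calling it at the same time trample each other's state. Nothing in the function's signature warns you; the reentrant strtok_r is the fix, but only if you know to look for it:

    /* compile with: cc -pthread words.c */
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    /* Looks pure from the outside, but strtok() stores its position in a
       static buffer shared by every caller in the process. */
    static int count_words(char *s)
    {
        int n = 0;
        for (char *tok = strtok(s, " "); tok; tok = strtok(NULL, " "))
            n++;
        return n;
    }

    static void *worker(void *unused)
    {
        (void)unused;
        char buf[] = "one two three four five six seven eight";
        printf("counted %d words\n", count_words(buf));  /* may not print 8 */
        return NULL;
    }

    int main(void)
    {
        /* Two parallel callers: the hidden shared state makes the result
           unpredictable.  Switching to strtok_r() restores safety. */
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }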
"Traversing a linked list is inherently serial. [Theorists will point out that traversal can be done in parallel if you have a processor per node in the list. Feel free to buy that many Intel processors -- I own Intel stock.]"
One possible implementation would be to use a skip list where you have each core search each level.
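A rough sketch of that idea in C with pthreads (hypothetical code, not from the comment; node heights are fixed rather than randomized to keep it short): every level of a skip list is itself a sorted linked list, so one thread scans each level, and a tall key is found quickly by the thread on a sparse upper level. Level 0 still holds every key, though, so the worst case remains a full serial scan of the bottom list.

    /* compile with: cc -pthread skiplist.c */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_LEVEL 4

    /* Every level l is a sorted singly linked list threaded through next[l];
       higher levels skip more nodes. */
    struct node {
        int key;
        struct node *next[MAX_LEVEL];
    };

    struct skiplist {
        struct node head;               /* sentinel: head.next[l] starts level l */
    };

    /* Insert key so that it appears on levels 0..height-1.  A real skip list
       would pick the height at random; here it is supplied for determinism. */
    static void insert(struct skiplist *sl, int key, int height)
    {
        struct node *n = calloc(1, sizeof *n);
        n->key = key;
        for (int l = 0; l < height; l++) {
            struct node *p = &sl->head;
            while (p->next[l] && p->next[l]->key < key)
                p = p->next[l];
            n->next[l] = p->next[l];
            p->next[l] = n;
        }
    }

    struct search_arg {
        const struct skiplist *sl;
        int level;                      /* the level this thread scans */
        int target;
        int found;                      /* result, private to this thread */
    };

    /* One thread per level: walk that level's sorted list and stop as soon as
       the keys pass the target. */
    static void *search_level(void *vp)
    {
        struct search_arg *a = vp;
        for (const struct node *p = a->sl->head.next[a->level];
             p && p->key <= a->target;
             p = p->next[a->level])
            if (p->key == a->target) {
                a->found = 1;
                break;
            }
        return NULL;
    }

    int main(void)
    {
        struct skiplist sl = { 0 };
        int keys[]    = { 3, 7, 12, 19, 25, 31, 44 };
        int heights[] = { 1, 2, 1,  4,  1,  3,  2 };    /* fixed for the sketch */
        for (int i = 0; i < 7; i++)
            insert(&sl, keys[i], heights[i]);

        pthread_t tid[MAX_LEVEL];
        struct search_arg args[MAX_LEVEL];
        for (int l = 0; l < MAX_LEVEL; l++) {           /* one core per level */
            args[l] = (struct search_arg){ &sl, l, 19, 0 };
            pthread_create(&tid[l], NULL, search_level, &args[l]);
        }

        int found = 0;
        for (int l = 0; l < MAX_LEVEL; l++) {
            pthread_join(tid[l], NULL);
            found |= args[l].found;
        }
        printf("19 %s\n", found ? "found" : "not found");
        return 0;
    }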
The hardware itself automatically parallelizes execution of your program to some degree. When you say something like:
int i = 0, j = 1, k = 2;
The processor itself can execute these in parallel if there are available resources, because writes of literals to registers don't affect each other.
So even if parallelizing compilers only work for the side-effect-free portions of a language, that is still a decent portion of most code. If it weren't, out-of-order execution wouldn't give much benefit, and CPUs would have only one execution unit instead of 3-4 - after all, it only works on side-effect-free portions too (more or less).
This is on a different level (fine-grained instruction level vs. coarse-grained module/function level), but I'm guessing the same arguments still apply.