Note: in the time it took me to write this ramble (and to do the code review that came up between writing and posting), others like catnaroek have addressed similar themes.
Speculating wildly here, but part of why I think there is continued (albeit unconscious) resistance to other paradigms is that the procedural/imperative paradigm seems to be the most `hackable' of the bunch.
I'm not using `hackable' in a positive sense here. Procedural code is bare-level, almost the machine code of programming models. "First do this thing, then do this thing, then do this thing", and so on. There's a relatively small implicit underlying `physics'[1] in a procedural system. In some sense, every procedural program involves the creation of some higher-order logic (a model) as part of its construction, in order both to define the state machine and to dictate the manner in which state flows through it.
The trick, of course, is that every model has a bias; it encodes some manner of thinking. One aspect of that is that some things become easier to represent and others harder[2]. In a procedural program, when the model you've (consciously or not) developed fails, it's trivial to 'fall out' of the model and revert to writing base-level imperative code, hacking your way to a solution.
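To make that concrete, here's a toy sketch (in Python, with invented names; the 'legacy' special case is purely illustrative). The doubling loop is the 'model'; the inline conversion is the hack you drop into when an input doesn't fit it:

    # Hypothetical example: the "model" is just "double every value";
    # the special case is handled by hacking around it mid-loop.
    def process(records):
        results = []
        for r in records:
            if r.get("legacy_format"):                   # the model didn't anticipate this...
                r = dict(r, value=int(r["value"], 16))   # ...so patch it in place and move on
            results.append(r["value"] * 2)
        return results

    print(process([{"value": 3}, {"value": "ff", "legacy_format": True}]))  # [6, 510]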
Functional and other declarative paradigms, on the other hand, have a stronger, more complex `physics' to them. In the declarative case, for example, a user is asked only to declare the state machine; its execution is left to the `physics' of the system. This can mean that a well-written program in a functional or declarative language appears simpler, more elegant. In reality, this is largely because a large set of assumptions that would need to be stated explicitly in a procedural language have been encoded directly into the language itself in the functional/declarative case[3].
This means that when you're operating within the paradigm that the functional/declarative language affords, everything is smooth, beautiful, elegant, and verifiable according to the physics of the system. However, it's much harder to 'fall out' of the assumed model, because the granularity required to do so isn't at the native resolution of the language.
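For contrast, the same toy written in a more declarative style (still Python, still invented names): the iteration is handed to the comprehension machinery, and the special case now has to be absorbed into the declared mapping itself; there's no convenient spot mid-loop to hack around it.

    # The same toy, declaratively: we only declare the per-record mapping
    # and let the comprehension machinery handle the execution.
    def normalize(r):
        # The escape hatch has to live inside the declaration itself.
        return int(r["value"], 16) if r.get("legacy_format") else r["value"]

    def process(records):
        return [normalize(r) * 2 for r in records]

    print(process([{"value": 3}, {"value": "ff", "legacy_format": True}]))  # [6, 510]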
---
[1] By physics, I mean a set of assumptions one needs to make about the behavior of the system that are not directly encoded by the user's input
[2] Think of how easy it is to, say, pick up a piece of steak with a fork, and how hard it is to scoop soup with the same implement. Different models have different affordances; they lend themselves to different problems, or to different solutions to the same problem.
[3] As the Church-Turing thesis shows us, these systems -are- equivalent in power, so those semantics still have to live somewhere. To paraphrase Hofstadter, there are two kinds of music-playing systems (with a continuum in between). At one end, a system with a different record for each song and a single record player capable of playing all records. At the other, a system with a single record and a different record player for each song. The difference is how much semantic weight you put on the encoding and how much you put on the decoding (but they still need to sum to 1).