Neural network models have long since passed, by many orders of magnitude, the point at which Markov chains made any sense.
Markov models fail because they are too opinionated about the style of compute.
In contrast, a linear map (a weight tensor) followed by a non-linear function has incredible flexibility to transform the topology of information. Given large enough tensors, two such layers, combined with recurrence, can learn any mapping, static or dynamical. No priors are needed, other than massive compute.
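To make that concrete, here is a minimal NumPy sketch of the structure being described: a recurrent layer (linear map plus a tanh nonlinearity, unrolled over a sequence) with a linear readout. The function names, sizes, random initialization, and the choice of tanh are my own illustrative assumptions, not anything prescribed above.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # One recurrent step: a linear map of the input and the previous
    # hidden state, pushed through a nonlinearity (tanh here).
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

def run_rnn(xs, W_x, W_h, b, W_out, b_out):
    # Unroll the recurrence over the whole sequence, then apply a
    # linear readout to the final hidden state.
    h = np.zeros(W_h.shape[0])
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h @ W_out + b_out

# Illustrative sizes only: the claim is that, with enough hidden units,
# this shape of compute can in principle fit arbitrary static or
# dynamical mappings.
rng = np.random.default_rng(0)
in_dim, hidden, out_dim, T = 3, 64, 2, 10
W_x = rng.normal(scale=0.1, size=(in_dim, hidden))
W_h = rng.normal(scale=0.1, size=(hidden, hidden))
W_out = rng.normal(scale=0.1, size=(hidden, out_dim))
b, b_out = np.zeros(hidden), np.zeros(out_dim)

xs = rng.normal(size=(T, in_dim))
print(run_rnn(xs, W_x, W_h, b, W_out, b_out))
```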
All other neural architectures, then, are simply sparser arrangements that bring compute demands down, with the sparseness fit to the type of problem.
Sparseness can take the form of deeper but narrower information flows (hence “deep” learning), or of fewer weights relative to weight applications (i.e. shared weights, as in convolutions).
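To put a rough number on the weight-sharing form of sparseness, here is a back-of-the-envelope comparison with made-up but typical sizes: a fully connected layer mapping a 32x32 input to a 32x32 output versus a single 3x3 convolution kernel reused at every position.

```python
# Illustrative sizes: a 32x32 single-channel input mapped to a 32x32 output.
H, W = 32, 32

# Fully connected: every output position gets its own weights over all inputs.
dense_params = (H * W) * (H * W)   # 1,048,576 weights

# Convolution: one shared 3x3 kernel applied at every position.
conv_params = 3 * 3                # 9 weights

print(dense_params, conv_params)   # 1048576 9
```

The savings come from baking in the assumption that the same local pattern matters at every position of the input, which is exactly the sense in which the sparseness is fit to the type of problem.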