Agreed. Optimizing the code without first fixing the asymptotic performance of the algorithm detracts significantly from the article. I've solved this problem before on Project Euler, in both Haskell and C++, and for my purposes as a beginning Haskell programmer, figuring out how to construct an asymptotically efficient algorithm for a dynamic programming problem in Haskell is the interesting part here.
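For what it's worth, here's a minimal sketch of the usual memoisation approach, assuming the problem in question is Project Euler's Problem 14 (longest Collatz chain). The names `chainLengths` and `longestStart` are mine, and the lazy-array trick is one standard way to do DP in Haskell without mutable state, not necessarily what the article's author had in mind:

    import Data.Array

    -- Chain lengths for 1..n, memoised in a lazy array; starting values
    -- whose trajectories escape the table fall back to plain recursion.
    chainLengths :: Int -> Array Int Int
    chainLengths n = memo
      where
        memo = listArray (1, n) (map go [1 .. n])
        go 1 = 1
        go k = 1 + lookupLen (next k)
        next k
          | even k    = k `div` 2
          | otherwise = 3 * k + 1
        lookupLen k
          | k <= n    = memo ! k   -- cached
          | otherwise = go k       -- out of range: recurse directly

    -- The starting value in [1 .. n] with the longest chain.
    longestStart :: Int -> Int
    longestStart n = snd (maximum [ (memo ! k, k) | k <- [1 .. n] ])
      where memo = chainLengths n

The point of the lazy array is that each entry is demanded at most once and then shared, which is what turns the quadratic-ish repeated walks into a single pass over the table.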
I know what you mean, but it's early in the morning here and I feel like being a bit pedantic: as far as we can tell, neither of the algorithms actually terminates for arbitrary inputs. If we could show that, we'd have proven the Collatz conjecture. Therefore the notion of asymptotic performance doesn't make a lot of sense, since it asks for an upper bound on how long the algorithm takes to terminate. And if we did find such a bound, it would be equivalent to solving the problem itself: the runtime depends on how long a given sequence is, and bounding that amounts to putting the sequence length into a closed form. At that point we'd trivially have an O(1) algorithm.
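To make the pedantic point concrete, the entire difficulty is visible in the bare recursion; proving that something like this terminates for every positive input is exactly the Collatz conjecture, so no worst-case bound on its recursion depth is known:

    -- Termination of this function for all positive inputs is precisely
    -- the (open) Collatz conjecture.
    collatzLen :: Integer -> Integer
    collatzLen 1 = 1
    collatzLen n
      | even n    = 1 + collatzLen (n `div` 2)
      | otherwise = 1 + collatzLen (3 * n + 1)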
What we can do is measure complexity empirically and extrapolate from there. I haven't run many tests with this code, but it seemed to scale the same as the original one while having a much lower constant factor.
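If anyone wants to reproduce that impression, here's a crude timing sketch. The naive `chain` function and the input sizes are placeholders I picked, not anything from the article; swap in whichever implementation you want to measure:

    import Control.Exception (evaluate)
    import System.CPUTime (getCPUTime)
    import Text.Printf (printf)

    -- Naive chain length, just something to time; on a 64-bit machine Int
    -- comfortably holds the intermediate values for these starting points.
    chain :: Int -> Int
    chain 1 = 1
    chain n
      | even n    = 1 + chain (n `div` 2)
      | otherwise = 1 + chain (3 * n + 1)

    -- Time the workload at doubling sizes; if the ratio between successive
    -- timings hovers around 2, the scaling is roughly linear in n.
    main :: IO ()
    main = mapM_ timeAt [250000, 500000, 1000000, 2000000]
      where
        timeAt n = do
          t0 <- getCPUTime
          _  <- evaluate (maximum (map chain [1 .. n]))
          t1 <- getCPUTime
          printf "%7d: %6.2fs\n" n (fromIntegral (t1 - t0) / 1e12 :: Double)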