Did the "calculator" layout yield much worse results than the winning inverted layout? If so, I'm really curious about the reasons for the difference in performance.
Should actual calculators adopt the phone layout, too? I've always been curious why devices don't all use the same order (PIN entry pads, phones, calculators, etc.)
While not a complete answer to your question, American Scientist's much more detailed review of phone keypad design history says the final phone layout produced "significantly shorter keying times" than other layouts (with the exception of a layout with two horizontal rows of five buttons, which had to be rejected for other reasons) [1]. It also reports that the differences between the top five contenders were statistically indistinguishable (though it doesn't record which designs were in those five), and that the designers were aware of the calculator layouts (which were actually a lot less consistent at that time than most people assume today). So if the calculator layout had been in the top five, it's likely they would have gone with it.
None of the calculator companies had the budgets or human factors research experience to do the kind of analysis that Bell Labs did (including some of the earliest true NUI UX research), so they just picked whatever design made the most sense to their engineers [2].