I think a world of all-speech interfaces would be flawed (if that is the natural conclusion of this line of thinking). I may be able to speak 3x faster than I type, but I read 10x faster than I can listen to someone speak. Speech-to-text is good for typing, but to me that is not the same as saying it should supersede visuals.
Why would embracing one destroy the other? I think a combination of gestural interfaces, subvocalization voice recognition, and AR could be the big winner in our lifetimes. The text won't be bound to a screen. You don't give anything up, you just gain.
When you need to get into serious writing or bulk data entry, maybe you would still use a keyboard.