The internet is an easy, convenient source of training data for LLMs, but I'm pretty sure you could train them on microphone audio too. One cloud surveillance company, maybe one doing networked security monitoring, or maybe just a voice assistant like Alexa or Siri, could dip into as much communication per hour, and as varied, as all the books ever written.