OK, AI playing video games is cool. But you know what's really, really cool? It looks like SIMA 2 is controlling the mouse and reading the screen at something approaching 30+ fps. WANT. Computer use agents are so slow right now that this is really something. I wonder what the architecture is for this.
My friend, that is AI. However, it can get a lot better: be more aware of screen content, follow multiple instructions at once, and keep context in mind throughout the conversation and from past interactions.
When manus.ai came out I wondered the same. The "use computer" mode seemed really interesting to me, although I had already seen JAYU [1], which implemented computer use with Gemini.
I also saw something (I really don't remember where) that navigated the web browser through layouts, akin to a Vimium-like experience.
Initially, my impression of computer use was that it only opened Chrome and did things inside Chrome, a Chrome-as-OS experience. Maybe Cloudflare could do something better here with their Workers?
A per-user instance seems really costly. I really do wonder how they architected it.