
OK, AI playing video games is cool. But you know what's really really cool? It looks like SIMA 2 is controlling the mouse and reading the screen at something approaching 30+fps. WANT. Computer use agents are so slow right now, this is really something. I wonder what the architecture is for this.




It's even cooler if humans find something to be excited about in this world, since AI is replacing everything we do.

You do realise the things that AI is replacing are very new things humans have only been doing for the last ~50 years?

Humans have been writing way longer than 50 years. Same with painting, taking photos, filming, etc.

You do realize those very new things have mostly replaced what humans used to do ~50 years ago?

I desperately want an AI agent that can use my phone for me. Just something that takes instructions for each screen and executes them.

"Open Chrome"

"Go to xyz.com"

"open hamburger menu"

"Click login"

etc. etc.
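A rough sketch of what the first half of that could look like, assuming each instruction is a short imperative phrase like the ones above. The `Action` type, the pattern table, and `parse_step` are all made up for illustration; the hard part (actually finding "login" or the hamburger menu on the current screen) is not shown.

```python
import re
from dataclasses import dataclass


@dataclass
class Action:
    kind: str    # "open_app", "navigate", or "tap" (hypothetical action names)
    target: str  # app name, URL, or on-screen label


# Hypothetical patterns, checked in order. The "menu" rule comes before the
# generic "open" rule so that "open hamburger menu" becomes a tap, not an app launch.
PATTERNS = [
    (re.compile(r"^go to (.+)$", re.IGNORECASE), "navigate"),
    (re.compile(r"^click (.+)$", re.IGNORECASE), "tap"),
    (re.compile(r"^open (.+ menu)$", re.IGNORECASE), "tap"),
    (re.compile(r"^open (.+)$", re.IGNORECASE), "open_app"),
]


def parse_step(text: str) -> Action:
    """Turn one spoken or typed instruction into a structured action."""
    cleaned = text.strip().strip('"')
    for pattern, kind in PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return Action(kind, match.group(1))
    raise ValueError(f"unrecognized instruction: {cleaned!r}")


steps = ["Open Chrome", "Go to xyz.com", "open hamburger menu", "Click login"]
plan = [parse_step(step) for step in steps]
for action in plan:
    print(action.kind, "->", action.target)
```

Anything that doesn't match a pattern raises, which is roughly where a model would have to take over from fixed rules.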


Isn't that what the voice a11y tools have been doing for years? Why do you need AI for that?

https://support.google.com/accessibility/android/answer/6151...

https://support.apple.com/en-us/111778


My friend, that is AI. However, it can get a lot better: be more aware of screen content, follow multiple instructions at once, and keep context in mind throughout the conversation and from past interactions.

AI commonly means LLM. Where are you determining this is using an LLM for processing?

AI has existed for several decades before the first LLM was ever created.[1][2][3]

And that's not even considering machine learning and deep learning, which also existed for many years before LLMs.

Even if you consider the current usage of the word AI in popular culture, it includes things that are not LLMs, like Stable Diffusion and Suno.

[1] https://en.wikipedia.org/wiki/Expert_system

[2] https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)

[3] https://en.wikipedia.org/wiki/Lisp_machine#Historical_contex...


You must be new around here, kid. AI has been there since the birth of the programmable computer.

That’s all programmatic, you can’t throw curve balls at it.

I have this in my bookmarks:

https://dafdef.com/aikey


Droidrun did a Show HN recently. It's exactly that.

When manus.ai came out I wondered the same. The "use computer" mode seemed really interesting to me, although I'd already seen JAYU [1], which implemented computer use with Gemini. I also saw something, I really don't remember where, that navigated the web browser through layouts, akin to a Vimium-like experience.

Initially, (my impression of) computer use was only opening Chrome and doing things inside Chrome, a Chrome-as-OS experience. I think maybe Cloudflare could do something better here with their Workers?

A per-user instance seems really costly. I really do wonder how they architected it.

[1] https://ai.google.dev/competition/projects/jayu


Is it though? A machine could play a game frame-by-frame.

Python dxcam + Windows hook API listening for HID messages
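For what it's worth, the frame-by-frame part is just a fixed-rate loop. A minimal sketch, assuming you plug in your own capture and input functions: `grab`, `decide`, and `act` are placeholders I made up (a screen-capture library such as dxcam would sit behind `grab`), and none of this says anything about how SIMA 2 actually works.

```python
import time
from typing import Any, Callable, Optional


def run_loop(
    grab: Callable[[], Optional[Any]],   # placeholder: returns a frame, or None if nothing new
    decide: Callable[[Any], Any],        # placeholder: maps a frame to an input command
    act: Callable[[Any], None],          # placeholder: sends the command (mouse, keys)
    fps: float = 30.0,
    max_frames: int = 90,
) -> int:
    """Run capture -> decide -> act at a target rate; return frames handled."""
    period = 1.0 / fps
    next_tick = time.perf_counter()
    handled = 0
    for _ in range(max_frames):
        frame = grab()
        if frame is not None:
            act(decide(frame))
            handled += 1
        # Sleep until the next tick; if we fell behind, resync instead of bursting.
        next_tick += period
        delay = next_tick - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        else:
            next_tick = time.perf_counter()
    return handled


# Stand-in functions so the loop can run without a screen or a game.
sent = []
counter = iter(range(1000))
handled = run_loop(
    grab=lambda: next(counter),
    decide=lambda frame: ("move_mouse", frame),
    act=sent.append,
    fps=1000.0,
    max_frames=5,
)
print(handled, sent[:2])
```

At 30 fps the whole budget per frame is about 33 ms, so whatever goes in `decide` has to fit in that, which is the interesting part.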

Controlling the mouse?


