hackgician's comments

Octomind is sick. Web agents are such an interesting space; I'd love to talk to you more about the challenges you might've faced in building it.


Sorry, didn't see this earlier. If you're interested, reach out to me (Kosta Welke) on LinkedIn, or write me an email; you can find me on Octomind's About page.


Yes and no. Getting a VLM to work on the web would definitely be great, but it comes with its own problems, mainly around developing and acting on bounding boxes. We have vision as a default fallback for Stagehand, but we've found that the screenshot sent to the VLM often needs pre-labeled elements on it. The catch is that a screenshot with everything pre-labeled is too cluttered for the model to process, while not pre-labeling runs the risk of missing important elements. I imagine a happy medium where the DOM + a11y tree is used to generate candidates for a VLM.

Solely depending on a VLM is indeed reminiscent of how humans interact with the web, but when a model thrives with more data, why restrict the data sent to the model?
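
To make the "happy medium" concrete, here is a minimal sketch of candidate generation, not Stagehand's actual implementation. It assumes the DOM + a11y tree has already been flattened into simple records; the `Node` type, the `INTERACTIVE_ROLES` set, and `generate_candidates` are all hypothetical names for illustration. Only visible, interactive, in-viewport nodes get a numeric label, so the screenshot sent to the VLM stays uncluttered.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Roles treated as actionable. This set is an illustrative assumption.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox", "menuitem"}


@dataclass
class Node:
    role: str                        # accessibility role from the a11y tree
    name: str                        # accessible name
    box: Tuple[int, int, int, int]   # (x, y, width, height) in page pixels
    visible: bool = True


def generate_candidates(nodes: List[Node], viewport: Tuple[int, int],
                        max_candidates: int = 20) -> List[Tuple[int, Node]]:
    """Return (label, node) pairs for visible, interactive nodes in the viewport."""
    vw, vh = viewport
    picked = []
    for node in nodes:
        x, y, w, h = node.box
        in_view = x < vw and y < vh and x + w > 0 and y + h > 0
        if (node.visible and node.role in INTERACTIVE_ROLES
                and w > 0 and h > 0 and in_view):
            picked.append(node)
    # Cap the number of labels so the annotated screenshot stays readable.
    picked = picked[:max_candidates]
    return [(i + 1, n) for i, n in enumerate(picked)]


nodes = [
    Node("button", "Sign in", (900, 20, 80, 32)),
    Node("generic", "", (0, 0, 1280, 720)),             # layout container, skipped
    Node("link", "Pricing", (400, 20, 60, 20)),
    Node("textbox", "Search", (100, 2000, 200, 30)),    # below the fold, skipped
    Node("button", "Hidden", (10, 10, 50, 20), visible=False),
]
candidates = generate_candidates(nodes, viewport=(1280, 720))
for label, node in candidates:
    print(label, node.role, node.name, node.box)
```

The labels and boxes would then be drawn on the screenshot, and the VLM would only need to answer with a label number rather than produce a bounding box itself.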


Thanks so much! Yes, a lot of antibots are able to detect Playwright based on browser config. Generally, antibots are a good thing -- I think in the future, as web agents become more popular, I'd imagine a fruitful partnership to prevent misuse, with traffic treated differently depending on whether it comes from a trusted web agent vs. an unknown one.


This is super interesting. Is it open source? Would love to talk to you more about how this worked.


It's not at a stage where I'd be comfortable putting it on GitHub yet, maybe in a few months.

And I think you misunderstood my comment: I didn't describe my project, but extrapolated from the parent's desire and my motivations for my project.

Mine is actually pretty close to Stagehand, at least I could very well use it. It's basically a web UI to configure browser tasks like open webpage x, iterate over "item type", with LLM integration to determine what the CSS selector for that would be. On the next execution it would attempt to use the previously determined CSS selector instead of the LLM integration. On failures, it'd raise a notification with an admin task to verify new selectors/fix the script.

But it's a lot of code to put together as a generic UI, as I want these tasks to be repeatable without restarting from the beginning, etc.

Still very much in the PoC stage without any tests, barely working persistence, etc.
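
The cached-selector flow described above could look roughly like the sketch below. This is not the commenter's code (which is unpublished); `SelectorResolver`, `ask_llm`, `notify_admin`, and `query` are hypothetical names, and re-asking the LLM after a cached selector fails is an assumption about how the "verify new selectors" step would be fed.

```python
from typing import Callable, Dict, List


class SelectorResolver:
    """Reuse a previously determined CSS selector; fall back to an LLM on failure."""

    def __init__(self, ask_llm: Callable[[str], str],
                 notify_admin: Callable[[str], None]) -> None:
        self._ask_llm = ask_llm            # hypothetical: description -> CSS selector
        self._notify_admin = notify_admin  # hypothetical: raises an admin task
        self._cache: Dict[str, str] = {}

    def resolve(self, description: str,
                query: Callable[[str], List[str]]) -> List[str]:
        cached = self._cache.get(description)
        if cached is not None:
            items = query(cached)
            if items:
                return items               # cache hit, no LLM call needed
            # Cached selector no longer matches anything: flag it for review.
            self._notify_admin(f"selector {cached!r} failed for {description!r}")
        selector = self._ask_llm(description)
        items = query(selector)
        if items:
            self._cache[description] = selector
        else:
            self._notify_admin(f"no working selector found for {description!r}")
        return items


# Tiny in-memory stand-ins for a page, an LLM, and a notification channel.
page = {".product": ["A", "B"]}
llm_calls: List[str] = []
notifications: List[str] = []


def fake_llm(description: str) -> str:
    llm_calls.append(description)
    return next(iter(page))                # pretend the LLM finds the current selector


def query(selector: str) -> List[str]:
    return page.get(selector, [])


resolver = SelectorResolver(fake_llm, notifications.append)
first = resolver.resolve("product items", query)    # LLM determines the selector
second = resolver.resolve("product items", query)   # served from the cache
page.clear()
page["li.item"] = ["C"]                              # the site's markup changes
third = resolver.resolve("product items", query)    # cache miss, notify, re-ask LLM
```

In a real setup the cache would need to be persisted between runs, and `query` would wrap something like a Playwright locator call rather than a dictionary lookup.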


Big fan of Hack Club and everything you guys are doing! Such a phenomenal initiative


This is sick! Starred, thanks for sharing :)


Yes^ this is what we suggest. Stagehand is meant to execute isolated tasks on browsers; we support using custom contexts (cookies) with the following command:

    npx create-browser-app --example persist-context


Yes! These are both phenomenal projects, and kudos to their authors as well. Stagehand is different in that it makes fine-grained control a first-class citizen. Oftentimes, you want to control the exact steps a web agent takes. Our experience with other tools was that the only control you have over these steps is the natural language prompt.

With Stagehand, however, because it's an extension of Playwright, you can confirm each step of the underlying agent's workflow, which makes it the most customizable option for engineers who want or need that.


Thanks so much! Crawlspace is pretty sick too, as is Integuru. A lot of people have different takes here on the level of automation to leave up to the user. As a developer building for developers, I wanted to meet in the middle and build off an existing incumbent that most people are likely familiar with already


Yea Integuru is pretty cool: https://github.com/Integuru-AI/Integuru


That's definitely compelling, but not something we have in mind for the immediate future. Let me know if you end up building something here!


I'm currently working on it :)

See you in two weeks I hope


Anything we can see already?

