hackgician · 2025-01-09T23:11:08 1736464268

Octomind is sick, web agents are such an interesting space; would love to talk to you more about challenges you might've faced in building it

Kostarrr · 2025-01-13T08:24:04 1736756644

Sorry, didn't see this earlier. If you're interested, reach out to me (Kosta Welke) on LinkedIn. Or write me an email; you can find me on Octomind's About page.

hackgician · 2025-01-09T23:03:47 1736463827

Yes and no. Getting a VLM to work on the web would definitely be great, but it comes with its own problems, mainly around developing and acting on bounding boxes. We have vision as a default fallback for Stagehand, but we've found that the screenshot sent to the VLM often has to have pre-labeled elements on it. More notably, the screenshot with everything prelabeled leads to a cluttered and unusable image to process. Not pre-labeling runs the risk of missing important elements. I imagine a happy medium where the DOM+a11y tree can be used for candidate generation to a VLM.

Solely depending on a VLM is indeed reminiscent of how humans interact with the web, but when a model thrives with more data, why restrict the data sent to the model?
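As a rough illustration of that happy medium (a sketch only, not how Stagehand does it): filter the accessibility tree down to a short list of interactive, visible elements, then label only those on the screenshot sent to the VLM. The `A11yNode` shape, the role list, and `pickCandidates` are all hypothetical names made up for this example.

```typescript
// Hypothetical shape for a flattened accessibility-tree node (an assumption,
// not Stagehand's or Playwright's actual type).
interface A11yNode {
  role: string;
  name: string;
  box: { x: number; y: number; width: number; height: number } | null;
}

interface Candidate extends A11yNode {
  label: number; // number to draw on the screenshot so the VLM can refer to it
}

// Roles treated as interactive here; an illustrative choice, not an exhaustive list.
const INTERACTIVE_ROLES = new Set(["button", "link", "textbox", "checkbox", "combobox"]);

// Keep only named, visible, interactive nodes and cap the count, so the
// labeled screenshot stays readable instead of cluttered.
function pickCandidates(nodes: A11yNode[], maxLabels = 20): Candidate[] {
  return nodes
    .filter((n) => INTERACTIVE_ROLES.has(n.role))
    .filter((n) => n.name.trim().length > 0)
    .filter((n) => n.box !== null && n.box.width > 0 && n.box.height > 0)
    .slice(0, maxLabels)
    .map((n, i) => ({ ...n, label: i + 1 }));
}
```

The cap is the knob that trades clutter against the risk of missing an important element.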

hackgician · 2025-01-09T23:00:08 1736463608

Thanks so much! Yes, a lot of antibots are able to detect Playwright based on browser config. Generally, antibots are a good thing -- as web agents become more popular, I'd imagine a fruitful partnership to prevent misuse, where traffic from a trusted web agent is treated differently from traffic from an unknown one

hackgician · 2025-01-09T22:57:47 1736463467

This is super interesting, is it open source? Would love to talk to you more about how this worked

ffsm8 · 2025-01-10T12:59:01 1736513941

It's not at a stage I'd be comfortable putting on GitHub yet, maybe in a few months.

And I think you misunderstood my comment: I didn't describe my project, but extrapolated from the parent's desire and my motivations for my project.

Mine is actually pretty close to Stagehand, at least I could very well use it. It's basically a web UI to configure browser tasks like open webpage x, iterate over "item type", with LLM integration to determine what the CSS selector for that would be. On the next execution it would attempt to use the previously determined CSS selector instead of the LLM integration. On failures, it'd raise a notification with an admin task to verify new selectors/fix the script

But it's a lot of code to put together as a generic UI - as I want these tasks to be repeatable without restarting from the beginning etc

Still very much in the PoC stage without any tests, barely working persistence etc
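A rough sketch of the cached-selector flow described above, not the actual project code. Everything here is hypothetical: `resolveSelector`, the `Deps` interface, and the choice to re-ask the LLM when a cached selector stops matching are assumptions for illustration. The browser check and the LLM call are injected so the logic can be read on its own.

```typescript
// Maps an "item type" description to the last CSS selector that worked.
type SelectorStore = Map<string, string>;

interface Deps {
  // Resolves to true if the selector currently matches something on the page.
  matches: (selector: string) => Promise<boolean>;
  // Asks an LLM to propose a CSS selector for the described item type.
  askLlm: (itemType: string) => Promise<string>;
  // Raises an admin notification so a human can verify or fix things.
  notifyAdmin: (message: string) => void;
}

async function resolveSelector(
  itemType: string,
  store: SelectorStore,
  deps: Deps,
): Promise<string | null> {
  const cached = store.get(itemType);
  if (cached !== undefined && (await deps.matches(cached))) {
    return cached; // fast path: reuse the stored selector, no LLM call
  }

  // No cached selector, or it no longer matches: fall back to the LLM.
  const proposed = await deps.askLlm(itemType);
  if (await deps.matches(proposed)) {
    store.set(itemType, proposed);
    if (cached !== undefined) {
      // A previously working selector was replaced; ask a human to verify.
      deps.notifyAdmin(`Selector for "${itemType}" changed: ${cached} -> ${proposed}`);
    }
    return proposed;
  }

  deps.notifyAdmin(`No working selector for "${itemType}"; script needs fixing`);
  return null;
}
```

In a real setup `matches` would presumably wrap a browser query and `store` would be persisted between runs; both are left abstract here.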

hackgician · 2025-01-09T22:35:19 1736462119

Big fan of Hack Club and everything you guys are doing! Such a phenomenal initiative

hackgician · 2025-01-09T22:30:26 1736461826

This is sick! Starred, thanks for sharing :)

hackgician · 2025-01-09T22:27:17 1736461637

Yes^ this is what we suggest. Stagehand is meant to execute isolated tasks on browsers; we support using custom contexts (cookies) with the following command:

    npx create-browser-app --example persist-context

hackgician · 2025-01-09T22:26:07 1736461567

Yes! These are both phenomenal projects, and kudos to their authors as well. Stagehand is different in that it makes fine-grained control a first-class citizen. Oftentimes, you want to control the exact steps a web agent takes. Our experience with other tools was that the only control you have over these steps is in the natural language prompt.

With Stagehand, however, because it's an extension of Playwright, you can confirm each step of the underlying agent's workflow, which makes it the most customizable option for engineers who want/need that

hackgician · 2025-01-09T22:22:02 1736461322

Thanks so much! Crawlspace is pretty sick too, as is Integuru. A lot of people have different takes here on the level of automation to leave up to the user. As a developer building for developers, I wanted to meet in the middle and build off an existing incumbent that most people are likely familiar with already

insdev12 · 2025-01-09T23:08:10 1736464090

Yea Integuru is pretty cool: https://github.com/Integuru-AI/Integuru

hackgician · 2025-01-09T22:20:33 1736461233

That's definitely compelling, but not something we have in mind for the immediate future. Let me know if you end up building something here!

temuze · 2025-01-10T01:21:38 1736472098

I'm currently working on it :)

See you in two weeks I hope

owebmaster · 2025-01-10T19:22:06 1736536926

Anything we can see already?