Hacker News: MK_Dev's comments

How do you turn off dark mode on that site? It hurts my eyes.


Thanks for the feedback. We should definitely add that. :)


In Firefox, Reader View (F9) seems to handle it well.


This is a pretty cool idea and implementation. Any more details on the tech stack you guys are using (besides `browser-use`)?


Thank you! We have a fork of browser-use that lets us hand-hold web agents, since we know our tasks are repetitive. We can cache expected paths and fire alerts if we go off the rails. We'd love to contribute it back at some point; it's mainly a question of bandwidth.
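To make the "cache expected paths and fire alerts" idea concrete, here is a minimal sketch in Python (browser-use's own language). It is not their fork or any browser-use API: `PathGuard`, the step names, and the plain-string step representation are all hypothetical. The guard stores a known-good sequence of steps per repetitive task and records an alert whenever the agent's step at a given position diverges from it.

```python
from dataclasses import dataclass, field


@dataclass
class PathGuard:
    """Hypothetical guard: compares an agent's steps to a cached expected path."""
    expected: dict[str, list[str]] = field(default_factory=dict)
    alerts: list[str] = field(default_factory=list)

    def record(self, task: str, steps: list[str]) -> None:
        # Cache a known-good sequence of steps for a repetitive task.
        self.expected[task] = list(steps)

    def check(self, task: str, index: int, step: str) -> bool:
        path = self.expected.get(task)
        if path is None:
            return True  # nothing cached for this task yet; let the agent explore
        # A step past the end of the cached path also counts as a deviation.
        expected = path[index] if index < len(path) else None
        if step != expected:
            self.alerts.append(f"{task}: step {index} was {step!r}, expected {expected!r}")
            return False
        return True


guard = PathGuard()
guard.record("add_demographics", ["open_patient", "click_demographics", "fill_form", "save"])
guard.check("add_demographics", 0, "open_patient")   # on the cached path
guard.check("add_demographics", 1, "click_billing")  # off the rails: alert recorded
```

In practice the step identifiers would probably be richer (a URL plus a stable element selector, say), and an alert might pause the agent for human review rather than just appending to a list.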

We're evaluating Cua (https://www.ycombinator.com/companies/cua) to containerize our agents; I'm a fan so far. We're also putting Computer Use agents from OpenAI and Anthropic to the test. Many legacy ERPs don't run in the browser, and we have to meet them there. I think we're a few months away from things working reliably and efficiently.

We're evaluating several of the top models (both open and closed) for browser navigation (Claude's winning at the moment) and PDF extraction. Since we're performing repetitive tasks, the goal is to make our workflows RL-able. Being able to rely on OSS models will help a lot here.

We're building our own datasets and evaluations for many of the subtasks. We're using OpenAI's evals (https://github.com/openai/evals) as a framework to guide our own tooling.
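A per-subtask eval can start very small. The sketch below scores a field-extraction step (for example, pulling fields out of a PDF) against hand-labeled samples, reporting exact-match accuracy per field after trimming whitespace and ignoring case. It does not use the openai/evals API; `Sample`, `field_accuracy`, and the field names are hypothetical, and the extractor is any callable you plug in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    input: str                # hypothetical: an identifier or path for the source document
    expected: dict[str, str]  # hand-labeled field values


def field_accuracy(samples: list[Sample],
                   extract: Callable[[str], dict[str, str]]) -> dict[str, float]:
    """Fraction of labeled values the extractor gets right, per field name."""
    hits: dict[str, int] = {}
    totals: dict[str, int] = {}
    for s in samples:
        got = extract(s.input)
        for name, want in s.expected.items():
            totals[name] = totals.get(name, 0) + 1
            # Loose exact match: ignore surrounding whitespace and case.
            if got.get(name, "").strip().lower() == want.strip().lower():
                hits[name] = hits.get(name, 0) + 1
    return {name: hits.get(name, 0) / totals[name] for name in totals}
```

Per-field scores make it easier to see which subtask (dates, names, handwritten notes) is dragging a workflow down, rather than a single pass/fail number per document.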

Apart from that, we write in TypeScript, Python, and Golang. We use Postgres for persistence (nothing fancy here). We host on AWS, and might go on-premises for some customers. We plan on investing a lot into our workflow system as the backbone of our product.

I prefer open source when possible. Everything's new and early, and many things require source changes that others might not be able to prioritize.

Edit: one thing I'd love to find a good solution for is reliably extracting handwriting from PDF documents. Clinicians have to do this a ton to keep the trains running on time, and being able to digitize that knowledge on the go will be huge.

Very open to ideas here. We're seeing great tools and products come up by the day, including from our own YC batch.


What made you fork browser-use? What were the missing bits? Your use case sounds similar to what they're trying with their new workflow-use repo (I am not affiliated with them, just curious).


It's a great repo! We had issues with iframes and framesets (which are old DOM tags) that we had to write custom code for. Some DOMs need annotation to provide meaning to an LLM: for example, a button is clearly an "add demographics" button to the human eye, but is ambiguous in the DOM (a ul containing li elements...). Some bottlenecks in navigation required manual attention; we keep those to a minimum. I think the future is being able to progress from highly deterministic JS code to more agentic, LLM-driven decisions. One does need to be able to control this for performance, cost, and accuracy. And yes, we have some overlap with workflow-use's direction, but I hope more such OSS methods gain popularity! It would simply mean we can go after higher-value and more complex clinical tasks!
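One way to picture the DOM annotation and frame handling described above is the sketch below, which uses Python's standard-library `html.parser`. It is an illustration, not how the fork works: the `ANNOTATIONS` map, the element ids, and the choice of which tags count as interactive are all assumptions. It collects candidate interactive elements, attaches a human-written hint where one exists, and records `<iframe>`/`<frame>` sources, since each frame is a separate document that has to be loaded and walked on its own.

```python
from html.parser import HTMLParser

# Hypothetical hand-written hints: element id -> meaning a human would see on screen.
ANNOTATIONS = {"nav-3": "add demographics button"}


class Annotator(HTMLParser):
    def __init__(self, annotations: dict[str, str]):
        super().__init__()
        self.annotations = annotations
        self.elements: list[tuple[str, str | None, str | None]] = []  # (tag, id, hint)
        self.frames: list[str] = []  # frame sources, each a separate document

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("iframe", "frame") and attrs.get("src"):
            self.frames.append(attrs["src"])
        elif tag in ("button", "a", "li"):  # assumed set of "clickable" tags
            el_id = attrs.get("id")
            self.elements.append((tag, el_id, self.annotations.get(el_id)))


def describe(html: str, annotations: dict[str, str]) -> tuple[list[str], list[str]]:
    """Render interactive elements as compact lines for an LLM prompt."""
    parser = Annotator(annotations)
    parser.feed(html)
    lines = []
    for tag, el_id, hint in parser.elements:
        line = f"<{tag} id={el_id}>"
        if hint:
            line += f"  # {hint}"
        lines.append(line)
    return lines, parser.frames
```

Feeding it `<ul><li id="nav-3"><img src="x.png"></li></ul>` yields a line for the `li` tagged with the "add demographics button" hint, which is the kind of meaning the raw markup alone doesn't carry.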


Did you consider working around those using vision models rather than DOM parsing? Was cost/latency the bottleneck? It seems like the agentic future you describe would need more vision-based parsing.


I believe we will at some point. It's all a question of the right need coming up. Text OCR has gotten really good, and if you think of it from a UI perspective, the only real contract is that a screen will show text that's representative of the information entered. The DOM is useful, but it's a contract that can change!


Would you be able to add a light mode to the site?


On a number of occasions when I was asked this question, I replied with a question of my own: "Where _can_ I be in X years at this company?" Unfortunately, I never received a reply even remotely close to satisfactory. A satisfactory answer would include opportunities and directions, with big bonus points for an attempt to find overlaps with my general desires, motivation, and strengths.


That's a great question to ask! And now you've got me thinking of my answer to that question for my direct reports.

I will say that there does have to be some "meet in the middle" with this entire exchange. If my directs aren't interested in 1:1s, the meeting won't be effective.


"the traits that made them so successful" - or perhaps the choices?

Too much confusion between fame and wealth - the author tried to disambiguate, but I don't think they succeeded.

"If you are "publicly rich" (think athletes, authors, movie stars, politicians, etc.), you no longer have a private life." - isn't this true for most public figures regardless of their wealth status?


I don't want to be rich and famous. I want to be rich and not famous.


What are your top frameworks that satisfy all of these (assuming developer productivity is a factor)?


The irony



I have an older unsupported device still connected to my car. Any open source ideas for it?


I was getting worried we hadn't seen a new JS tool for a few minutes.

