Hacker News | wild_egg's comments

Not to pull a "why should I use Dropbox when I have rsync" but why should we use this over adding a Playwright MCP to Claude Desktop or similar?

Does having access to Chromium internals give you any super powers over connecting over the Chrome Devtools Protocol?


Yes, eventually we think there is more value in owning the entire stack than in just being an MCP connector.

A few ideas we've been considering: integrating a small LLM, building an MCP store into the browser, building a more AI-friendly DOM, etc.

Even today, we use Chrome's accessibility tree (a better representation of the DOM for LLMs), which is not exposed via Chrome extension APIs.


> building a more AI friendly DOM

You might consider the Accessibility Tree and its semantics. Plain divs are basically filtered out so you're left with interactive objects and some structural/layout cues.
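To make the "plain divs are filtered out" point concrete, here is a minimal sketch in Python. It is not Chrome's actual algorithm: the `AXNode` shape, the `GENERIC_ROLES` set, and the rule "drop unnamed generic wrappers and hoist their children" are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Roles treated as semantics-free wrappers (an illustrative choice, not a spec).
GENERIC_ROLES = {"generic", "none", "presentation"}


@dataclass
class AXNode:
    """Hypothetical, simplified accessibility node."""
    role: str
    name: str = ""
    children: list["AXNode"] = field(default_factory=list)


def prune(node: AXNode) -> list[AXNode]:
    """Drop unnamed generic wrappers and hoist their children in their place."""
    kept_children = [kept for child in node.children for kept in prune(child)]
    if node.role in GENERIC_ROLES and not node.name:
        return kept_children
    return [AXNode(node.role, node.name, kept_children)]


def render(node: AXNode, depth: int = 0) -> str:
    """Indented text dump, the kind of compact view one might hand to an LLM."""
    line = "  " * depth + node.role + (f' "{node.name}"' if node.name else "")
    return "\n".join([line] + [render(child, depth + 1) for child in node.children])


# A div-heavy page: two wrapper layers around a nav link, one around a button.
page = AXNode("generic", children=[
    AXNode("navigation", "Main", [
        AXNode("generic", children=[AXNode("link", "Home")]),
    ]),
    AXNode("generic", children=[AXNode("button", "Submit")]),
])

for top_level in prune(page):
    print(render(top_level))
```

What survives is the navigation landmark with its link, plus the button; the wrapper layers disappear while the structural nesting is kept.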


I've been trying (albeit not very hard) to build an accessibility library and toolset that can be exposed via an MCP server. I think it has the potential to be much more ergonomic for generalized computer-use agents than stuff like Playwright or the classic screenshot approach. Low-latency computer use is another thing that I'd like to solve.

The issue is that the macOS and Windows accessibility APIs are opaque and I have no idea what I'm doing, so I'm forced to vibe code it all, which is not turning out too well... :-)

I suffer from mild carpal tunnel, so I want to build a really low-latency computer-use agent that can do anything on my computer without me having to learn the Talon voice syntax or some other traditional accessibility software like Mac dictation.


Neat, is it on github?

Not yet, I've gone through a few prototypes that haven't really worked. Nothing has stuck well enough to justify a repo.

I will try to publish something on gh this weekend.


> Few ideas we were thinking of: integrating a small LLM

Chrome has a built-in LLM: https://developer.chrome.com/docs/ai/built-in


I would take the position of "why use this when I have eyes and hands and a brain?"

Why use any tool when you have bare hands bla bla...

A good place to start is to think about, for example, needing to copy and paste info from 100 websites into a spreadsheet.


Why should I use a calculator when I can use an abacus?

Why use an abacus when I can just use my fingers and toes?

My guess is that this is for impatient people; people who think that the prescribed use cases are somehow necessary for their "workflows"; people who subscribe to terms like "cognitive friction" within the context of these use cases; people who are...sort of lazy.

...Why do these lazy people put so much effort into coming up with fancy words to justify that laziness?

That's a really good question. Maybe it's because laziness is associated with a lack of intellect? And certain technologies, like AI and other software, are meant to augment our intellect.

These fancy words carry an intellectual/productive effect. When they're put to use it probably makes people feel like they're getting things done. And they never feel lazy because of this.


A uniform language and ecosystem has been the siren song of JS for over a decade and I've yet to see it work out in any meaningful way.

Use whatever you like.


I mean, what else do you use to run things in the browser?

PouchDB. Hypercore (Pear). It’s nice to be able to spin up JS versions of things and have them “just work” on the most widely deployed platform in the world.

TensorFlow.js was awesome for years, with things like BlazeFace, Ready Player Me avatars and hallway tile and other models working in realtime at the edge. Before ChatGPT was even conceived. What’s your solution, compile Go to Wasm?

Agents can work in people’s browsers as well as in Node.js around the world. Being inside a browser gives a great sandbox, and it’s private on the person’s own machine too.

This was possible years ago: https://www.youtube.com/watch?v=CpSzT_c7_UI&t=10m30s


> what else do you use to run things in the browser?

I do my best to run as little in the browser as possible. Everything is an order of magnitude simpler and faster to build if you do the bulk of things on a server in a language of your choice and render to the browser as necessary.


Being able to block sites completely and uprank others I find valuable is the entire reason I pay for Kagi.

The results aren't necessarily better than other search engines in general but the personalization is so incredibly valuable.

Oh and having it auto rewrite Reddit results to Old Reddit helps a lot too.


For those of us forced to be in the JS ecosystem, finally having a runtime that Just Works has been great.

Bun has replaced a massive number of tools and dependencies from our stack and really counteracted the tooling explosion that we were forced into with node.


> really counteracted the tooling explosion that we were forced into with node

Isn't this more-or-less a self-inflicted wound? Who forced you into working with node?


In our case, it's not so much being forced to use Bun, but rather that Bun is in real terms infinitely more convenient than lower-level languages. Firstly, even the most novice of novices tend to have a passing familiarity with JS/TS, whereas this is not true for C/Zig/Rust/etc, so it's easier for people to contribute to our projects. Bun also provides so many things for free, statically, and cross platform. You want a TCP server? A websocket server? SQLite database? You want to include static assets? You want to generate static assets at compile time? Etc? Bun provides it.

Attempting to replicate even a modicum of this in lower-level languages can be a real struggle. Rust is definitely the least-worst in this respect because there's been a concerted effort by the community to provide stable packages that do most things. But Rust is a complicated and unapproachable language. Use other low-level languages like C/Zig and you immediately run into issues of libraries and static linking. And even if you find a library, its documentation is either lacklustre or outright missing (looking at you, libuv and libxev, respectively).

The amount of manual setup and third-party build-system finagling needed is substantial, just to: 1) run a TCP server; 2) fetch data over HTTP; 3) do both of these using a single event loop (no separate threads); 4) use SQLite for storage; and 5) have all this produce a single self-contained executable. Yet I cannot overstate how trivial this is with Bun.


$JOB


Could you please elaborate on this? What tools besides Node itself did you replace with Bun?


Imagine complaining that a language has so many users and projects so it sucks.

Exactly, Bun is killer. The test runner is extremely fast.

I can build apps with 1-5 total dependencies and everything just works, and works incredibly fast.


What does SOTA tool use with voice agents look like these days? Any providers have MCP support?

It'd be incredible to get to a point where we can have natural conversations and the AI is running tools in the background and keeping tabs on things.


You can't have REST without it


> you still have to figure out how to concretely receive a response back

Isn't that handled by whatever Tool API you're using? There's usually a `function_call_output` or `tool_result` message type. I haven't had a need for a separate protocol just to send responses.
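As a rough sketch of what "handled by the Tool API" means, the Python below runs a tool call and wraps the outcome in a `tool_result`-style message. The field names (`id`, `name`, `input`, `tool_use_id`, `content`, `is_error`) are a simplified assumption loosely modeled on that style of message, and `TOOLS` and `answer_tool_call` are hypothetical names; check your provider's documentation for the real shapes.

```python
import json
from typing import Any, Callable

# Hypothetical local tools, keyed by the name the model would call them with.
TOOLS: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
}


def answer_tool_call(call: dict[str, Any]) -> dict[str, Any]:
    """Run one tool call and wrap the outcome in a tool_result-style message."""
    try:
        output = TOOLS[call["name"]](**call["input"])
        content, is_error = json.dumps(output), False
    except Exception as exc:  # report failures to the model instead of crashing
        content, is_error = f"{type(exc).__name__}: {exc}", True
    return {
        "type": "tool_result",
        "tool_use_id": call["id"],  # ties the response back to the request
        "content": content,
        "is_error": is_error,
    }


reply = answer_tool_call({"id": "call_1", "name": "add", "input": {"a": 2, "b": 3}})
print(reply)
```

The response travels back in the same conversation, correlated by the call id, which is why no separate response protocol is needed in this setup.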


Can you elaborate on "bad token scoping"?

I don't think your XY phrasing fully describes the GitHub MCP exploit, and I'm curious whether you think that's somehow a "token scoping" issue.


I'm unaware of the GitHub MCP "exploit", but given the overall state of LLM/MCP security FUD, there's probably some self-promotional blog post from a security company about an LLM doing something stupid with GitHub data that the owner of the LLM-using system didn't intend.

For example, let's say I create an application that lets you chat with my open source repo. I set up my LLM with a GitHub tool. I don't want to think about OAuth and getting a token from the end user, so I give it a PAT that I generated from my account. I'm even lazier, so I just use a PAT I already had lying around, and it unfortunately has read/write access to SSH keys. The user can add their SSH key to my account and do malicious things.

Oh no, MCP is super vulnerable, please buy my LLM security product.

If you give the LLM a tool, and you give the LLM input from a user, the user has access to that tool. That shrimple.


https://news.ycombinator.com/item?id=44097390

Also currently on the front page. It's mainly that this tool hits the trifecta of having privileged access, untrusted inputs, and ability to exfiltrate. Most tools only do 1-2 of those so attacks need to be more sophisticated to coordinate that.
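The trifecta can be written down as a quick check over an agent's tool set. This is only a sketch of the idea: `ToolProfile`, its three flags, and the example tools are hypothetical, and deciding which flags apply to a real tool is the hard part that code cannot do for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical capability flags for one tool given to an agent."""
    name: str
    reads_private_data: bool
    sees_untrusted_input: bool
    can_exfiltrate: bool


def hits_trifecta(tools: list[ToolProfile]) -> bool:
    """True when the tool set as a whole covers all three risky capabilities."""
    return (
        any(t.reads_private_data for t in tools)
        and any(t.sees_untrusted_input for t in tools)
        and any(t.can_exfiltrate for t in tools)
    )


# One tool that does all three on its own, as described above.
repo_tool = ToolProfile("repo_tool", True, True, True)
# Two narrower tools that each cover only part of the trifecta.
calendar = ToolProfile("calendar", True, False, False)
web_search = ToolProfile("web_search", False, True, False)

print(hits_trifecta([repo_tool]))             # True
print(hits_trifecta([calendar, web_search]))  # False
```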


I haven't looked at MCP payloads properly to compare, but often the raw OpenAPI spec is overly verbose and eats context space pretty quickly.

Really trivial to have the LLM first filter it down to the sections it cares about and then condense those sections though.

Wrap that process in a small tool and give that to the LLM along with a `fetch` tool that handles credentials based on URLs, and agent capabilities explode pretty rapidly.
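A minimal sketch of the condensing step, in Python. The comment above has the LLM pick the relevant sections; here a plain keyword match stands in for that choice, and `condense_openapi` along with its one-line output format is a hypothetical helper, not part of any library.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def condense_openapi(spec: dict[str, Any], keyword: str) -> list[str]:
    """Return one-line summaries of operations whose path or summary mentions keyword."""
    needle = keyword.lower()
    lines = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            summary = op.get("summary", "")
            if needle not in path.lower() and needle not in summary.lower():
                continue
            # Entries without a "name" (e.g. $ref parameters) are left out.
            params = ", ".join(p["name"] for p in op.get("parameters", []) if "name" in p)
            lines.append(f"{method.upper()} {path}({params}): {summary}")
    return lines


spec = {
    "paths": {
        "/pets": {
            "parameters": [],
            "get": {
                "summary": "List pets",
                "parameters": [{"name": "limit", "in": "query"}],
            },
        },
        "/users/{id}": {
            "get": {
                "summary": "Get a user",
                "parameters": [{"name": "id", "in": "path"}],
            },
        },
    }
}

print(condense_openapi(spec, "pet"))
```

Each surviving operation costs one short line of context instead of a full schema block, and the full entry can still be fetched later if the agent needs it.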


> You can think of algebraic effects essentially as exceptions that you can resume.

So conditions in Common Lisp? I do love the endless cycle of renaming old ideas


No, algebraic effects are a generalization that supports more cases than Lisp's condition system, since continuations are multi-shot. The closest thing is `call/cc` from Scheme.

Sometimes drawing these parallels hurts more than not having them in the first place.
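To illustrate what "multi-shot" buys you, here is a small Python sketch. Python has no native algebraic effects, so this emulates them with explicit continuation-passing style: the `choose` handlers and the `program` function are hypothetical, and the "rest of the computation" is passed around as an ordinary function named `resume`.

```python
from typing import Callable

Resume = Callable[[int], list[int]]
Handler = Callable[[list[int], Resume], list[int]]


def program(choose: Handler) -> list[int]:
    """Perform two 'choose' effects; each lambda is the continuation after one."""
    return choose([1, 2], lambda x:
           choose([10, 20], lambda y:
           [x + y]))


def first_choice(options: list[int], resume: Resume) -> list[int]:
    # One-shot handler: resume the computation exactly once.
    return resume(options[0])


def every_choice(options: list[int], resume: Resume) -> list[int]:
    # Multi-shot handler: resume the same continuation once per option.
    results: list[int] = []
    for option in options:
        results.extend(resume(option))
    return results


print(program(first_choice))  # [11]
print(program(every_choice))  # [11, 21, 12, 22]
```

The program is the same in both runs; only the handler changes. A handler that can resume at most once can express the first run but not the second, which is the gap the comment above is pointing at.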


Ah multi-shot does make a big difference, thanks for clarifying!


Also literal "resumable exceptions" in Smalltalk.


What a thought-terminating way to approach an idea. Effects are not simply renamed conditions, and we have a whole article here describing them in more detail than that one sentence, so you can see some of the differences for yourself.


Also dependency injection.

