Tokens saved should not be your north star metric. You should be able to show that tool call performance is maintained while consuming fewer tokens. I have no idea whether that is the case here.
As an aside: this is a cool idea, but the prose in the readme and the above post seems to be fully generated, so who knows whether it is actually true.
Token counts alone tell you nothing about correctness, latency, or developer ergonomics. Run a deterministic test suite that exercises representative MCP calls against both native MCP and mcp2cli while recording token usage, wall time, error rate, and output fidelity.
Measure fidelity with exact diffs and embedding similarity, and include streaming behavior, schema-change resilience, and rate-limit fallbacks in the cases you care about. Check the repo for a runnable benchmark, archived fixtures captured with vcrpy or WireMock, and a clear test harness that reproduces the claimed 96 to 99 percent savings.
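To make that concrete, here is a rough sketch of the kind of harness I mean. It is not taken from the repo: `Runner`, `run_case`, and `compare` are hypothetical names, and each runner is assumed to be a callable that returns the output text plus the tokens it consumed. It covers exact-diff fidelity, wall time, errors, and token savings only; embedding similarity and streaming checks are left out.

```python
import difflib
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A runner takes a test-case name and returns (output_text, tokens_used).
# Hypothetical interface: stand-ins for "native MCP" and "mcp2cli" runs.
Runner = Callable[[str], Tuple[str, int]]


@dataclass
class Result:
    case: str
    tokens: int
    seconds: float
    error: bool
    output: str


def run_case(runner: Runner, case: str) -> Result:
    """Run one case, recording wall time and whether it raised."""
    start = time.perf_counter()
    try:
        output, tokens = runner(case)
        error = False
    except Exception:
        output, tokens, error = "", 0, True
    return Result(case, tokens, time.perf_counter() - start, error, output)


def compare(baseline: Result, candidate: Result) -> dict:
    """Compare a candidate run against the baseline run of the same case."""
    failed = baseline.error or candidate.error
    diff = list(difflib.unified_diff(
        baseline.output.splitlines(),
        candidate.output.splitlines(),
        lineterm="",
    ))
    savings: Optional[float] = None
    if not failed and baseline.tokens:
        savings = 1 - candidate.tokens / baseline.tokens
    return {
        "case": baseline.case,
        "exact_match": not failed and baseline.output == candidate.output,
        "token_savings": savings,  # None when either side errored
        "diff": diff,
    }
```

A savings number only means something on rows where `exact_match` is true, which is the whole point.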
I can testify that Qwant, if nothing else, is a superior image search engine (basically does what GIS used to do 10-15 years ago) and it's better for just getting to a quick answer without your first 4 results being ad-driven.
Unfortunately, when needing to do deeper dives on things, Google is still more or less the best for results past the first page in my experience, though it's rare I need to dig that deep these days.
The author sets the solver to saga, doesn’t standardize the features, and uses a very high max_iter.
Logistic Regression takes longer to converge when features are not standardized.
Also, the zstd classifier time complexity scales linearly with the number of classes, logistic regression doesn’t. You have 20 (it’s in the name of the dataset), so why only use 4?
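To illustrate the scaling point, here is a minimal sketch of a compression-based classifier. I don't know exactly how the article builds its classifier, so treat this as an assumption: it uses `zlib` from the standard library as a stand-in for zstd, and `classify` is a made-up helper. The thing to notice is that every prediction needs one compression pass per class.

```python
import zlib


def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))


def classify(sample: str, corpora: dict[str, str]) -> tuple[str, int]:
    """Return (predicted_label, compress_calls_spent_on_this_sample).

    Picks the class whose corpus grows the least, in compressed bytes,
    when the sample is appended to it.
    """
    calls = 0
    best_label, best_cost = None, None
    for label, corpus in corpora.items():
        # The per-class baseline could be cached; it is recomputed here
        # only to keep the sketch short.
        base = compressed_size(corpus.encode())
        joint = compressed_size((corpus + " " + sample).encode())
        calls += 1  # one sample-dependent compression per class
        cost = joint - base
        if best_cost is None or cost < best_cost:
            best_label, best_cost = label, cost
    return best_label, calls
```

With 4 classes that is 4 compressions per sample; with all 20 it is 20, so the runtime comparison looks quite different.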
It’s a cool exploration of zstd. But please give the baseline some love. Not everything has to be better than something to be interesting.
You are correct. To be fair I wasn't focused on comparing the runtimes of both methods. I just wanted to give a baseline and show that the batch approach is more accurate.
IMO: trust-based systems only work if they carry risk. Your own score should be linked to the people you "vouch for" or "denounce".
This is similar to real life: if you vouch for someone (in business, for example) and they scam people, your own reputation suffers. So vouching carries risk. Similarly, if you go around saying someone is unreliable, but people find out they actually aren't, your reputation also suffers. If vouching or denouncing becomes free, it will become too easy to weaponize.
Then again, if this is the case, why would you risk your own reputation to vouch for anyone anyway?
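A toy sketch of what I mean by linking scores, purely illustrative. `Ledger`, `resolve`, and the flat `penalty` are my own inventions and not anything the tool implements:

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    scores: dict[str, float] = field(default_factory=dict)
    vouches: dict[str, set[str]] = field(default_factory=dict)    # target -> vouchers
    denounces: dict[str, set[str]] = field(default_factory=dict)  # target -> denouncers
    penalty: float = 1.0  # hypothetical cost of a bad call

    def _record(self, table: dict[str, set[str]], actor: str, target: str) -> None:
        self.scores.setdefault(actor, 0.0)
        table.setdefault(target, set()).add(actor)

    def vouch(self, actor: str, target: str) -> None:
        self._record(self.vouches, actor, target)

    def denounce(self, actor: str, target: str) -> None:
        self._record(self.denounces, actor, target)

    def resolve(self, target: str, turned_out_good: bool) -> None:
        """Once the truth about target is known, charge whoever called it wrong."""
        wrong = self.denounces if turned_out_good else self.vouches
        for actor in wrong.get(target, set()):
            self.scores[actor] -= self.penalty
```

So if alice vouches for mallory and mallory turns out to be a scammer, `resolve("mallory", turned_out_good=False)` costs alice a point, while anyone who denounced mallory is untouched.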
> Then again, if this is the case, why would you risk your own reputation to vouch for anyone anyway?
Good reason to be careful. Maybe there's a bit of an upside too: if you vouch for someone who does good work, then you get a little boost as well. It's how personal relationships work anyway.
----------
I'm pretty skeptical of all things cryptocurrency, but I've wondered if something like this would be an actually good use case of blockchain tech…
> I'm pretty skeptical of all things cryptocurrency, but I've wondered if something like this would be an actually good use case of blockchain tech…
So the really funny thing here is the first bitcoin exchange had a Web of Trust system, and while it had its flaws IT WORKED PRETTY WELL. It used GPG and later on bitcoin signatures. Nobody talks about it unless they were there but the system is still online. Keep in mind, this was used before centralized exchanges and regulation. It did not use a blockchain to store ratings.
As a new trader, you basically could not do trades in their OTC channel without going through traders that specialized in new people coming in. Sock accounts could rate each other, but when you checked to see if one of those scammers was trustworthy, they would have no level-2 trust since none of the regular traders had positive ratings of them.
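Roughly how that level-2 check plays out, as a sketch. This is not the exact formula the real system used; the min-based weighting and the name `level2_trust` are assumptions for illustration:

```python
Ratings = dict[str, dict[str, int]]  # rater -> {ratee: score}


def level2_trust(ratings: Ratings, viewer: str, target: str) -> int:
    """Sum trust reaching target through people the viewer rated positively.

    Each path viewer -> middle -> target contributes the smaller of the
    two ratings (an assumed weighting, chosen for simplicity).
    """
    total = 0
    for middle, my_rating in ratings.get(viewer, {}).items():
        if my_rating <= 0 or middle == target:
            continue
        their_rating = ratings.get(middle, {}).get(target)
        if their_rating is not None:
            total += min(my_rating, their_rating)
    return total


ratings: Ratings = {
    "me": {"regular": 5},
    "regular": {"newbie": 3},
    "sock1": {"sock2": 10},  # socks rating each other
    "sock2": {"sock1": 10},
}
print(level2_trust(ratings, "me", "newbie"))  # 3
print(level2_trust(ratings, "me", "sock1"))   # 0
```

The socks can pile up ratings among themselves, but none of it is reachable from anyone I already trust.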
If we want to make it extremely complex, wasteful, and unusable for 99% of people, then sure, put it on the blockchain. Then we can write tooling and agents in Rust with sandboxes created via Nix to have LLMs maintain the web of trust by writing Haskell and OCaml.
A 100% useful heuristic for "is blockchain useful here" is to understand that blockchains can be completely replaced, at much lower cost, with a database hosted by a trusted party.
If there is literally anyone that can be (or at least must be) trusted by all potential users of a system, then it's better to just use a database controlled by that person/entity. That's why blockchain-based solutions never pan out when it comes to interacting with the real world: In real life, there is a ton of trust required to do anything.
I'm unconvinced. To my possibly-undercaffeinated mind, the string of 3 posts reads like this:
- a problem already solved in TFA (you vouching for someone eventually denounced doesn't prevent you from being denounced, you can totally do it)
- a per-repo, or worse, global, blockchain to solve incrementing and decrementing integers (vouch vs. denounce)
- a lack of understanding that automated global scoring systems are an abuse vector and something people will avoid (cf. Black Mirror and social credit scores in China)
I don't think that trust is easily transferable between projects, and tracking "karma" or "reputation" as a simple number in this file would be technically easy. But how much should the "karma" value change from different actions? It's really hard to formalize efficiently. The web of trust, with all its intricacies, in small communities fits well into participants' heads. This tool is definitely for reasonably small "core" communities handling a larger stream of drive-by / infrequent contributors.
> I don't think that trust is easily transferable between projects
Not easily, but I could imagine a project deciding to trust (to some degree) people vouched for by another project whose judgement they trust. Or, conversely, denouncing those endorsed by a project whose judgement they don't trust.
In general, it seems like a web of trust could cross projects in various ways.
Ethos is already building something similar, but starting with a focus on reputation within the crypto ecosystem (which I think most can agree is an understandable place to begin)
That is an easy way to game the whole system. Create a bunch of accounts and repos, cross vouch across all of them, generate a bunch of fake AI PRs and approve them all because none of the repos are real anyway. Then all you need is to find a way to connect your web of trust to a wider web of trust and you have a whole army of vouched sock puppet accounts.
Think Epstein but in code. Everyone would vouch for him as he’s hyper connected. So he’d get a free pass all the way. Until it all blows up in our faces and everyone who vouched for him gets flagged. The main issue is that it can take 10-20 years to blow up.
Then you have introverts that can be good but have no connections and won’t be able to get in.
So you’re kind of selecting for connected and good people.
Excellent point. Currently HN accounts get much higher scores if they contribute content than if they make valuable comments. Those should be two separate scores. Instead, accounts with really good advice have lower scores than accounts that have just automated re-posting of content from elsewhere to HN.
Fair (and you’re basically describing the xz hack; vouching is done for online identities and not the people behind them).
Even with that risk I think a reputation based WoT is preferable to most alternatives. Put another way: in the current Wild West, there’s no way to identify, or track, or impose opportunity costs on transacting with (committing or using commits by) “Epstein but in code”.
But the blowback is still there. The Epstein saga has and will continue to fragment and discipline the elite. Most people probably do genuinely regret associating with him. Noam Chomsky's credibility and legacy are permanently marred, for example.
> trust-based systems only work if they carry risk. Your own score should be linked to the people you "vouch for" or "denounce"
This is a graph search. If the person you’re evaluating vouches for people whom those you vouch for have denounced, then even if they aren’t denounced per se, you have gained information about how trustworthy you would find that person. (Same in reverse. If they vouch for people who your vouchers vouch for, that indirectly suggests trust even if they aren’t directly vouched for.)
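A two-hop version of that search, as a sketch. I'm reading "your vouchers" as "the people you vouch for", and the +1/-1 weighting and the name `indirect_signal` are assumptions:

```python
Graph = dict[str, set[str]]  # person -> people they vouch for (or denounce)


def indirect_signal(vouches: Graph, denounces: Graph, me: str, candidate: str) -> int:
    """Score a candidate by how their vouches line up with my trusted circle.

    -1 for each (friend, person) pair where the candidate vouches for
    someone a friend of mine denounced; +1 where a friend also vouches.
    """
    trusted = vouches.get(me, set())
    signal = 0
    for person in vouches.get(candidate, set()):
        for friend in trusted:
            if person in denounces.get(friend, set()):
                signal -= 1
            if person in vouches.get(friend, set()):
                signal += 1
    return signal


vouches: Graph = {"me": {"ann"}, "ann": {"good"}, "cand1": {"bad"}, "cand2": {"good"}}
denounces: Graph = {"ann": {"bad"}}
print(indirect_signal(vouches, denounces, "me", "cand1"))  # -1
print(indirect_signal(vouches, denounces, "me", "cand2"))  # 1
```

Neither candidate is directly vouched for or denounced by anyone I trust, yet the graph still tells me something about both.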
I've been thinking in a similar space lately, about what a "parallel web" could look like.
One of my (admittedly half-baked) ideas was a similar vouching scheme with real-world or physical incentives. Basically, signing up requires someone vouching for you, similar to this one, but with actual physical interaction between the two. But I want to take it even further -- when you sign up, your real-life details are "escrowed" in the system (somehow), and when you do something bad enough for a permaban+, you will get doxxed.