Hacker News | stephantul's comments

Same. Admitting to it is one thing, but still it takes a certain kind of attitude to outright forbid people to write tests.

Tokens saved should not be your north star metric. You should be able to show that tool call performance is maintained while consuming fewer tokens. I have no idea whether that is the case here.

As an aside: this is a cool idea but the prose in the readme and the above post seem to be fully generated, so who knows whether it is actually true.


Token counts alone tell you nothing about correctness, latency, or developer ergonomics. Run a deterministic test suite that exercises representative MCP calls against both native MCP and mcp2cli while recording token usage, wall time, error rate, and output fidelity.

Measure fidelity with exact diffs and embedding similarity, and include streaming behavior, schema-change resilience, and rate-limit fallbacks in the cases you care about. Check the repo for a runnable benchmark, archived fixtures captured with vcrpy or WireMock, and a clear test harness that reproduces the claimed 96 to 99 percent savings.
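For what it's worth, the skeleton of such a comparison fits in a few lines. Everything below is illustrative: `count_tokens` is a crude whitespace proxy (swap in a real tokenizer such as tiktoken's cl100k_base for publishable numbers), and the adapters are stand-ins for whatever you are measuring (native MCP vs. the CLI wrapper).

```python
import time
from dataclasses import dataclass

@dataclass
class TrialResult:
    tokens: int
    wall_time_s: float
    ok: bool

def count_tokens(text: str) -> int:
    # Crude whitespace proxy for a real tokenizer; replace with an
    # actual tokenizer before reporting numbers.
    return len(text.split())

def run_trial(call, fixture_input: str, expected_output: str) -> TrialResult:
    # `call` is whatever adapter you are measuring; here it is just a
    # function from input text to output text.
    start = time.perf_counter()
    output = call(fixture_input)
    elapsed = time.perf_counter() - start
    return TrialResult(
        tokens=count_tokens(fixture_input) + count_tokens(output),
        wall_time_s=elapsed,
        ok=(output == expected_output),  # exact-diff fidelity check
    )

def compare(adapters: dict, fixtures: list) -> dict:
    # Run every adapter over the same archived fixtures and aggregate
    # token usage, wall time, and error rate.
    report = {}
    for name, call in adapters.items():
        results = [run_trial(call, inp, out) for inp, out in fixtures]
        report[name] = {
            "total_tokens": sum(r.tokens for r in results),
            "error_rate": sum(not r.ok for r in results) / len(results),
            "total_time_s": sum(r.wall_time_s for r in results),
        }
    return report
```

The point is that token savings only mean something next to the error-rate and fidelity columns of the same report.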


Are you an LLM? That would be so ironic.

I found this comment because I was wondering the same thing on a completely unrelated thread. I strongly suspect this is a bot.

You can post this under every one of my comments; that doesn't make it true. I could go to your account and do the same on your comments.

ok, I'll stop. I am not the only person who suspected you!

I use LLMs to help with writing comments, like brainstorming and fixing grammar and spelling. But many people do that these days.

This is such a funny interaction

Happens all the time nowadays here on HN. IMHO, the LLM accusations are getting out of hand.

No, unless you ask danlitt, who accuses me of being an LLM under every one of my comments.

The AI prose is getting so tiring to read

"We measured this. Not estimates — actual token counts using the cl100k_base tokenizer against real schemas, verified by an automated test suite."


This is a thinly veiled commercial, not really useful.

Ecosia is not just Bing; we offer a bunch of indexes, including Bing and Google.

We’re moving to our own index, which we are building in collaboration with Qwant, under the name European Search Perspective.

I do see the point of the article however.


I can testify that Qwant, if nothing else, is a superior image search engine (basically does what GIS used to do 10-15 years ago) and it's better for just getting to a quick answer without your first 4 results being ad-driven.

Unfortunately, when needing to do deeper dives on things, Google is still more or less the best for results past the first page in my experience, though it's rare I need to dig that deep these days.


Qwant is somewhat focused on France, so coverage really depends on whether that is the market you’re looking for.


I love Qwant and I think it works better than DuckDuckGo.

I can't wait for the European index.


Will I ever be able to pay for Ecosia?


Genuinely curious: what would you like to pay for? Not seeing ads? Unlocking new features? Or just supporting us?


API


Georgi is such a legend. Glad to see this happening


Yes. This is known as a k-NN classifier. k-NN classifiers are usually worse than other simple classifiers, but they're trivial to update and use.

See e.g., https://scikit-learn.org/stable/auto_examples/neighbors/plot...
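For anyone unfamiliar, a minimal k-NN classifier fits in a few lines of plain Python. This is just the textbook algorithm, not the scikit-learn implementation:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    # Classic k-nearest-neighbours: there is no training step, so
    # "updating" the model is just appending to train_X / train_y.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote over the k closest points.
    top_labels = [label for _, label in dists[:k]]
    return Counter(top_labels).most_common(1)[0][0]
```

The trivial-to-update property is exactly the appeal: adding a labeled example costs one list append, at the price of paying the full distance computation at query time.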


The speed comparison is weird.

The author sets the solver to saga, doesn’t standardize the features, and uses a very high max_iter.

Logistic Regression takes longer to converge when features are not standardized.

Also, the zstd classifier's time complexity scales linearly with the number of classes, while logistic regression's doesn't. The dataset has 20 (it's in the name), so why use only 4?

It’s a cool exploration of zstd. But please give the baseline some love. Not everything has to be better than something to be interesting.
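For what a fairer baseline might look like, here is a sketch. The synthetic data is a stand-in for the real 20-class dataset, and the exact sizes are assumptions; the point is that standardizing first lets the default lbfgs solver converge without saga or an enormous max_iter:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real 20-class dataset.
X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=30,
    n_classes=20, random_state=0,
)

# Standardizing features first is what makes logistic regression
# converge quickly; max_iter is raised only modestly.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
```

With a pipeline like this, the runtime comparison would at least be against logistic regression in its comfort zone.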


You are correct. To be fair I wasn't focused on comparing the runtimes of both methods. I just wanted to give a baseline and show that the batch approach is more accurate.


Yeah sorry, reading it back I was a bit too harsh haha. It was my pre-coffee comment. Nice post!


What an interesting long-form portrait.

Looking at your phone while driving is extremely dangerous, please don’t do it.


IMO: trust-based systems only work if they carry risk. Your own score should be linked to the people you "vouch for" or "denounce".

This is similar to real life: if you vouch for someone (in business, for example) and they scam people, your own reputation suffers. So vouching carries risk. Similarly, if you go around saying someone is unreliable, but people find out they actually aren't, your reputation also suffers. If vouching or denouncing becomes free, it becomes too easy to weaponize.

Then again, if this is the case, why would you risk your own reputation to vouch for anyone anyway.
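A toy sketch of what "your score is linked to the people you vouch for" could mean. The `spillover` factor and data shapes are made up purely for illustration, not a proposed spec:

```python
def settle_outcome(scores, vouchers, subject, delta, spillover=0.5):
    """Apply an outcome to `subject` and propagate a share to vouchers.

    scores: id -> reputation float.
    vouchers: id -> set of ids who vouched for that account.
    A negative delta (a scam) drags vouchers down; a positive one
    (good work) lifts them, so vouching carries symmetric risk.
    """
    scores[subject] = scores.get(subject, 0.0) + delta
    for voucher in vouchers.get(subject, set()):
        scores[voucher] = scores.get(voucher, 0.0) + spillover * delta
    return scores
```

The symmetric upside answers the "why risk it" question: vouching for people who turn out well pays off.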


> Then again, if this is the case, why would you risk your own reputation to vouch for anyone anyway.

Good reason to be careful. Maybe there's a bit of an upside to: if you vouch for someone who does good work, then you get a little boost too. It's how personal relationships work anyway.

----------

I'm pretty skeptical of all things cryptocurrency, but I've wondered if something like this would be an actually good use case of blockchain tech…


> I'm pretty skeptical of all things cryptocurrency, but I've wondered if something like this would be an actually good use case of blockchain tech…

So the really funny thing here is the first bitcoin exchange had a Web of Trust system, and while it had its flaws, IT WORKED PRETTY WELL. It used GPG and later on bitcoin signatures. Nobody talks about it unless they were there, but the system is still online. Keep in mind, this was used before centralized exchanges and regulation. It did not use a blockchain to store ratings.

As a new trader, you basically could not do trades in their OTC channel without going through traders who specialized in new people coming in. Sock accounts could rate each other, but when you checked to see whether one of those scammers was trustworthy, they would have no level-2 trust, since none of the regular traders had positive ratings of them.

Here's a link to the system: https://bitcoin-otc.com/trust.php (on IRC, you would use a bot called gribble to authenticate)


Biggest issue was always the fiat transfers.


If we want to make it extremely complex, wasteful, and unusable for 99% of people, then sure, put it on the blockchain. Then we can write tooling and agents in Rust with sandboxes created via Nix to have LLMs maintain the web of trust by writing Haskell and OCaml.


Well done, you managed to tie Rust, Nix, Haskell and OCaml to "extremely complex, wasteful, and unusable"


Boring Java dev here. Do I just sit this one out?


Zig can fix this, I'm sure.


zig can fix everything


A 100% useful heuristic for "is blockchain useful here" is to understand that blockchains can be completely replaced, at much lower cost, with a database hosted by a trusted party.

If there is literally anyone that can be (or at least must be) trusted by all potential users of a system, then it's better to just use a database controlled by that person/entity. That's why blockchain-based solutions never pan out when it comes to interacting with the real world: In real life, there is a ton of trust required to do anything.


I'm unconvinced. To my possibly-undercaffeinated mind, the string of three posts reads like this:

- a problem already solved in TFA (you vouching for someone eventually denounced doesn't prevent you from being denounced, you can totally do it)

- a per-repo, or worse, global, blockchain to solve incrementing and decrementing integers (vouch vs. denounce)

- a lack of understanding that automated global scoring systems are an abuse vector and something people will avoid. (c.f. Black Mirror and social credit scores in China)


Those are good arguments against. I want to make it clear that I think it’s a possibly interesting idea, but also probably a bad one too! :)


I don't think that trust is easily transferable between projects, and tracking "karma" or "reputation" as a simple number in this file would be technically easy. But how much should the "karma" value change from different actions? It's really hard to formalize efficiently. The web of trust, with all its intricacies, fits well into participants' heads in small communities. This tool is definitely for reasonably small "core" communities handling a larger stream of drive-by / infrequent contributors.


> I don't think that trust is easily transferable between projects

Not easily, but I could imagine a project deciding to trust (to some degree) people vouched for by another project whose judgement they trust. Or, conversely, denouncing those endorsed by a project whose judgement they don't trust.

In general, it seems like a web of trust could cross projects in various ways.


Ethos is already building something similar, but starting with a focus on reputation within the crypto ecosystem (which I think most can agree is an understandable place to begin).

https://www.ethos.network/


I'm confused. Why do I need "reputation within the crypto ecosystem"? If I want to trade it, I use an exchange, like Binance.


Both sides of the equation can be gamed. This has always been the issue with reputation systems.


Sounds like a black mirror episode.


Isn't that literally the plot of one of the episodes? The one where everyone gets an x-out-of-5 rating that is always visible.


Yes, there is one that is pretty close to this scenario.


Look at ERC-8004


> Then again, if this is the case, why would you risk your own reputation to vouch for anyone anyway.

The same as when you vouch for your company to hire someone - because you will benefit from their help.

I think your suggestion is a good one.


> Then again, if this is the case, why would you risk your own reputation to vouch for anyone anyway.

Maybe your own vouch score goes up when someone you vouched for contributes to a project?


That is an easy way to game the whole system. Create a bunch of accounts and repos, cross vouch across all of them, generate a bunch of fake AI PRs and approve them all because none of the repos are real anyway. Then all you need is to find a way to connect your web of trust to a wider web of trust and you have a whole army of vouched sock puppet accounts.


Think Epstein, but in code. Everyone would vouch for him because he's hyper-connected, so he'd get a free pass all the way. Until it all blows up in our faces and everyone who vouched for him gets flagged. The main issue is that it can take 10-20 years for it to blow up.

Then you have introverts that can be good but have no connections and won’t be able to get in.

So you’re kind of selecting for connected and good people.


Excellent point. Currently, HN accounts get much higher scores for submitting content than for making valuable comments. Those should be two separate scores. Instead, accounts with really good advice have lower scores than accounts that just automate re-posting content from elsewhere to HN.


Fair (and you’re basically describing the xz hack; vouching is done for online identities and not the people behind them).

Even with that risk I think a reputation based WoT is preferable to most alternatives. Put another way: in the current Wild West, there’s no way to identify, or track, or impose opportunity costs on transacting with (committing or using commits by) “Epstein but in code”.


But the blowback is still there. The Epstein saga has fragmented and disciplined the elite, and will continue to. Most people probably do genuinely regret associating with him. Noam Chomsky's credibility and legacy are permanently marred, for example.


> trust-based systems only work if they carry risk. Your own score should be linked to the people you "vouch for" or "denounce"

This is a graph search. If the person you're evaluating vouches for people whom those you vouch for denounce, then even if they aren't denounced per se, you have gained information about how trustworthy you would find them. (Same in reverse: if they vouch for people whom your vouchers vouch for, that indirectly suggests trust even if they aren't directly vouched for.)
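As a toy sketch of that second-hop lookup. All the names and data structures here are made up for illustration, not anyone's actual API:

```python
def indirect_signal(my_vouches, vouch_graph, denounce_graph, candidate):
    """Crude second-hop trust signal for an account with no direct edge.

    my_vouches: set of ids I directly vouch for.
    vouch_graph / denounce_graph: id -> set of ids that account
    vouches for / denounces.
    """
    candidate_vouches = vouch_graph.get(candidate, set())
    # Everyone my trusted circle vouches for, and everyone it denounces.
    circle_vouched = set().union(
        set(), *(vouch_graph.get(v, set()) for v in my_vouches)
    )
    circle_denounced = set().union(
        set(), *(denounce_graph.get(v, set()) for v in my_vouches)
    )
    # Overlap with my circle's vouches raises the score; overlap with
    # its denouncements lowers it.
    return len(candidate_vouches & circle_vouched) - len(
        candidate_vouches & circle_denounced
    )
```

A real system would walk more than two hops and weight edges, but even this shape shows how second-order information falls out of the graph.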


I've been thinking in a similar space lately, about what a "parallel web" could look like.

One of my (admittedly half-baked) ideas was vouching with real-world or physical incentives. Basically, signing up requires someone vouching for you, similar to this one, where there is actual physical interaction between the two. But I want to take it even further: when you sign up, your real-life details are "escrowed" in the system (somehow), and when you do something bad enough for a permaban-plus, you get doxxed.


What a good post! I loved the takeaways at the end of each section.

I think it would maybe get more traction if the code was in pytorch or JAX. It’s been a long while since I’ve seen people use Keras.


You are absolutely right about the code: I haven't worked with neural networks in a while and I guess my post outs me!

That said, I do like Keras's functional API, and in this case I think it maps nicely to the "math" of the hypernetwork.

I really appreciate your suggestion of more popular libraries, and I'll look into JAX.

