O3 is fantastic at coding tasks; until today it was the smartest model in existence. But it works only in few-shot conversational scenarios; it's not good in agentic harnesses.
Why not? Have you ever actually used these things? The risk is incredibly low. I run Claude Code with zero permissions every day for hours. Never had a problem.
I have (not an exhaustive list) SSH keys and sensitive repositories hanging out on my filesystem. I don't trust _myself_ with that, let alone an LLM, unless I'm running ollama or similar local nonsense with no net connectivity.
I'm a few degrees removed from an air gapped environment so obviously YMMV. Frankly I find the idea of an LLM writing files or being allowed to access databases or similar cases directly distasteful; I have to review the output anyway and I'll decide what goes to the relevant disk locations / gets run.
They don't have arbitrary access over your file system. They ask permission before doing most everything. Even reading files: they can't do that outside of the current working directory without permission.
Comments like this just show how bad the average dev is at security. Ever heard of the principle of least privilege? It's crazy that anyone who has written at least one piece of software would think "nah, it's fine because the software is meant to ask before doing".
I'm pretty comfortable with the agent scaffolding just restricting directory access but I can see places it might not be enough...
If you were being really paranoid then I guess they could write a script in the local directory that then runs and accesses other parts of the filesystem.
I've not seen any evidence an agent would just do that randomly (though I suppose they are nondeterministic). In principle maybe a malicious or unlucky prompt found somewhere in the permitted directory could trigger it?
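To make the escape concrete, here's a minimal sketch. Everything in it (the temp directory, the `escape.py` name, the path it reads) is invented for illustration; the point is only that a directory-scoped tool check doesn't constrain what code executed *by* a tool can touch:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical sketch: the agent's file tools may be limited to the working
# directory, but a script it is allowed to *run* inherits the full
# permissions of the user account.
script = """
import pathlib
# Nothing scopes this read to the working directory:
print(pathlib.Path.home().joinpath(".ssh").exists())
"""

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "escape.py")
    with open(path, "w") as f:
        f.write(script)
    # The "permitted" action is just executing a file inside workdir,
    # yet the script probes a path well outside it.
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True)

print(result.stdout.strip())
```

This is why directory restrictions alone aren't least privilege: any "run this command" permission effectively widens the sandbox to whatever the user account can reach, unless the process itself is containerized.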
You're obviously skilled; spending the money on a Claude-only machine would pay for itself in less than three weeks. If I were your employer, it would be a no-brainer.
Why is that laughable? Rum isn't a health drink, but if you were looking for information to support the case that it has some health benefits (which is literally the search term)...seems like a reasonable answer. What did you expect? A moralistic essay on how alcohol is bad?
people make stuff up and post online. You will find made up shit with or without AI with that kind of query. So yes, it's reasonable that AI exposes you to the real Internet, and it's doing, at worst, as good a job as search engines.
A lot of people are desperate for AI to lecture to them from a position of authority, consider it broken when it doesn't, and start praying to it when it does.
edit: AI doesn't even have a corrupting, disgusting physical body, of course it should be recommending clean diets and clean spirits!
I can believe this. A lot of my google search usage now is something like:
> "what is the type of wrench called for getting up into tight spaces"
> AI search gives me an overview of wrench types (I was looking for "basin wrench")
> new search "basin wrench amazon"
> new search "basin wrench lowes"
> maps.google.com "lowes"
Notably, the information I was looking for was general knowledge. The only people "losing out" here are people running SEO-spammish websites that themselves (at this point) are basically hosting LLM-generated answers for me to find. These websites don't really need to exist now. I'm happy to funnel 100% of my traffic to websites that are representing real companies offering real services/info (ship me a wrench, sell me a wrench, show me a video on how to use the wrench, etc).
> The only people "losing out" here are people running SEO-spammish websites that themselves (at this point) are basically hosting webpages containing LLM-generated answers for me to find.
Agreed. The web will be better off for everyone if these sites die out. Google is what brought these into existence in the first place, so I find it funny Google is now going to be one of the ones helping to kill them. Almost like they accidentally realized SEO got out of control so they have to fix their mistake.
At one point these SEO pages were in fact providing a real service, and you could view them as a sort of "manual", prototypical, distributed form of AI. Millions of people trying to understand what information was valuable and host webpages to satisfy the demand for that info, and get rewarded for doing so. It obviously went too far, but at one point, it did make sense to allow these websites to proliferate. I know without AI, I probably just would have clicked on the first link that said "types of wrenches" and read a little bit. I probably would have gotten my answer, it just wouldn't have been quite as laser-targeted to my exact question.
True, in the early days these sites were genuinely helpful. The monetization model was a little different though, which is what I think kept them useful. You'd use the content just to drive traffic, which would result in ad clicks on your banner ads, etc.
Then "content marketing" took over, and the content itself was now also used to sell a product or service, sort of an early form of influencer marketing, and that is when I think it all started to go downhill. We stopped seeing the more in-depth content which actually taught something, and got more surface-level keywords that were just used to drive you to their product/service.
OTOH, the early web was also full of niche forums, most viewable without an account and indexable, of about any topic you could imagine where you could interact with knowledgeable folks in that niche. Google would have been more helpful to users by surfacing more of those forums vs. the blogs.
Those forums are IMO the real loss here. Communities have moved into discord, or another closed platform that doesn't appear on the web, and many that require accounts or even invitations to just view read only.
Hard disagree. I put a great deal of work into my website, putting hard-earned information on the internet for the first time. Now Google reaps all the value I create without as much as a "thank you".
Unfortunately the new victims in the system of LLMs and the way they distribute knowledge is the specialized content website.
Now an LLM just knows all the content you painstakingly gathered on your site. (It could also be, and is likely that it was also collected from other hard to find sites across the internet).
The original web killed off the value of a certain kind of knowledge (encyclopedias, etc.) and LLMs will do the same.
There are plenty of places to place the blame, but this is a function of any LLM, and a fundamental part of how LLMs work, not just a problem created and profited from by Google. Consider, for example, the open-weight models, where no one is actually profiting directly.
> There are plenty of places to place the blame, but this is a function of any LLM, and a fundamental part of how LLMs work, not just a problem created and profited from by Google. Consider, for example, the open-weight models, where no one is actually profiting directly.
First time learning that scraping and training on data that they have often been explicitly disallowed to obtain for free or for that purpose by the rights holders is "fundamental to how LLMs work". If not, then there is no reason those who gathered the information wouldn't stand to profit by selling LLMs this data.
We killed forums and now we're killing specialty websites. Soon we will get right back to pre-internet levels of knowledge. AI will be able to find all the stuff that was easily available in encyclopedias and textbooks, and none of the stuff that made the 90s-2000s internet great.
The Internet had plenty of specialty knowledge before "Gathering that knowledge, SEOing it, and stuffing ads all over it" became a viable business model, and it will have plenty of specialty knowledge after this era is over.
Specialty knowledge can also be a guy reviewing air purifiers with obsessive care. It can be someone writing detailed travel guides or in my case an essential resource for immigrants. There were a lot of good websites, and if you could not find them, blame Google for letting its search engine go to the dogs.
Gathering that knowledge is work, and if anyone should capture that value, it's the people doing the work. Seeing big tech slurp it all up, insert itself in the middle, and capture all the value is heartbreaking.
I hate that AI is destroying the economics of the independent web, and people cheer for it because they landed on Forbes one too many times. It's insulting to all the people who did their job right, and still get their work slurped up by an uncaring corporation with no way to stop them.
Maybe (I can't emphasize enough that I say this with a high degree of skepticism) they'll come up with a way of doing cited/credited/paid work where the LLM will include who "taught" it something.
Eg: instead of writing a blog post, you'll submit a knowledge article to an AI provider and that'll go into the AI's training set and it'll know "you" told it. And maybe (even more skeptical) pay you for it.
Again: highest degree of skepticism, but at the same time, that's the only way I could see people continuing to write content that teaches anything.
> There are plenty of places to place the blame, but this is a function of any LLM, and a fundamental part of how LLMs work, not just a problem created and profited from by Google. Consider, for example, the open-weight models, where no one is actually profiting directly.
One problem with monopolies is their massive multiplicative effects on otherwise manageable problems.
If I gave my surplus eggs for free, I'd be annoyed if they ended up at the Best Western's buffet. These eggs were meant for my neighbours.
There is a difference between working for free for your community and working for free for a trillion dollar company's investors. Doubly so when you strip consent or attribution from the equation.
I can't help if the Best Western plunders the honesty box in front of my house. There is no way to stop them.
I hope that you understand that this is a metaphor for free labour intended for a community being exploited by big corporations with no way of stopping them.
There are specific constraints on "free" in that context. For example *GPL also generally forbids providing functionality without making the source available.
There will always be people putting information out there with 0 expectation of any kind of return. What will happen is there will be far fewer websites enforcing return, as they'll be automatically deranked for that enforcement.
I am tech blogging for free, and my expected return is people reading my content, recognition, attribution, social media attention and people discussing my content.
I have zero interest in providing free labor for LLM companies with no human actually reading my words. I don't think I'm alone in that stance.
Similarly, I used to help people on forums for free. My reward was getting respect of my peers, the feeling of helping another human being and sometimes them being grateful, rare side job opportunities thanks to people finding my specialist posts. That was fun, being anonymous question-answering bot for AI to scrape is not.
> recognition, attribution, social media attention
Expectation of these in particular makes your blogging a product.
> with no human actually reading my words
But it - or at least the idea - is being ultimately read by humans, as long as some article is in the top results for relevance to some LLM prompts. It just may be summarized, or particular parts extracted first, or reworded for clarity (like I may ask for an "eli5" if I encounter something that's interesting but I find concepts going over my head), or otherwise preprocessed in order to fulfill the prompt parameters. All actions which the very human users may have to do manually anyway in order to efficiently consume your content (or give up and move on if they can't be bothered), is now automatically done by LLM agents.
The number of people who have the privilege to publish content for free for others (or spend their free time contributing to open source) is minimal. And now they will feel like they're being disrespected and exploited by LLMs and their users, who think of the authors not as people but just as the source of content. You can be sure most of the content will be gone, because nobody wants to be just a content cow for LLMs.
Why do you think they'll feel disrespected? They literally have no expectations; they're just putting content out there for others to do as they wish with. Several here have weighed in to say they don't care about LLMs accessing and morphing their content.
Of course they won't be nearly as many as those who publish with expectations, but also history has shown that whenever there's a gap, someone tends to fill it just because. I'm not worried about content gaps at all. What I see is an overall increase in quality as no-expectation sources float to the top of search results.
Well I'm one such person so you can hear it from me instead of guessing. I feel disrespected.
I can also confirm that I will do a lot less of it since it's threatening the parts of my business that supported me and gave me so much free time to release things for free. It almost halved my audience, so I have to do the same amount of work for nearly half the pay, half the community, and half the credit.
You do have expectations though. You expect to grow and maintain an audience, gain credit, which makes what you're releasing a product, not something free.
>whenever there's a gap, someone tends to fill it just because
lol what. That only works when the demand is actually paying. You are talking about FREE content!
>They literally have no expectations
I'm sorry but this is completely false.
If that were the case why do we have so many licenses? Why GPL/MIT/Apache, CC BY, CC BY-NC, CC BY-ND, CC BY-SA, CC BY-NC-SA, CC BY-NC-ND, and so on, when it's either "free or not"? Surely we don't need all this fluff when the only thing free for real is Public Domain?
Just consider this: we're living in an age where even people who publish MEMES on Reddit watermark them, because they don't want 9gag/Instagram/Facebook pages reposting them without permission/credit. And they are MEMES!! Even I find this cringe. But it proves that the author has some expectation. Even if you don't agree with their expectation, it proves that the expectation EXISTS.
What is next? Are you going to extend this to say that all web comics accessible for free on the web are "free free" so you should be allowed to remove watermarks to repost them on Facebook? You are filling the gap of having a single place where people can read funny comics for free, except you didn't make any of the comics and you have no right to post them. In fact, this is a great example. How is ChatGPT different from a guy that just reposts comics and memes on Facebook? It's literally the same thing.
And then next you are going to say that all videos posted on Youtube/TikTok are "free free" so you should be allowed to rip them off too.
I feel like you're just going to make an enemy out of everyone who publishes anything for free on the Internet if you start thinking like this.
> That only works when the demand is actually paying
You should see the internet 20+ years ago. It was rich in forums, interest sites, etc where people just shared because they had this interesting thing they wanted to put out there. Reddit still has a bit of it, though it's mostly a mess now.
> why do we have so many licenses?
Because of the way the law works in some jurisdictions: permission must be explicitly granted by the author via license, and some authors (who are aware of the legal requirement) just do up a thing without checking to see if there's something that fits their desires. And some want to tweak a license in some way to account for some other thing and end up creating a new license. See also CC0, 0BSD, WTFPL, Unlicense, and others [0].
BTW I'll clarify that it's not "expectations", but specifically "expectations of return". It's OK, and expected, for instance, that someone putting something out there for free wouldn't want to be held legally responsible if that thing is used in an illegal manner.
> The only people "losing out" here are people running SEO-spammish websites that themselves (at this point) are basically hosting LLM-generated answers for me to find.
And anybody who creates original content and wishes -- not just to be paid for that content -- but for people to actually see that content and engage with it. IOW the very people who fed the LLM revolution.
No one is using AI search to read Tammy’s original content unless it is terribly written, and no one is reading the 4-page fluff piece about troubleshooting XY just to find the command that fixes their problem, when all they want is to solve the problem.
People who want to engage with original content will; people just won’t be forced into the appearance of engaging with content merely to find answers to simple questions, or to find the specific information they are looking for before they leave the site…like always.
The result is likely to be more time on site with lower numbers of users; a more genuine reflection of an actual user base instead of search-fed propping of ad revenue numbers through effectively fake impressions.
To that point:
There is not a single website or blog, ever, that I started visiting regularly by having ended up there from a search result. Literally, ever. From something a friend shared? Something I saw on HN? Something I found through a recommendation, an article that was posted on a different site, etc? Absolutely.
That's ... not what original content makers who aren't named Tammy are saying, including CNN and The Verge [1], the Daily Mail [2], and several lawsuits [3] [4]. In fact, the only people who seem to believe that AI search isn't negatively affecting content makers long-term is google and you, and google (for now) has a vested interest in believing that.
Discovery isn't search, but search can be a form of discovery, despite your experiences. They don't match mine.
Your excessive hyperbole aside, I’m sure they are saying such things. Unless you forgot, they also said that about Twitter…and Facebook…and Google Search…and before that it was magazines, and BBS systems, and the internet, and Craigslist, and…and…and…
You confuse original content creators with news conglomerates that always cry wolf about how they will be “put out of businesses”. Of course, ignoring the fact that there have never been more paid content creators in history, you choose to only be concerned about the one type who complains most frequently.
It’s really nothing new, but hey, benefit of the doubt, maybe it’s your first time.
Using the term "excessive hyperbole" is itself hyperbole and is a little unrealistic for what we're talking about. What you might be referring to is the excessive number of sources I used and honestly you could take a page from that to back up what you're saying.
My original comment regarded people who create original content and want to be paid for it. That includes substack creators, CNN, and lots of enterprises in between. They all have the same problem with large LLMs either taking advantage of the tragedy of the commons or ignoring their robots.txt files and scraping their content even if they choose to not participate.
I haven't forgotten that news organizations said the same thing about Twitter, Facebook, etc. If you haven't noticed, news (especially local news) has been declining steadily for at least the last 25 years and several news organizations (again, especially local ones) _have_ either gone out of business or been bought out and gutted by hedge funds. Some of this for sure is due to miscalculations by those orgs, but the nature of those miscalculations matters. It's worth reading up on the history of media's financial relationship with social and search. It will help inform a lot on how it's going to go with LLMs and AI unless they find a way to make some deals. It behooves both sides.
That only works for some queries, particularly simple ones. Even elaborating the given query into “type of wrench for tight spaces” puts “basin wrench” in 6th place below incorrect wrenches like “oil filter” and “hydraulic torque” (https://en.m.wikipedia.org/w/index.php?search=type+of+wrench...).
I don’t know if old-school search engines are too ingrained in me, but my search would look something like: reduced space "wrench" -site:www.amazon.com -site:www.ebay.com
Also have the same experience with regular Google search now (and other engines, too). I think part of the problem is the degradation of Google's product, but also the endless amount of absolute shit that humanity is vomiting onto the Internet now. The commercialisation of (almost) the entire Internet during the mid 2000s has wrecked it all.
Now, an AI search that does the following: search, pull various pages to examine their real contents, continue searching, etc., then summarise/answer, is realistically the only way to filter through all of said bullshit.
AI searches help with the clickbait problem as well, since even "reputable" news outlets are engaging in that fuckery now.
It's either we use AI to sift through the dead carcass of our old friend, or we enforce rules for a clean Internet — which I can definitely not see happening.
Nice release. Part of the problem right now with OSS models (at least for enterprise users) is the diversity of offerings in terms of:
- Speed
- Cost
- Reliability
- Feature Parity (eg: context caching)
- Performance (What quant level is being used...really?)
- Host region/data privacy guarantees
- LTS
And that's not even including the decision of what model you want to use!
Realistically, if you want to use an OSS model instead of the big 3, you're faced with evaluating models/providers across all these axes, which can require a fair amount of expertise to discern. You may even have to write your own custom evaluations. Meanwhile Anthropic/OAI/Google "just work" and you get what it says on the tin, to the best of their ability. Even if they're more expensive (and they're not that much more expensive), you are basically paying for the privilege of "we'll handle everything for you".
I think until providers start standardizing OSS offerings, we're going to continue to exist in this in-between world where OSS models theoretically are at performance parity with closed source, but in practice aren't really even in the running for serious large scale deployments.
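Those custom evaluations often boil down to a weighted scorecard over the axes listed above. As a minimal sketch (every provider name, score, and weight below is invented; in practice each score would come from your own evals):

```python
# Hypothetical scorecard comparing OSS-model providers across the axes
# from the list above. Scores are 0-10, weights sum to 1.0.
weights = {
    "speed": 0.15, "cost": 0.25, "reliability": 0.25,
    "feature_parity": 0.15, "quant_fidelity": 0.15, "data_privacy": 0.05,
}

providers = {
    "provider_a": {"speed": 9, "cost": 4, "reliability": 6,
                   "feature_parity": 5, "quant_fidelity": 7, "data_privacy": 8},
    "provider_b": {"speed": 5, "cost": 8, "reliability": 8,
                   "feature_parity": 7, "quant_fidelity": 9, "data_privacy": 6},
}

def score(metrics: dict) -> float:
    """Weighted sum over the evaluation axes."""
    return sum(weights[axis] * value for axis, value in metrics.items())

# Rank providers by weighted score, best first.
ranked = sorted(providers, key=lambda p: score(providers[p]), reverse=True)
for name in ranked:
    print(f"{name}: {score(providers[name]):.2f}")
```

The hard part isn't the arithmetic, of course: it's producing trustworthy per-axis scores (e.g. detecting what quant level a host is actually serving), which is exactly the expertise gap the closed providers let you skip.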
I wouldn't be surprised if those undeleted chats or some inferred data that is based on it is part of the gpt-5 training data. Somehow I don't trust this sama guy at all.
IMO, the time of "code as math" is over. No sufficiently large software system that interacts with the real world is provable to be correct like a mathematical statement is. They are all complicated, engineered systems that are backed by a mix of formal guarantees, earned design principles, experimental testing, rules of thumb, acceptable performance envelopes, etc.
This is what all software will become, down to the smallest script. The vast majority of software does not need to be provably correct in a mathematical way. It just needs to get the job done. People love the craft of programming, so I get it, it's uncomfortable to let go.
But what is going to win out in the end:
- An unreadable 100K loc program backed by 50K tests, guaranteeing behavior to the client requirements. Cost: $50K of API tokens
- A well engineered and honed 30K loc program, built by humans, with elegant abstractions. Backed by 3K tests. Built to the same requirements. Cost: $300K of developer time.
If I am a consumer of software, and not particularly interested in the details, I am going to choose the option that is 6x cheaper, every time.
> An unreadable 100K loc program backed by 50K tests, guaranteeing behavior to the client requirements
Until the next set of needed changes due to exterior requirements. And this software is one of the pillars of the business. That is when you switch vendors, if you were buying the service.
That is why support is always an essential part of B2B or even serious B2C. The world will change and you need to react to it, not just have the correct software now.
This assumes software is a thing you build once and seal it off when it's finished.
What happens when you need to modify large portions of it? Fix security issues? Scale it up 20x? You can throw more tokens at it and grow it into a monstrous hulk. What if performance degrades due to its sheer weight?
I know humans aren't perfect and are capable of writing really bad unmaintainable code too, but this is just embracing that more. This feels like going further down the same route of how we ended up with 10MB websites that take many seconds to load. But yeah it will probably win over the market.
> An unreadable 100K loc program backed by 50K tests, guaranteeing behavior to the client requirements. Cost: $50K of API tokens
As my team has spent the past several months trying to explain to upper management, you can't guarantee that the program does what the client wanted just by adding more tests.
If the AIs ever become capable of reliably producing what the client wanted, they will win. But I'm not convinced they will. They might be able to produce what the client asked for, but programmers have known for decades that that's pretty much useless.
I think the question to ask about your two scenarios: in which is it faster and cheaper to get from v1 to v2? From v2 to v3? I think, for right now, it's cheaper under scenario B. But in the future? Who knows!
I have a JackRabbit OG e-bike [0] (really technically an electric scooter, since it doesn't have pedals). I recently put an aftermarket controller on it to allow for speeds past 20mph, and to allow for higher current to the 350W motor (accepting the risk of increased wear and burn out). It's a ton of fun to ride, can get up to 30-35mph or so though I never take it that high (dangerous). I mostly just use the increased torque for hills.
What's interesting to me though is when I lived in a city, there was zero enforcement of e-bike laws, classifications, etc. I never saw a cop glance my way. Of course, the only riders of e-bikes were adults, and people generally followed traffic laws.
Now that I live in the suburbs, the only other riders of e-bikes are teens and it's a huge issue! I have to be careful exceeding my speed class; it's noticeable how much power my apparently modest bike has. Cops have already stopped me once to ask more about the bike, and accepted my explanation that while it's powerful, it has a software governor to keep it within limits. The cop seemed to give me an easier time because of my age (in my 30s), and the fact that I'm an adult with a regular driver's license. I got the sense if they had caught a teen with the bike, they would have been ticketed and the bike impounded.
When I was discussing the issue with someone in a nearby city that is putting in additional protected bike lanes, she said that they asked about enforcement of high-speed ebike speeds in a meeting and the city official basically shrugged and said nope.
I'm not saying the thesis here is wrong - it definitely feels directionally correct. But I always wonder with these panics about children's development - do we actually know that developing certain types of skills at certain periods is critical, in the long run?
I think about this often with my own kid. He was behind on speech milestones. I looked up outcomes for kids behind on speech milestones. Data shows that early intervention seems to help speech/reading skills into early elementary school. But there was very little data on the longer term outcomes. Does speech therapy when you're 2 or 3 years old really impact your career or lifetime earnings? Seems like it might for kids with true developmental/learning disabilities. But everyone else?
As our pediatrician often says "[Outside of severe disability] Nobody goes off to college needing their mommy to sleep, or being unable to use a fork".
How many industry leaders do you know with speech impediments? And, no, these impediments don't usually vanish over time -- I come from a large family (8 kids), and one of my youngest sisters was allowed to "grow out" of her inability to pronounce 's' and 'r' -- but it never happened, and she required speech therapy at 20. She still struggles. Parents are correct to pay attention to young-childhood milestones.
At least this one is not. I've broken my dominant hand before and then separated the same shoulder, both in the same year, and was surprised to find using scissors to be the single most difficult thing to do with my non-dominant hand, other than writing. It took a few months to get it right, but that's it. Not like it's a lifelong deficiency. You just learn later. Presumably, kids with their inherently more plastic neuromuscular systems learn faster than a 44 year-old, too.
It's because scissors are actually made for use in one hand or the other. Most are made for right-handed people. If you're left-handed (or forced to use your left hand), get scissors made for left-handed people and they will be easier to use.
>But I always wonder with these panics about children's development
The panics are definitely overblown, if not entirely wrong... soon, there won't be (m)any children to worry about, and then the human race becomes extinct (or maybe just feral). A glorious future awaits.
Have been using this for a while. I'm on the $100/mo Max plan and have been running $600-800/mo in terms of usage, and I'm hardly pushing it to the limits (missing lots of billing windows).
It makes me wonder what Anthropic's true margins are. I could believe they are overcharging via the API — Sonnet is $3/$15/Mtok and Opus at an ABSURD $15/$75/Mtok — but to break even for me, that would mean that they're overcharging by a factor of 5x-10x, which doesn't seem possible. Is the music going to stop for Claude Code the same way it did for Cursor? I have to imagine every incentive in the world is pushing them to lower inference cost rather than introduce stricter limits, and unlike Cursor they can actually reach into their stack and do this. But I'm not sure they're capable of miracles.
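A quick sketch of that break-even arithmetic (the plan price and usage figures come from the numbers above; this is just division, not Anthropic's actual cost structure):

```python
# Back-of-the-envelope: how much would published API rates have to exceed
# true inference cost for the flat-rate plan to break even on this usage?
plan_price = 100                    # $/month, Max plan
api_equivalent_usage = (600, 800)   # $/month of usage, priced at API rates

# Ratio of API-rate usage to flat-rate price = implied minimum API markup.
ratios = tuple(usage / plan_price for usage in api_equivalent_usage)
print(f"implied markup: {ratios[0]:.0f}x-{ratios[1]:.0f}x")
```

That yields a 6x-8x implied markup for this particular usage pattern, before accounting for heavier users on the same plan, which is where the 5x-10x range comes from.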
Regardless, I'm bullish Anthropic. Sonnet and Opus don't benchmark as well as O3/Grok4 at pure coding, and aren't as cheap as Kimi K2 for theoretically similar perf, but as any user knows they are top tier at instruction following, highly reliable and predictable, and have a certain intangible theory of mind that is unique to Anthropic.
> but as any user knows they are top tier at instruction following, highly reliable and predictable
This is spot on. Reliability is really the #1 priority for me when it comes to coding agents, and Sonnet, and especially Opus, really deliver on it. It makes such a huge difference when it comes to agents. Anthropic really nailed it on this.
My process has become: get Opus to generate a plan, use o3 to help me review the plan, and then get Opus to implement the plan. This works extremely well for me, and is the first time where I've felt AI being actually useful for coding anything more than small prototypes.
> but as any user knows they are top tier at instruction following, highly reliable and predictable, and have a certain intangible theory of mind that is unique to Anthropic.
I’m not such an “any user”: in my experience they are exceptionally terrible at following my instructions, cutting corners whenever they like. I have to get iteratively more and more specific about what they should not do, all the while trying to predict their misbehavior. Then, at a certain point or threshold, they throw my instructions out the window to match and reproduce the very things or patterns I wanted to avoid, and likewise with the things and patterns I explicitly instructed them to use or apply.
Very unreliable.
> have a certain intangible theory of mind that is unique to Anthropic.
What do you mean by music stopping for Cursor? Almost every single developer I run into is transitioning/has transitioned to it today. It's like the new VS Code.
There's a predictable journey: people start with Cursor when they are new to AI, and quickly move on to something more powerful once they realise that the IDE [1] is holding them back and that forking VSCode is [2][3] tech debt.
> Sonnet and Opus don't benchmark as well as O3/Grok4 at pure coding
Do any of the others have a "claude code" local agent? Seems like a big gap IMO. Though, it should be pretty easy for them to close that gap.
I don't usually take too many moral stances but I feel like I can't use Grok. It's bad enough Musk did his Nazi salute but his AI product itself is a Nazi too? It might be good at coding but I really can't stomach using it.
FWIW, people report that Grok 4 is not very good at coding, and xAI admit this themselves when they said they will be releasing a separate coding model in "the next few weeks".
Also, Google does have Gemini CLI, OpenAI does have Codex CLI, and then there is Aider which can support any model. I think the big difference is that Anthropic's models are the best for this use-case right now, and Anthropic has the Max plan which makes a massive difference to the cost of using Claude Code compared to competitors (although the Gemini CLI has insane free tiers).
I'm not sure how this will play out in the future, because it seems to me that Claude Code does not have much of a moat beyond Anthropic having the best coding models right now, and them offering model usage at heavily discounted prices.
I used it really heavily for a few days, but now I just don’t have the time to give it many instructions. Still used on the order of $1000 in the first month, but I imagine it will go down as time goes on. CC is so convenient I think I’d have a hard time living without it though.
I doubt that there are many inactive subscribers, as Claude Code / Max Plan is relatively new. They might be hoping that way in a couple of months, though.
Exactly, swyx. Any flat rate pricing plan is effectively a bet against the future. It's a grab for engineers that's subsidised. Now, the problem is that GPUs are expensive; they are a costly resource to use. Inferencing is expensive.
So what happens is inevitable:
- Wild promises of unlimited usage and consumers feeling tricked when the impossible is impossible to deliver (Cursor pricing changes).
- Quasi-unlimited usage with rate-caps, but the models get quantised to all hell? [search Twitter for folks reporting Claude feels dumber around/near outages].
- Engineers sharing tools and techniques on how to squeeze pounds out of a flat-rate plan (original post), which results in more power users doing that, which puts more pressure on margins.
I started on the Pro plan a week ago and was already contemplating jumping to Max. When I hit a limit yesterday I upgraded to Max and hit a limit again before seeing the news of the changed usage limit.
For what it's worth, everything seems fixed today.
People have been saying this since it first came out. I don’t doubt there are occasional bugs/service disruptions but personally I really doubt Anthropic is silently decreasing the limits.
I think for me the hesitation is in that to get started on task XYZ, the first step is often some extremely boring task ABC that is only tangentially related to the "real" task.
For example, the other day I needed to find which hook to use in a 2300 loc file of hooks. AI found me the hook in 5 seconds and showed me how to use it. This is a pure win - now that I have the name of the hook I can go read it in 30 seconds and verify it's what I wanted. If it's not, I can ask again. There's zero risk here.