WhyOhWhyQ's comments | Hacker News

RIP to those generous souls

This is how I am except with nostalgia content, which I cannot see as just content.

It is however impossible for me to play the latest games or watch the latest shows for 10 minutes without feeling like my time is being wasted.


Thank you person. You have improved my day.

Your position cannot distinguish between stealing somebody's likeness and merely looking at them.


I agree without argument. I have also thoroughly enjoyed the animatronic dead Presidents at Disney World.

We're about to witness a fundamental shift in the human experience. Some time in the near future there will not be a single act of creation you can do that isn't trivial compared to the result of typing "make cool thing please now" into the computer. And your position is to add to the problem because with your policy anything I create should get chucked into the LLM grinder by any and everybody. How do I get my human body to commit to doing hard things with that prospect at hand? This is the end of happiness.


This is why I love making bread


We can’t all be bread making hedonists. Some of us want these finite lives to mean more than living constantly in the moment in a state of baking zen.


I don't know, that sounds like the basic argument for copyright: "I created a cool thing, therefore I should be able to milk it for the rest of my life". Without this perk, creatives are less motivated. Would that be bad? I guess an extreme version would be a world where you can only publish anonymously and with no tangible reward.


I hate to paint with such a broad brush, but I’d venture that “creatives” are not primarily motivated by profit. It is almost a truism that money corrupts the creative endeavour.


There are various ways to turn creativity into money, even without publishing any kind of artwork. Basically all skilled jobs and entrepreneurial enterprises require creativity. And if you do have an artwork, you can still seek profit through acclaim, even without copyright: interviews, public appearances. Artists once had patrons - but that tends to put aristocrats in control of art.

So money will motivate a lot of the creativity that goes on.

Meanwhile, if you dabble in some kind of art or craft while working in a factory to make ends meet, that kind of limits you to dabbling, because you'll have no time to do it properly. Money also buys equipment and helpers, sometimes useful.

On the other hand, yes, it ruins the art. There's a 10cc song about that. https://en.wikipedia.org/wiki/Art_for_Art%27s_Sake_(song)

Though, this reminds me of an interesting aside: the origin of the phrase "art for art's sake" was not about money, but about aesthetics. It meant something like "stop pushing opinions, just show me a painting".


Not all mediums for creativity are equivalent. We're killing off the good ones and replacing them with bad ones.

It doesn't seem hard to imagine 2-5 years from now when "memes in Ghibli style" turns into "pay us 25 cents and we'll send you a 30 minute cartoon in Ghibli style".


The technology is not just less than superintelligence, for many applications it is less than prior forms of intelligence like traditional search and Stack Exchange, which were easily accessible 3 years ago and are in the process of being displaced by LLMs. I find that outcome unimpressive.

And this Tweeter's complaints do not sound like a demand for superintelligence. They sound like a demand for something far more basic than the hype has been promising for years now.

- "They continue to fabricate links, references, and quotes, like they did from day one."
- "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error." (Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.)
- "They reference a scientific publication, I look it up, it doesn't exist."
- "I have tried Gemini, and actually it was even worse in that it frequently refuses to even search for a source and instead gives me instructions for how to do it myself."
- "I also use them for quick estimates for orders of magnitude and they get them wrong all the time."
- "Yesterday I uploaded a paper to GPT to ask it to write a summary and it told me the paper is from 2023, when the header of the PDF clearly says it's from 2025."


A municipality in Norway used an LLM to create a report about the school structure in the municipality (how many schools are there, how many should there be, where should they be, how big should they be, pros and cons of different size schools and classes, etc.). Turns out the LLM invented scientific papers to use as references and the whole report is complete and utter garbage based on hallucinations.


And that says… what? The entire LLM technology is worthless for all applications, from all implementations?

A company I worked for spent millions on a customer service solution that never worked. I wouldn’t say that contracted software is useless.


I agree. I use LLMs heavily for gruntwork development tasks (porting shell scripts to Ansible is an example of something I just applied them to). For these purposes, it works well. LLMs excel in situations where you need repetitive, simple adjustments on a large scale, i.e. swap every postgres insert query with the corresponding mysql insert query.

A lot of the "LLMs are worthless" talk I see tends to follow this pattern:

1. Someone gets an idea, like feeding papers into an LLM, and asks it to do something beyond its scope and proper use-case.

2. The LLM, predictably, fails.

3. Users declare not that they misused the tool, but that the tool itself is fundamentally corrupted.

In my mind it's no different from the steamroller being invented, and people remarking how well it flattens asphalt. Then a vocal group tries to use this flattening device to iron clothing in bulk, and declares steamrollers useless when it fails at this task.


>swap every postgres insert query with the corresponding mysql insert query.

If the data and relationships in those insert queries matter, at some unknown future date you may find yourself cursing your choice to use an LLM for this task. On the other hand you might not ever find out and just experience a faint sense of unease as to why your customers have quietly dropped your product.


I hope people do this and royally mess shit up.

Maybe then they’ll snap out of it.

I’ve already seen people completely mess things up. It’s hilarious. Someone who thinks they’re in “founder mode” and a “software engineer” because chatgpt or their cursor vomited out 800 lines of python code.


The vileness of hoping people suffer aside, anyone who doesn’t have adequate testing in place is going to fail regardless of whether bad code is written by LLMs or Real True Super Developers.


What vileness? These are people who are gleefully sidestepping things they don't understand and putting tech debt onto others.

I'd say maybe up to 5-10 years ago, there was an attitude of learning something to gain mastery of it.

Today, it seems like people want to skip levels which eventually leads to catastrophic failure. Might as well accelerate it so we can all collectively snap out of it.


The mentality you're replying to confuses me. Yes, people can mess things up pretty badly with AI. But I genuinely don't understand the assumption that anyone using AI is also not doing basic testing or code review.


Probably better to have AI help you write a script to translate postgres statements to mysql


Right, which is why you go back and validate code. I'm not sure why the automatic assumption that implementing AI in a workflow means you blindly accept the outputs. You run the tool, you validate the output, and you correct the output. This has been the process with every new engineering tool. I'm not sure why people assume first that AI is different, and second that people who use it are all operating like the lowest common denominator AI slop-shop.


In this analogy, are all the steamroller manufacturers loudly proclaiming how well it 10xes the process of bulk ironing clothes?

And is a credulous executive class en masse buying into that steam roller industry marketing and the demos of a cadre of influencer vibe ironers who’ve never had to think about the longer term impacts of steam rolling clothes?


> porting shell scripts to Ansible

Thank you for mentioning that! What a great example of something an LLM can do pretty well that otherwise can take a lot of time looking up Ansible docs to figure out the best way to do things. I'm guessing the outputs aren't as good as what someone really familiar with Ansible could do, but it's a great place to start! It's such a good idea that it seems obvious in hindsight now :-)


Exactly, yeah. And once you look over the Ansible, it's a good place to start and expand. I'll often have it emit Helm charts for me as templates; then, after the tedious setup of the Helm chart is done, the rest of it is me manually doing the complex parts and customizing in depth.


Plus, it's a generic question; "give a helm chart for velero that does x y and z" is as proprietary as me doing a Google search for the same. You're not giving proprietary source code to OpenAI/wherever, so that's one fewer thing to worry about.


Yeah, I tend to agree. The main reason that I use AI for this sort of stuff is it also gives me something complete that I can then ask questions about, and refine myself. Rather than the fragmented documentation style "this specific line does this" without putting it in the context of the whole picture of a completed sample.

I'm not sure if it's a facet of my ADHD, or mild dyslexia, but I find reading documentation very hard. It's actually a wonder I've managed to learn as much as I have, given how hard it is for me to parse large amounts of text on a screen.

Having the ability to interact with a conversational type documentation system, then bullshit check it against the docs after is a game changer for me.


That's another thing! People are all "just read the documentation". The documentation goes on and on about irrelevant details. How do people not see the difference between "do x with library" -> "code that does x", and having to read a bunch of documentation to write a snippet of code that does the same x?


I'm not sure I follow what you mean, but in general yes. I do find "just read the docs" to be a way to excuse not helping team members. Often docs are not great, and tribal knowledge is needed. If you're in a situation where you're either working on your own and have no access to that, or in a situation where you're limited by the team member's willingness to share, then AI is an OK alternative within limits.

Then there's also the issue that examples in documentation are often very contrived, and sometimes more confusing. So there's value in "work this up to do such-and-such an operation" sometimes. Then you can interrogate the functionality better.


No, it says that people dislike liars. If you are known for making up things constantly, you might have a harder time gaining trust, even if you're right this time.


All of these things can be true at the same time:

1. LLMs have been massively overhyped, including by some of the major players.

2. LLMs have significant problems and limitations.

3. LLMs can do some incredibly impressive things and can be profoundly useful for some applications.

I would go so far as to say that #2 and #3 are hardly even debatable at this point. Everyone acknowledges #2, and the only people I see denying #3 are people who either haven't investigated or are so annoyed by #1 that they're willing to sacrifice their credibility as an intellectually honest observer.


#3 can be true and yet not be enough to make your case. Many failed technologies achieved impressive engineering milestones. Even the harshest critic could probably brainstorm some niche applications for a hallucination machine or whatever.


And yet we keep electing them to public office.


It says that people need training on what the appropriate use-cases for LLMs are.

This is not the type of report I'd use an LLM to generate. I'd use a database or spreadsheet.

Blindly using and trusting LLMs is a massive minefield that users really don't take seriously. These mistakes are amusing, but eventually someone is going to use an LLM for something important and hallucinations are going to be deadly. Imagine a pilot or pharmacist using an LLM to make decisions.

Some information needs to come from authoritative sources in an unmodified format.


If it makes data up, then it is worthless for all implementations. I'd rather it said "I don't have info on this question."


It only makes it worthless for implementations where you require data. There's a universe of LLM use cases that aren't asking ChatGPT to write a report or using it as a Google replacement.


The problem is that, yes, LLMs are great when you're working on some regular thing for the first time. You can get started at a speed never before seen in the tech world.

But as soon as your use case goes beyond that LLMs are almost useless.

The main complaint is that, yes, it's extremely helpful in that specific subset of problems, but it's not actually pushing human knowledge forward. Nothing novel is being created with it.

It has created this illusion of being extremely helpful when in reality it is a shallow kind of help.


> If it makes data up, then it is worthless for all implementations.

Not true. It's only worthless for the things you can't easily verify. If you have a test for a function and ask an LLM to generate the function, it's very easy to say whether it succeeded or not.

In some cases, just being able to generate the function with the right types will mostly mean the LLM's solution is correct. Want a `List(Maybe a) -> Maybe(List(a))`? There's a very good chance an LLM will either write the right function or fail the type check.
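To make that concrete, here's a rough Python sketch of that shape (the function name and the little asserts are mine, not anything standard); a handful of asserts is all it takes to accept or reject whatever the LLM hands back:

    from typing import Optional, TypeVar

    T = TypeVar("T")

    def sequence_options(items: list[Optional[T]]) -> Optional[list[T]]:
        """Return the list unchanged if every element is present, else None."""
        collected: list[T] = []
        for item in items:
            if item is None:
                return None
            collected.append(item)
        return collected

    # A tiny test is enough to judge whatever the LLM produced.
    assert sequence_options([1, 2, 3]) == [1, 2, 3]
    assert sequence_options([1, None, 3]) is None
    assert sequence_options([]) == []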


> all implementations

Are you speaking for yourself or everyone?


Does “it” apply to Homo sapiens as well?


Except value isn't polarised like that.

In a research context, it provides pointers, and keywords for further investigation. In a report-writing context it provides textual content.

Neither of these, nor the thousand other uses, is worthless. It's when you expect a working and complete work product that it's (subjectively, maybe) worthless, but frankly aiming for that with current-gen technology is a fool's errand.


It says we don't have a lower bound on the effectiveness.

It's (currently) like an ad saying "this product can improve your stuff up to 300%"


It mostly says that one of the seriously difficult challenges with LLMs is a meta-challenge:

* LLMs are dangerously useless for certain domains.

* ... but can be quite useful for others.

* The real problem is: They make it real tricky to tell, because most of all they are trained to sound professional and authoritative. They hallucinate papers because that's what authoritative answers look like.

That already means I think LLMs are far less useful than they appear to be. It doesn't matter how amazing a technology is: if it has failure modes and it is very difficult to know what they are, it's dangerous technology no matter how awesome it is when it is working well. It's far simpler to deal with tech whose failure modes you know about, or where it's easy to notice once things start failing.

Add to it the incessant hype, and, oh boy. I am not at all surprised that LLMs have a ridiculously wide range as to detractors/supporters. Supporters of it hype the everloving fuck out of it, and that hype can easily seem justified due to how LLMs can produce conversational, authoritative sounding answers that are explicitly designed to make your human brain go: Wow, this is a great answer!

... but experts read it and can see the problems there. Lots of tech suffers from this; as a random example, plenty of highly upvoted, apparently fantastically written Stack Overflow answers have problems. For example, it's a great answer... for 10 years ago; it is a bad idea today because the answer has been obsoleted.

But between the fact that it's overhyped and that it's particularly complex to determine whether an LLM answer is hallucinated drivel, it's logical to me that experts are hyperbolic when highlighting the problems. That's a natural reaction when you have a thing that SEEMS amazing but actually isn't.


> Stack Overflow answers have problems. For example, it's a great answer... for 10 years ago

To be fair, that's a huge problem with stack overflow and its culture. A better version of stack overflow wouldn't have that particular issue.


You, and the OP, are being unfair in your replies. Obviously it's not worthless for all applications, but when LLMs obviously fail in disastrous ways in some important areas, you can't refute that by going "actually it gives me coding advice and generates images".

That's nice and impressive, but there are still important issues and shortcomings. Obligatory, semi-related xkcd: https://xkcd.com/937/


> And that says… what? The entire LLM technology is worthless for all applications, from all implementations?

You're the first in the thread to have brought that up; there are far more charitable ways to have interpreted the post you're replying to.


That software just didn’t work that way. I don’t think it tried to convince the users that they were wrong by spouting nonsense that seems legitimate.


All of these anecdotal stories about "LLM" failures need to go into more detail about what model, prompt, and scaffolding was used. It makes a huge difference. Were they using Deep Research, which searches for relevant articles and brings facts from them into the report? Or did they type a few sentences into ChatGPT Free and blindly take it on faith?

LLMs are _tools_, not oracles. They require thought and skill to use, and not every LLM is fungible with every other one, just like flathead, Phillips, and hex-head screwdrivers aren't freely interchangeable.


If any non-trivial ask of an LLM also requires the prompts/scaffolding to be listed and independently verified, along with its output, then their utility is severely diminished. They should be saving us time, not giving us extra homework.

Far better to just get these problems resolved.


That isn't what I'm saying. I'm saying you can't make a blanket statement that LLMs in general aren't fit for some particular task. There are certainly tasks where no LLM is competent, but for others, some LLMs might be suitable while others are not. At least some level of detail beyond "they used an LLM" is required to know whether a) there was user error involved, or b) an inappropriate tool was chosen.


Then they shouldn't market it as one-size-fits-all.


Are they? Every foundation model release includes benchmarks with different levels of performance in different task domains. I don't think I've seen any model advertised by its creating org as either perfect or even equally competent across all domains.

The secondary market snake oil salesmen <cough>Manus</cough>? That's another matter entirely and a very high degree of skepticism for their claims is certainly warranted. But that's not different than many other huckster-saturated domains.


People like Zuckerberg go around claiming most of their code will be written by AI starting sometime this year. Other companies are hearing that and using it as a reason (or false cover) for layoffs. The reality is LLMs still have a way to go before replacing experienced devs, and even when they start getting there, there will be a period of time where we’re learning what we can and can’t trust them with and how to use them effectively and responsibly. Feels like at least a few years from now, but the marketing says it’s now.


In many, many cases those problems are resolved by improvements to the model. The point is that making a big deal about LLM fuck ups in 3 year old models that don't reproduce in new ones is a complete waste of time and just spreads FUD.


Did you read the original tweet? She mentions the models and gives high level versions of her prompts. I'm not sure what "scaffolding" is.

You're right that they're tools, but I think the complaint here is that they're bad tools, much worse than they are hyped to be, to the point that they actually make you less efficient because you have to do more legwork to verify what they're saying. And I'm not sure that "prompt training," which is what I think you're suggesting, is an answer.

I had several bad experiences lately. With Claude 3.7 I asked how to restore a running database in AWS to a snapshot (RDS, if anyone cares). It basically said "Sure, just go to the db in the AWS console and select 'Restore from snapshot' in the actions menu." There was no such button. I later read AWS docs that said you cannot restore a running database to a snapshot, you have to create a new one.

I'm not sure that any amount of prompting will make me feel confident that it's finally not making stuff up.


I was responding to the "they used an LLM" story about the Norwegian school report, not the original tweet. The original tweet has a great level of detail.

I agree that hallucination is still a problem, albeit a lot less of one than it was in the recent past. If you're using LLMs for tasks where you are not directly providing it the context it needs, or where it doesn't have solid tooling to find and incorporate that context itself, that risk is increased.


Why do you think these details are important? The entire point of these tools is that I am supposed to be able to trust what they say. The hard work is precisely to be able to spot which things are true and false. If I could do that I wouldn't need an assistant.


> The entire point of these tools is that I am supposed to be able to trust what they say

Hard disagree, and I feel like this assumption might be at the root of why some people seem so down on LLMs.

They’re a tool. When they’re useful to me, they’re so useful they save me hours (sometimes days) and allow me to do things I couldn’t otherwise, and when they’re not they’re not.

It never takes me very long to figure out which scenario I’m in, but I 100% understand and accept that figuring that out is on me and part of the deal!

Sure, if you think you can “vibe code” (or “vibe founder”) your way to massive success by getting LLMs to do stuff you’re clueless about, without any way to check, you’re going to have a bad time, but the fact that they can’t (so far) do that doesn’t make them worthless.


Because then I can know whether the hallucinations they encountered are a little surprising, or not surprising at all.


Because it's the difference between a fleshy hallucination and something that might relate to reality.


> Why do you think these details are important?

It's https://en.wikipedia.org/wiki/Sealioning


Sounds like a user problem, though. When used properly as a tool they are incredible. When you put 100% trust in them to be perfect, it’s you that is making the mistake.


Well yeah, it's fancy autocomplete. And it's extremely amazing what 'fancy autocomplete' is able to do, but making the decision to use an LLM for the type of project you described is effectively just magical thinking. That isn't an indictment against LLM, but rather the person who chose the wrong tool for the job.


This is more a lack of understanding of its limitations; it'd be different if they asked it to write a Python script to collate the data.


If the LLM is intelligent, why can’t it figure out that writing a script would be the best way to solve the problem?


Some of the more modern tools do exactly that. If you upload a CSV to Claude, it will not (or at least not anymore) try to process the whole thing. It will read the header, and then ask you what you want. It will then write the appropriate Javascript code and run it to process the data and figure out the stats/whatever you asked it for.

I recently did this with a (pretty large) exported CSV of calories/exercise data from MyFitnessPal and asked it to evaluate it against my goals/past bloodwork etc (which I have in a "Claude Project" so that it has access to all that information + info I had it condense and add to the project context from previous convos).

It wrote a script to extract out extremely relevant metrics (like ratio of macronutrients on a daily basis for example), then ran it and proceeded to talk about the result, correlating it with past context.

Use the tools properly and you will get the desired results.
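For a sense of what that auto-written script does, here is a minimal Python sketch of the same idea; the column names ("date", "protein_g", "carbs_g", "fat_g") are assumptions on my part, and a real MyFitnessPal export will likely differ:

    import csv
    from collections import defaultdict

    # Assumed column names; adjust to the actual export's header.
    MACROS = ("protein_g", "carbs_g", "fat_g")

    def daily_macro_ratios(path):
        """Sum each macro per day and express it as a share of that day's total."""
        totals = defaultdict(lambda: {m: 0.0 for m in MACROS})
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                for m in MACROS:
                    totals[row["date"]][m] += float(row[m] or 0)
        return {
            day: {m: t[m] / (sum(t.values()) or 1.0) for m in MACROS}
            for day, t in totals.items()
        }

    # e.g. daily_macro_ratios("myfitnesspal_export.csv")

The point is that the model only ever reads the header and the question, then lets ordinary code chew through the rows.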


ChatGPT has been able to do exactly that (using its Code Interpreter tool) for two years now. Gemini and Claude have similar features.


Often they will do exactly that; currently their reasoning isn't the best, so you may have to coax them to take the best path. They're also making a judgement call in how they write the code, so it's worth checking too. No different from a senior instructing an intern.


Ah, it's like communism, then (to its diehards). It cannot fail, it can only be failed.


Please explain how what I am saying is wrong?


This is an odd non-sequitur.


So they used the model as a database? It should be immediately obvious to anyone that this won't work.


"an old poorly implemented model can't do item X well therefore the technology is garbage"

Likely the most accurate measure of progress would be watching detractors' goalposts move over time.


"Even a journey of 1,000 miles begins with the first step. Unless you're an AI hyper then taking the first step is the entire journey - how dare you move the goalposts"


"They continue to fabricate links, references, and quotes, like they did from day one." - "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error."

Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.

There are no fabricated links, references, or quotes in OpenAI's GPT 4.5 + Deep Research.

It's unfortunate the cost of a Deep Research bespoke white paper is so high. That mode is phenomenal for pre-work domain research. You get an analyst's two-week writeup in under 20 minutes, for the low cost of $200/month (though I've seen estimates that the white paper cost OpenAI over USD 3,000 to produce for you, which explains the monthly limits).

You still need to be a domain expert to make use of this, just as you need to be to make use of an analyst. Both the analyst and Deep Research can generate flawed writeups with similar misunderstandings: mis-synthesizing, misapplication, or omission of something essential.

Neither analyst nor LLM is a substitute for mastery.


While I agree, it doesn't stop business folks pushing for its use in areas where it is inappropriate. That is, at least for me, part of the skepticism.


How do people in the future become domain experts capable of properly making use of it if they are not the analyst spending two weeks on the write-up today?


My complaint with Deep Research LLMs is that they don't go deeper than 2 pages of SERPs. I want them to dig into obscure stuff, not list cursorily relevant peripheral directions. They just seem to do breadth-first rather than depth-first search.


This assessment is incomplete. Large language models are both less and more than these traditional tools. They have not subsumed them, and all can sit together in separate tabs of a single browser window. They are another resource, and when the conditions are right, which is often the case in my experience, they are a startlingly effective tool for navigating the information landscape. The criticism of Gemini is a fair one, and I encountered it yesterday, but perhaps with 50% less entitlement. But Gemini also helped me translate obscure termios APIs to Python from C source code I provided. The equivalent using search and/or Stack Overflow would have required multiple piecemeal searches without guarantees -- and definitely would have taken much more time.


The 404 links are hilarious, like you can't even parse the output and retry until it returns a link that doesn't 404? Even ignoring the billions in valuation, this is so bad for a $20 sub.
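For what it's worth, the check being asked for really is a few lines of post-processing. A minimal Python sketch (stdlib only; a real product would also want retries, rate limiting, and allowance for sites that reject HEAD requests):

    import re
    import urllib.error
    import urllib.request

    def dead_links(llm_output, timeout=5.0):
        """Return every URL in the model's output that fails to resolve."""
        urls = re.findall(r"https?://[^\s\"'<>)\]]+", llm_output)
        broken = []
        for url in urls:
            try:
                req = urllib.request.Request(
                    url, method="HEAD", headers={"User-Agent": "link-check"})
                urllib.request.urlopen(req, timeout=timeout)
            except (urllib.error.URLError, ValueError):
                broken.append(url)
        return broken

    # If this returns anything, regenerate the answer or drop the citation
    # before the user ever sees it.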


The tweeter's complaints sound like a user problem. LLMs are tools. How you use them, when you use them, and what you expect out of them should be based on the fact that they are tools.


I’m sorry but the experience of coding with an LLM is about ten billion times better than googling and stack overflowing every single problem I come across. I’ve stack overflowed maybe like two things in the past half year and I’m so glad to not have to routinely use what is now a very broken search engine and web ecosystem.


How did you measure and compare googling/stack overflow to coding with an LLM? How did you get to the very impressive number ten billion times better?! Can you share your methodology? How have you defined better?


I take calipers to my boss’s forehead veins and see how pissed he is routinely throughout the day


It’s broken now. It was fine 5 years ago.


The search ecosystem is broken now because google is focused on LLMs


That's part of it. The other part is Google sacrificing product quality for excessive monetization. An example would be YouTube search - first three results are relevant, next 12 results are irrelevant "people also watched", then back to relevant results. Another example would be searching for an item to buy and getting relevant results in the images tab of google, but not the shopping tab.


It’s broken because Google has spent 20+ years promoting garbage content in a self-serving way. No one was able to compete unless they played by Google's rules, and so all we have left is blog spam and regular spam.




I thought summarizing papers/stories/emails/meetings was one of the touted use cases of LLMs?

What are the use cases where the expected performance is high?


I didn't notice that example. I doubt top tier models have issues with that. I was more referencing Sabine's mentions of hallucinating citations and papers, which is an issue I also had 2 years ago but is probably solved by Deep Research at this point. She just has massive skill issues and doesn't know what she's doing.

>What are the use cases where the expected performance is high?

https://openai.com/index/introducing-chatgpt-pro/

o1-pro is probably at top tier human level performance on most small coding tasks and definitely at answering STEM questions. o3 is even better but not released outside of it powering Deep Research.

https://codeforces.com/blog/entry/137543 o3 is top 200 on Codeforces for example.


> This is just not a use case where the expected performance on these tasks is high.

Yet the hucksters hyping AI are falling all over themselves saying AI can do all this stuff. This is where the centi-billion dollar valuations are coming from. It's been years and these super hyped AIs still suck at basic tasks.

When pre-AI shit Google gave wrong answers, it at least linked to the source of the wrong answers. LLMs just output something that looks like a link and call it a day.


To be fair the newest tools like Deep Research are actually quite good and hallucination is essentially not a real problem for them.

https://marginalrevolution.com/marginalrevolution/2025/02/de...


<<After glowing reviews, I spent $200 to try it out for my research. It hallucinated 8 of 10 references on a couple of different engineeribg topics. For topics that are well established (literature search), it is useful, although o3-mini-high with web search worked even better for me. For truly frontier stuff, it is still a waste of time.>>

<<I've had the hallucination problem too, which renders it less than useful on any complex research project as far as I'm concerned.>>

These quotes are from the link you posted. There are a lot more.


I think Sabine is just wrong in this case. I don't think Deep Research can even hallucinate links in this way at all.


The whole point is that an LLM is not a search engine and obviously anyone who treats it as one is going to be unsatisfied. It's just not a sensible comparison. You should compare working with an LLM to working with an old "state of the art" language tool like Python NLTK -- or, indeed, specifying a problem in Python versus specifying it in the form of a prompt -- to understand the unbridgeable gap between what we have today and what seemed to be the best even a few years ago. I understand when a popular science author or my relatives haven't understood this several years after mass access to LLMs, but I admit to being surprised when software developers have not.

Hosted and free or subscription-based DeepResearch like tools that integrate LLMs with search functionality (the whole domain of "RAG" or "Retrieval Augmented Generation") will be elementary for a long time yet simply because the cost of the average query starts to go up exponentially and there isn't that much money in it yet. Many people have and will continue to build their own research tools where they can determine how much compute time and API access cost they're willing to spend on a given query. OCR remains a hard problem, let alone appropriately chunking potentially hundreds of long documents into context length and synthesizing the outputs of potentially thousands of LLM outputs into a single response.


To be fair, a few? one? years ago LLMs were touted? marketed? as a "search killer", and a lot of people do use them in that fashion.


A lot of people need to improve their critical thinking skills to deconstruct the marketing hype, and then choose the right tool for the job.


Sure. Isn't that effectively what Sabine is doing, though? She just doesn't have as compelling a use case in the areas where LLMs are strong.


Certainly. I agree of course as to the problem of hype, and I'm aware of how many people use LLMs today. I tried to emphasize in my earlier post that I can understand why someone like Sabine has the opinion she does -- I'm more confused how there are still similar positions to be found among software developers, evidenced often within Hacker News threads like the one we're in. I don't intend that to refer to you, who clearly has more than a passing knowledge of LLM internals, but more to the original commenter I was responding to.

More than marketing, I think from my experience it's chat with little control over context as the primary interface of most non-engineers with LLMs that leads to (mis)expectations of the tool in front of them. Having so little control over what is actually being input to the model makes it difficult to learn to treat a prompt as something more like a program.


From my perspective it's almost exactly opposite. Almost all of the people I consider exceptionally talented are vying for positions in academia (I'm in mathematics), and the people who don't make it begrudgingly accept jobs at the software houses / research labs.

I'm frequently and sadly reminded when I visit this website that a lot of (smart) people can't seem to imagine any form of success that doesn't include common social praise and monetary gain.


The answer is pretty damn sad.

> Come and work with me on Anthropic's Frontier Red Team

Twice as sad.


It's not an "appeal to authority" to report on the opinions of tech leaders. The only way it would make sense to call this an appeal to authority is if you think TechCrunch is making an argument for a side. This just looks like ordinary reporting, however.


But it is close to one to look to him as an authority on where AI is going. He's technically only commenting on exactly what IBM is doing right now, which isn't where we expect the interesting things to happen.

