Yeah. I just need to babysit it too much. Take Copilot: it gives good suggestions and sometimes blows me away with a block of code that's exactly what I'd type. But actively letting it code (at least with gpt4.1 or gpt4o) just doesn't work well enough for me. Half of the time it doesn't even compile, and after fixing that it's just not really working correctly either. I'd expect it to work like a very junior programmer, but it works like a very drunk senior programmer that isn't listening to you very well at all.
Yes yes yes we're all aware that these are word predictors and don't actually know anything or reason. But these random dice are somehow able to give reasonably seemingly well-educated answers a majority of the time and the fact that these programs don't technically know anything isn't going to slow the train down any.
I just don't get why people say they don't reason. It's crazy talk. The KV cache is effectively a unidirectional Turing machine, so it should be possible to encode "reasoning" in there. And the evidence shows that LLMs occasionally do some light reasoning. Just because they're not great at it (hard to train for, I suppose) doesn't mean they do none at all.
Would I be crazy to say that the difference between reasoning and computation is sentience? This is an impulse with no justification but it rings true to me.
Taking a pragmatic approach, I would say that if the AI accomplishes something that, for humans, requires reasoning, then we should say that the AI is reasoning. That way we can have rational discussions about what the AI can actually do, without diverting into endless discussions about philosophy.
Suppose A solves a problem and writes the solution down. B reads the answer and repeats it. Is B reasoning, when asked the same question? What about one that sounds similar?
The crux of the problem is "what is reasoning?" Of course it's easy enough to call the outputs "equivalent enough" and then use that to say the processes are therefore also "equivalent enough."
I'm not saying it's enough for the outputs to be "equivalent enough."
I am saying that if the outputs and inputs are equivalent, then that's enough to call it the same thing. It might be different internally, but that doesn't really matter for practical purposes.
I think one of the great lessons of our age will be that things being apparently equivalent, or in more applied terms "good enough," is not the same as actual equality.
In my experience PhDs are not 10x as productive. Quite the opposite, actually. Too much theory and not much practicality. The only two developers my company has fired for (basically) incompetence were PhDs in Computer Science. They couldn't deliver practical, real code.
"Ketamine has been found to increase dopaminergic neurotransmission in the brain"
This property is likely an important driver of ketamine abuse and of it being rather strongly 'moreish', as well as of the subjective experience of strong expectation during a 'trip': the tendency to develop redose loops approaching unconsciousness in a chase to 'get the message from the goddess' or whatever, which seems just out of reach (because it's actually a feeling of expectation, not a partially installed divine T3 rig).
The “multiple PhDs” thing is interesting. The point of a PhD is to master both a very specific subject and the research skills needed to advance the frontier of knowledge in that area. There’s also plenty of secondary issues, like figuring out the politics of academia and publishing enough to establish a reputation.
I don’t think models are doing that. They certainly can retrieve a huge amount of information that would otherwise only be available to specialists such as people with PhDs… but I’m not convinced the models have the same level of understanding as a human PhD.
It’s easy to test, though: the models simply have to write and defend a dissertation!
Totally disagree. The current state of coding AIs is “a level 2 product manager who is a world class biker balancing on a unicycle trying to explain a concept in French to a Spanish genius who is only 4 years old.” I’m not going to explain what I mean, but if you’ve used Qwen Code you understand.
Qwen Code is really not representative of the state of the art, though. With the right prompt I have no problem getting Claude to output a complete codebase (e.g. a non-trivial library interfacing with multiple hardware devices) to the specs I want, in modern C++, that builds, runs, and has documentation and unit tests sourced from data sheets and manufacturer specs from the get-go.
Assuming there aren't tricky concurrency issues and the documentation makes sense (you know what registers to set to configure and otherwise work the device), device drivers are the easiest thing in the world to code.
There's the old trope that systems programmers are smarter than applications programmers, but SWE-Bench puts the lie to that. Sure, SWE-Bench problems are all in the language of software; applications programmers take badly specified tickets in the language of product managers, testers, and end users and have to turn them into the language of SWE-Bench to get things done. I am not that impressed with 65% performance on SWE-Bench, because those are not the kind of tickets I have to resolve at work; rather, at work, if I want to use AI to help maintain a large codebase, I need to break the work down into that kind of ticket.
> device drivers are the easiest thing in the world to code.
Except the documentation lies, and in reality your vendor shipped you a part with timing that is slightly out of sync with what the doc says, and after 3 months of debugging, including using an oscilloscope, you figure out WTF is going on. You report back to your supplier, and after two weeks of them not saying anything they finally reply that the timings you have reverse-engineered are indeed the correct timings, sorry for any misunderstanding with the documentation.
As an applications engineer, my computer doesn't lie to me, and memory generally stays at the value I set it to unless I did something really wrong.
Backend services are the easiest thing in the world to write. I am 90% sure that all the bullshit around infra is just artificial job security, and I say this as someone who primarily does backend work nowadays.
I'm not sure if this counts as systems or application engineering, but if you think your computer doesn't lie to you, try writing an nginx config. Those things aren't evaluated at /all/ the way they look like they are: location matching goes by match type and specificity rather than top-to-bottom order, and "if" inside a location block has famously surprising semantics.
At no point have any of my nginx files ever flipped their own bits.
Are they a constant source of low level annoyance? Sure. But I've never had to look at a bus timing diagram to understand how to use one, nor worried about an nginx file being rotated 90 degrees and wired up wrong!
To some extent, for sure. The fact that electronics engineers that have picked up a bit of software write a large fraction of the world's device drivers does point to it not being the most challenging of software tasks, but on the other hand the real 'systems engineering' is writing the code that lets those engineers do so successfully, which I think is quite an impressive feat.
I was joking! Claude Code is still the best afaik, though I’d compare it more to “sending a 1440p HDR fax of your user story to a 4-armed mime whose mind is then read by an Aztec psychic who has taken just the right amount of NyQuil.”
Probably the saddest comment I've read all day. Crafting software line-by-line is the best part of programming (maybe when dealing with hardware devices you can instead rely on auto-generated code from the register/memory region descriptions).
How long would that be economically viable when a sufficient number of people can generate high-quality code in 1/10th the time? (Obviously, it will always be possible as a hobby.)
> But actively letting it code (at least with gpt4.1 or gpt4o)
It's funny, GitHub Copilot puts these models in the 'bargain bin' (they are free in 'ask' mode, whereas the other models count against your monthly limit of premium requests), and it's pretty clear why: they seem downright nerfed. They're tolerable for basic questions, but you wouldn't use them if price weren't a concern.
Brandwise, I don't think it does OpenAI any favors to have their models be priced as 'worthless' compared to the other models on premium request limits.
With something like Devin, where it integrates directly with your repo and generates documentation based on your project(s), it's much more productive to use it as an agent. I can delegate 4-5 small tasks that would normally take me a full day or two (or three) of context switching and mental preparation, and knock them out in less than a day because it did 50-80% of the work, leaving only a few fixes or a small pivot for me to wrap them up.
This alone is where I get a lot of my value. Otherwise, I'm using Cursor to actively solve smaller problems in whatever files I'm currently focused on. Being able to refactor things with only a couple sentences is remarkably fast.
The more you know about your language's features (and their precise names), and about higher-level programming patterns, the better time you'll have with LLMs, because it matches up with real documentation and examples with more precision.
> Being able to refactor things with only a couple sentences is remarkably fast.
I'm curious, this is js/ts? Asking because depending on the lang, good old machine refactoring is either amazeballs (Java + IDE) or non-existent (Haskell).
I'm not a js/ts dev, so I don't know what the state of machine refactoring is in VS Code... but if it's as good as Java's, then "a couple of sentences" is quite slow compared to a keystroke or a quick dialog box with completion of symbol names.
I'm using TypeScript. In my case these refactors are usually small, spanning up to 5 files depending on how interdependent things are. The benefit of an agent is its ability to find and fix related side effects caused by the refactor (broken type-safety, broken translation strings, etc.), and to rename related things, like an actual UI string tied to the naming of what I'm working on if my changes happened to include a rename.
It's not always right, but I find it helpful when it finds related changes that I should be making anyway, but may have overlooked.
Another example: selecting a block that I need to wrap (or unwrap) with tedious syntax, say I need to memoize a value with a React `useMemo` hook. I can select the value, open Quick Chat, type "memoize this", and within milliseconds it's correctly wrapped and saved me lots of fiddling on the keyboard. Scale this to hundreds of changes like these over a week, it adds up to valuable time-savings.
Even more powerful: selecting 5, 10, 20 separate values and typing: "memoize all of these" and watching it blast through each one in record time with pinpoint accuracy.
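To make the "memoize this" edit concrete, here's roughly the before/after I mean - a minimal TypeScript/React sketch with a made-up filterItems helper, not code from my actual project:

    import { useMemo, useState } from "react";

    // Made-up helper, just for illustration.
    function filterItems(items: string[], query: string): string[] {
      return items.filter((item) => item.includes(query));
    }

    function ItemList({ items }: { items: string[] }) {
      const [query, setQuery] = useState("");

      // Before: const visibleItems = filterItems(items, query);
      // After "memoize this": wrapped in useMemo so it only recomputes
      // when items or query actually change.
      const visibleItems = useMemo(() => filterItems(items, query), [items, query]);

      return (
        <div>
          <input value={query} onChange={(e) => setQuery(e.target.value)} />
          <ul>
            {visibleItems.map((item) => (
              <li key={item}>{item}</li>
            ))}
          </ul>
        </div>
      );
    }

The wrapper itself is trivial; the win is not hand-typing the useMemo call and dependency array dozens of times a week.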
IntelliJ has keyboard shortcuts for all of these. I think how impressed you are by AI depends a lot on the quality of the tooling you were previously working with.
Work is. I actually don't have access to our billing, so I couldn't tell you exactly, but it depends on how many ACUs (Agent Compute Units) you've used.
We use a Team plan ($500/mo), which includes 250 ACUs per month. Each bug or small task consumes anywhere from 1 to 3 ACUs, and fewer units are consumed if you're more precise with your prompt upfront. A larger prompt will usually use fewer ACUs, because follow-up prompts cause Devin to run more checks to validate its work. It can run scripts, compilers, linters, etc. in its own VM -- all of that contributes to usage. It can also run E2E tests in a browser instance and validate UI changes visually.
They recommend most tasks should stay under 5 ACUs before it becomes inefficient. I've managed to give it some fairly complex tasks while staying under that threshold.
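If it helps, here's rough back-of-envelope math from the numbers above (my own calculation, not anything official from Devin):

    // Rough cost-per-task math from the plan numbers quoted above.
    const monthlyPriceUsd = 500;   // Team plan
    const includedAcus = 250;      // per month
    const costPerAcu = monthlyPriceUsd / includedAcus;      // $2 per ACU

    const smallTaskAcus: [number, number] = [1, 3];         // typical range
    const recommendedMaxAcus = 5;                           // efficiency cutoff

    console.log(`$${costPerAcu} per ACU`);                  // $2
    console.log(
      `$${smallTaskAcus[0] * costPerAcu}-$${smallTaskAcus[1] * costPerAcu} per small task`,
    );                                                      // $2-$6
    console.log(`<$${recommendedMaxAcus * costPerAcu} to stay efficient`); // <$10

So roughly $2-6 per small task, and somewhere around 80-250 of them per month before the included ACUs run out.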
>I'd expect it to work like a very junior programmer, but it works like a very drunk senior programmer that isn't listening to you very well at all.
Best analogy I've ever heard and it's completely accurate. Now, back to work debugging and finishing a vibe coded application I'm being paid to work on.
I think there are three factors to this: 1. What to code (longer, more specific prompts are better but take longer to write). 2. How to code it (specify languages, libraries, APIs, etc.). 3. How well what you're using is represented in the training data: if you're trying to write code that uses a newer version of a library that works differently from what's most commonly documented, it's a long uphill battle of constantly reminding the LLM of the new changes.
If you're not specific enough, it will definitely spit out a half-baked pseudocode file where it expects you to fill in the rest. If you don't specify certain libraries, it'll use whatever is featured in the most blogspam. And if you're in an ecosystem that isn't publicly well-documented, it's near useless.
Two other observations I've found working with ChatGPT and Copilot:
First, until I can re-learn boundaries, they are a fiasco for work-life balance. It's way too easy to have a "hmm what if X" thought late at night or first thing in the morning, pop off a quick ticket from my phone, assign to Copilot, and then twenty minutes later I'm lying in bed reviewing a PR instead of having a shower, a proper breakfast, and fully entering into work headspace.
And on a similar thread, Copilot's willingness to tolerate infinite bikeshedding and refactoring is a hazard for actually getting stuff merged. Unlike a human colleague who loses patience after a round or two of review, Copilot is happy to keep changing things up and endlessly iterating on minutiae. Copilot code reviews are exhausting to read through because it's just so much text, so much back and forth, every little change with big explanations, acknowledgments, replies, etc.
I've found this with Claude Code too. It has nonstop energy (until you run out of tokens) and is always a little too eager to make random edits, which means it's somehow very tiring to use even though you're not doing anything.
But it is the most productive intern I've ever pair programmed with. The real ones hallucinate about as often too.
If I want to throw a shuriken that obeys some artificial, magic Magnus-like force, as in the movie Wanted, both ChatGPT and Claude let me down using pygame. And what if I wanted C-level performance, or wanted to use Zig? Burp.
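To be clear about what I mean by an artificial Magnus force: it's just a sideways push perpendicular to the velocity, scaled by spin. A toy sketch of the update loop I was after (in TypeScript here rather than pygame, with made-up constants and scaling):

    // Toy 2D projectile with an artificial Magnus-style force: a sideways push
    // perpendicular to the velocity, proportional to spin. Constants are made up.
    type Vec2 = { x: number; y: number };

    function step(pos: Vec2, vel: Vec2, spin: number, dt: number): [Vec2, Vec2] {
      const speed = Math.hypot(vel.x, vel.y) || 1;
      // Unit vector perpendicular to the velocity (rotate 90 degrees).
      const perp: Vec2 = { x: -vel.y / speed, y: vel.x / speed };

      const magnusStrength = 0.8;                              // arbitrary "magic" constant
      const ax = magnusStrength * spin * speed * perp.x;
      const ay = magnusStrength * spin * speed * perp.y + 9.81; // +y is down in screen coords

      const newVel = { x: vel.x + ax * dt, y: vel.y + ay * dt };
      const newPos = { x: pos.x + newVel.x * dt, y: pos.y + newVel.y * dt };
      return [newPos, newVel];
    }

    // The shuriken curves instead of flying straight.
    let pos: Vec2 = { x: 0, y: 0 };
    let vel: Vec2 = { x: 40, y: -20 };
    for (let i = 0; i < 10; i++) {
      [pos, vel] = step(pos, vel, /* spin */ 1.5, 0.1);
      console.log(pos.x.toFixed(1), pos.y.toFixed(1));
    }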
It works like the average Microsoft employee, like some doped version of an orange-wig wearer who gets votes because his daddies kept the population as dumb as it gets after the dotcom-x-Facebook era. In essence, the ones to be disappointed by are the Chan-Zuckerbergs of our time. There was a chance, but there also was what they were primed for.
What does it really mean to know something or understand something? I think AI knows a great deal (associating facts with symbols), confabulates at times when it doesn't know (which is dishonestly called hallucination, implying a conscious agent misperceiving, which AI is not), and understands almost nothing.
The best way to think of chat bot "AI" is as a compendium of human intelligence as recorded in the books and online media available to it. It is not intelligent at all on its own, and its judgement can't be better than its human sources because it has no biological drive to synthesize and excel. It's best to think of AI as a librarian of human knowledge, or an interactive Wikipedia, designed to seem like an intelligent agent but actually not one.
One cannot learn everything from books, and in any case many books contradict each other, so every developer is a variation based on what they have read, experienced, and thought along the way. How can that get summed up into one thing? It might not even be useful to do that.
I suspect that some researchers with a very different approach will come up with a neural network that learns and works more like a human in the future, though. Not the current LLMs, but something with a much more efficient learning mechanism that doesn't require a nuclear power station to train.
What is baffling to me is how otherwise intelligent people don't really understand what human intelligence and learning are about. They are about a biological organism following its replication algorithm. Why should a computer program learn and work like a biological organism if it is in an entirely different environment with entirely different drives?
Intelligence is not some universal abstract thing achievable once a certain computational threshold is reached. Rather, it's a quality of the behavior patterns of specific biological organisms following their drives.
...because so far only our attempts to copy nature have proven successful...in that we have judged the result "intelligent".
There's a long history in AI where neural nets were written off as useless (Minsky was the famous destroyer of the idea, I think) and yet in the end they blew away the alternatives completely.
We have something now that's useful in that it is able to glom together a huge amount of knowledge, but the cost of doing so is tremendous, and therefore in many ways it's still ridiculously inferior to nature, because it's only a partial copy.
A lot of science fiction has assumed that robots, for example, would automatically be superior to humans - but are robots self-repairing or self-replicating? I was reading recently about how the reasons many developers like Python are the reasons it can never be made fast. In other words, you cannot have everything - all features come at a cost. We will probably have AIs that are less human in some ways and more human in others, because they will offer us different trade-offs.
To date, I've not been able to effectively use Copilot in any projects.
The suggestions were always unusably bad. The /fix results were always obviously and straight-up wrong unless it was a super silly issue.
Claude Code with Opus model, on the other hand, was mind-blowing to me and changed my opinion of LLMs for coding on almost every point.
You still need to grow the skill of how to build the context and formulate the prompt, but the built-in execution loop is a complete game changer, and I didn't realize that until I actually used it effectively on a toy project myself.
MCP in particular was another thing I always thought was massively overhyped, until I actually started to use some in the same toy project.
Frankly, the building blocks already exist at this point to make a vast majority of all jobs redundant (and I'm thinking about all grunt-work office jobs, not coding in particular). The tooling still needs to be created, so I'm not seeing a short-term realization (<2 yrs), but medium term (5+ yrs)?
You should expect most companies to let people go in staggering numbers, with only small numbers of highly skilled people left to administer the agents
> You should expect most companies to let people go in staggering numbers, with only small numbers of highly skilled people left to administer the agents
I don't buy that. The linked article makes a solid argument for why that's not likely to happen: agentic loop coding tools like Claude Code can speed up the "writing code and getting it working" piece, but the software development lifecycle has so much other work before you get to the "and now we let Claude Code go brrrrrrr" phase.
These are exactly the people that are going to stay, medium term.
Let's explore a fictional example that somewhat resembles my, and I suspect a lot of people's, current day job.
A microservice architecture: each team administers 5-10 services, and the whole application, which is once again only a small part of the platform as a whole, is developed by maybe 100-200 devs. So something like ~200 microservices.
The application architects are gonna be completely safe in their jobs. And so are the lead devs in each team - at least from my perspective. Anyone else? I suspect MBAs in 5 yrs will not see their value anymore. That's the vast majority of all devs, so it's likely going to cost 50% of devs their jobs. And middle management will be slimmed down just as quickly, because you suddenly need a lot fewer managers.
Let’s take this to the extreme - why would the company exist in the first place? The customers of said company pay them because they don’t do the service themselves, but in a future where it’s laughably easy to vibe code anything your heart desires, those customers will just build the service they used to outsource themselves!
tl;dr: in a future where vibe coding works 100% of the time, logically the only companies that will exist are the ones with processes AI can’t do, because all the other parts of the supply chain can be done in-house
That scenario is a lot further out compared to what I was talking about.
It's conceivable that that's going to happen, eventually. But that would likely require models a lot more advanced than what we have now.
The agent approach, with lead devs administering and merging the code the agents made, is feasible with today's models. The missing part is the tooling around the models and the development practices that standardize this workflow.
That's what I'd expect to take around 5 yrs to settle.
Thanks for this perspective, but I am a bit confused by some of your takes: you used "Claude Code with Opus model" in "the same toy project" with great success, which led you to conclude that this will "make a vast majority of all jobs redundant".
Toy project viability does not connect with making people redundant in the process (ever, really) - at least not for me. Care to elaborate on where you draw the optimism from?
I cannot use it on my production code base. I'm working for a company that requires the devs to code from virtual workplaces, which is a fancy term for virtual machines running in the Azure cloud. These are completely locked down, and anything but Copilot is forbidden, enforced via firewall and process monitoring. I can still use Sonnet 3.7 through that, but that's a far cry from my experience on my personal time with Claude Code.
I called it a toy project because I'm not earning money with it - hence it's a toy.
It does have medium complexity with roughly 100k loc though.
And I think I need to repeat myself, because you seem to read something into my comment that I didn't say: that the building blocks exist doesn't mean today's tooling is sufficient for this to play out today.
I did not miss the time horizon: this is why I put a remark of "ever, really".
"Toy project" is usually used in a different context (demonstrate something without really doing something useful): yours sounds more like a "hobby project".
That's a good point. I've actually implemented the same project over 20 times at this point.
At the heart is my hobby of reading web and light novels. I've been implementing various versions of a scraper and ePub reader for over 15 years now, ever since I started working as a programmer.
I've been reimplementing it over the years with the primary goal of growing my experience/ability. In the beginning it was a plain Django app, but it grew from that to various languages and stacks: Elixir, Java (multiple times with different architecture approaches), native Android, and JS/TS frontend and sometimes backend - React, Angular, tRPC, Svelte, TanStack, and more.
So I know exactly how to implement it, as I've gone through a lot of versions of the same functionality.
And the last version I implemented (TanStack) was in July, via Claude Code, and it got to feature parity (and more) within roughly 3 weeks.
And I might add: I'm not positive about this development either, at all. I'm just expecting it to happen, to the detriment of our collective futures (as programmers).
> You should expect most companies to let people go in staggering numbers, with only small numbers of highly skilled people left to administer the agents
I'm gonna pivot to building bomb shelters maybe
Or stockpiling munitions to sell during the troubles
Maybe some kind of protest-support SaaS. Molotov deliveries as a service: you still have to light them and throw them, but I guarantee next-day delivery and they will be ready to deploy into any data center you want to burn down.
What I'm trying to say is that "companies letting people go in staggering numbers" is a societal failure state, not an ideal.
I find it so weird how many engineers seem positively giddy to get replaced by a chatbot that functionally cannot do the job. I'll help your Molotovs-as-a-service startup; free guillotine with every 6th order.
So what happens when someone calls in and the "AI" answers (because the receptionist has been fired and replaced by "AI"), and the caller asks to access some company record that should be private? Will the LLM always deny the request? Hint: no, not always.
There are so many flaws in your plan, I have no doubt that "AI" will ruin some companies that try to replace humans with a "tin can". LLMs are being inserted loosey-goosey into too many places by people that don't really understand the liability problems it creates. Because the LLM doesn't think, it doesn't have a job to protect, it doesn't have a family to feed. It can be gamed. It simply won't care.
The flaws in "AI" are already pretty obvious to anyone paying attention. It will only get more obvious the more LLMs get pushed into places they really do not belong.
The human receptionist can use critical thinking, and self preservation to prevent a bad outcome. The LLM can not. When a person causes a problem, they can be fired, and learn from the event. The LLM will not learn from it. And who is responsible then? The company providing the LLM? The more LLM use becomes pervasive, the taller the house of cards gets.
> until I actually started to use some in the same toy project
That's the key right there. Try using it in a project that handles PII, needs data to be exact, or has many dependencies/libraries and needs to not break for critical business functions.