They won't blame the AI as the root cause. They will blame you.
This is why I said you need to be competent enough to understand what is being generated, or they will find someone else who does. There's no two ways around it. AI is here to stay.
Yes, I agree. They will and should blame the human. That's a problem when the human isn't given enough time to complete projects because "AI is SO productive."
The idea that you will actually really review all that vibecoded slop code comes across as very naive.
The real question is how many companies have to accidentally expose their databases, suffer business-ruining data losses, and have downtime they are utterly unable to recover from quickly before CxOs start adjusting their opinions?
Last time I saw a "Show HN" of someone showing off their vibecoded project, it leaked their OpenAI API key to all users. If that's how you want to run your business, go right ahead.
> you need to be competent enough to understand what is being generated
We're all competent enough to understand what is generated. That's why everyone is doomer about it.
What insights do you have that the rest of us lack when the LLM generates:
    true="false"
    while i < 10 {
      i++
    }
What's the deep philosophical understanding you have that makes us all sheeple for not seeing how this is actually the business's goose that lays the golden eggs, and not the engineers'?
Frankly, businesses that use this will drop all their actual engineers, and then fall over when the slightest breeze comes.
I am actually in favour, in an accelerationist sense.
I appreciate the sentiment, but I've found this resource [0] much more direct and comprehensive. It explains all of the nuance regarding dB and related terminology, from audio engineering to perception (voltage, power, intensity, volume, loudness, etc.).
The format is a bit circular; just enjoy getting lost in it for half an hour.
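For anyone who wants the gist before diving in: the core nuance is that power-like and amplitude-like quantities use different dB factors. A rough sketch of those standard formulas (my own illustration, not taken from the resource; the example values are made up):

    import math

    # Standard dB conventions: power-like quantities use a factor of 10,
    # amplitude-like quantities (voltage, sound pressure) use a factor of 20.
    def db_from_power_ratio(p, p_ref):
        return 10 * math.log10(p / p_ref)

    def db_from_voltage_ratio(v, v_ref):
        return 20 * math.log10(v / v_ref)

    print(db_from_power_ratio(2.0, 1.0))    # doubling power   ~= +3.01 dB
    print(db_from_voltage_ratio(2.0, 1.0))  # doubling voltage ~= +6.02 dB

That factor-of-two difference is behind a lot of the voltage/power/loudness confusion the resource untangles.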
Thank you for saying this, I agree with your point exactly.
However, instead of using that known human bias to justify pervasive LLM use, which will scale and make everything worse, we either improve LLMs, improve humans, or some combo.
Your point is a good one, but the conclusion often drawn from it is a selfish shortcut, biased toward throwing up our hands and saying "haha, humans suck too, am I right?" instead of substantial discussion or effort toward actually improving the situation.
Nice writeup.
F1, balanced accuracy, etc. In truth it depends on your problem and what a practical "best" solution looks like, especially in imbalanced scenarios, but the Matthews correlation coefficient (MCC) is probably the best comprehensive, balanced, blind go-to metric, because a high MCC requires good results in all four cells of the confusion matrix [0,1].
I made a quick interactive, graphical exploration to demonstrate this in python [2].
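To make the contrast concrete, here's a tiny contrived example (toy numbers, unrelated to the visualization) using scikit-learn's metrics: a majority-class "classifier" on imbalanced data gets a flattering F1 but an MCC of zero.

    from sklearn.metrics import f1_score, matthews_corrcoef

    # Toy imbalanced case: 90 positives, 10 negatives, and a "classifier"
    # that just predicts the majority class for every sample.
    y_true = [1] * 90 + [0] * 10
    y_pred = [1] * 100

    print("F1: ", f1_score(y_true, y_pred))           # ~0.95, looks great
    print("MCC:", matthews_corrcoef(y_true, y_pred))  # 0.0, no skill at all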
Really neat visualization! And thanks for the tip on MCC.
Out of curiosity I plugged it into the same visualization (performance vs. class weight when optimized with BCE) and it behaves similarly to F1, i.e. it's best without weighting.
> this is aimed primarily at engineers who don't have ML expertise: someone who understands the business context, knows how to build data processing pipelines and web services, but might not know how to build the models.
I respect software engineers a lot; however, ANYONE who "doesn't know how to build models" also doesn't know what data leakage is or how to evaluate a model more deeply than simple metrics/loss, and can easily trick themselves into building a "great" model that ends up falling on its face in prod. So apologies if I'm highly skeptical of the admittedly very, very cool thing you have built. I'd love to hear your thoughts.
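To illustrate the kind of trap I mean, here's a classic leakage sketch (contrived noise data, not anyone's real pipeline): selecting features against the labels before splitting makes pure noise look predictive, while doing the selection inside the training split does not.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2000))   # pure noise features
    y = rng.integers(0, 2, size=200)   # labels unrelated to X

    # Leaky: choose the 20 features most correlated with y using ALL rows,
    # then split -- the "held-out" rows already influenced the selection.
    X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
    Xtr, Xte, ytr, yte = train_test_split(X_leaky, y, random_state=0)
    print(LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte))  # typically well above chance

    # Honest: fit the selector on the training split only.
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    sel = SelectKBest(f_classif, k=20).fit(Xtr, ytr)
    clf = LogisticRegression(max_iter=1000).fit(sel.transform(Xtr), ytr)
    print(clf.score(sel.transform(Xte), yte))  # typically hovers around 0.5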
I think you're probably right. As an example of this challenge, I've noticed that engineers who don't have a background in ML often lack the "mental models" to understand how to think about testing ML models (i.e. statistical testing as opposed to the kind of pass/fail test cases that are used to test code).
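For concreteness, here's a hypothetical sketch of what a "statistical test case" could look like (just an illustration, not something Plexe does): bootstrap the metric on held-out predictions and require the whole confidence interval to clear the acceptance bar, instead of asserting on a single number.

    import numpy as np

    def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
        """Bootstrap confidence interval for accuracy (95% by default)."""
        rng = np.random.default_rng(seed)
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        accs = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
            accs.append(np.mean(y_true[idx] == y_pred[idx]))
        return np.quantile(accs, [alpha / 2, 1 - alpha / 2])

    # Placeholder predictions; the "test" passes only if the lower bound
    # of the interval clears the acceptance threshold.
    lo, hi = bootstrap_accuracy_ci([1, 0, 1, 1, 0, 1, 0, 0] * 25,
                                   [1, 0, 1, 0, 0, 1, 1, 0] * 25)
    assert lo > 0.6, f"model may not meet the bar: 95% CI = ({lo:.2f}, {hi:.2f})"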
The way I look at this is that plexe can be useful even if it doesn't solve this fundamental problem. When a team doesn't have ML expertise, their choices are A) don't use ML, B) acquire ML expertise, or C) use ChatGPT as your predictor. Option C suffers from the same problem you mentioned, in addition to latency/scalability/cost issues and the model not being trained on your data, etc. So something like Plexe could be an improvement on option C by at least addressing the latter pain points.
Plus: we can keep throwing more compute at the agentic model building process, doing more analysis, more planning, more evaluation, more testing, etc. It still won't solve the problem you bring up, but hopefully it gets us closer to the point of "good enough to not matter" :)
To the uneducated (or those who would prefer we remain uneducated in the face of power), any education can easily be cast as "biased" against their purposes. Most people see through that for what it is, but an increasing number of Americans don't.
I never found it so: going through "liberal" media I tend to see far fewer lies than I do when I wander over to the world of mainstream conservative news.
The saddest part is watching friends and colleagues whom I previously considered healthily skeptical, rigorous thinkers now "blown away" by promotional videos, drinking the Kool-Aid. Yes, it is amazing. No, I don't believe everything it comes up with. Why would we?