Hacker News | apercu's comments

The KPI problem is systemic and bigger than just Gen-AI; it’s in everything these days. Actual governance starts by being explicit about business value.

If you can’t state what a thing is supposed to deliver (and how it will be measured), you don’t have a strategy, only a bunch of activity.

For some reason, over the last decade or so, we have confused activity with productivity.

(and words/claims with company value - but that's another topic)


I was going to basically say the same thing. Few people can "sing" without spending time developing their ear.

I find it interesting that this thread is full of pragmatic posts that seem to honestly reflect the real limits of current Gen-AI.

Versus other threads (here on HN, and especially on places like LinkedIn) where it's "I set up a pipeline and some agents and now I type two sentences and amazing technology comes out in 5 minutes that would have taken 3 devs 6 months to do".


I never see those types of posts. Maybe I'm immune and ignoring them.

I want that immunity. Gen-AI evangelism is the new virtue signalling.

I actually enjoy writing specifications. So much so that I made them a large part of my consulting work for much of my career. So it makes sense that working with Gen-AI that way is enjoyable for me.

The more detailed I am in breaking down chunks, the easier it is for me to verify and the more likely I am going to get output that isn't 30% wrong.


The issue is they often choose the wrong 1%.

> "The biggest issue I see is Microsoft's entire mentality around AI adoption that focuses more on "getting the numbers up" then actually delivering a product people want to use."

That succinctly describes 90% of the economy right now if you just change a word and remove a couple:

The biggest issue I see is the entire mentality that focuses more on "getting the numbers up" than actually delivering a product people want to use.


KPI infection. You see projects whose goal is, say, "repos with AI code review turned on" vs. "code review suggestions that were accepted". And then if you do get adoption (like, say, a Claude Code trial), VPs balk at the price. If it's actually expensive now, it's because people are actually using it all the time!

The same kind of logic that led companies to migrate from Slack to Teams. Metrics that don't look at actual, positive impact, because nobody picks a risky KPI; they pick a useless one that can't miss.


My phone, laptop, TV, fridge, etc., are all demonstrably worse than they were 5 years ago.

I agree with you, but I guess the difference is that you can’t jail a Gen-AI model?


"The company can be held vicariously liable" means that in this analogy, the company represents the human who used AI inappropriately, and the employee represents the AI model that did something it wasn't directly told to do.


Nobody tries to jail Microsoft Word, they jail the person using it.


Nobody tries to jail the automobile when it hits a pedestrian on cruise control. The driver is responsible for knowing the limits of the tool and adjusting accordingly.


Can you help me understand where you are coming from? Is it that you think the benchmark is flawed or overly harsh? Or that you interpret the tone as blaming AI for failing a task that is inherently tricky or poorly specified?

My takeaway was more "maybe AI coding assistants today aren’t yet good at this specific, realistic engineering task"…


In my experience many OTEL libraries are awful to use, and most of the "official" ones are the worst offenders, as they are largely codegened. That typically makes them feel clunky to use, and they exhibit code patterns that are non-native to the language, which would be one explanation for why AI systems struggle with the benchmark.

I think you would see similar results if you tasked an AI to, e.g., write gRPC/Protobuf systems using only the built-in/official protobuf codegen for each language.

Where I think the benchmark is quite fair is in the solutions. It looks like for each of the languages (at least the ones I'm familiar with), the "better" options were chosen, e.g. using `tracing-opentelemetry` rather than `opentelemetry-sdk` directly in Rust.

However the one-shot nature of the benchmark also isn't that reflective of the actual utility. In my experience, if you have the initial framework setup done in your repo + a handful of examples, they do a great job of applying OTEL tracing to the majority of your project.


Where I work, we are looking at the parts of our documentation and implementations that AI has a hard time with.

This almost always correlates with customers having similar issues in getting things working.

This has led us to rewrite a lot of documentation to be more consistent and clear. In addition, we set out a series of examples ranging from simple to complex. This shows up as fewer tickets later, and more complex implementations being set up by customers without the need for support.


I did similar for about 25 years. I had one injury from overtraining (I basically ran 20 miles every Sunday morning for 6 months, in addition to two shorter runs each week) that ended up as plantar fasciitis, and I had to take 4-5 months off.

I stopped doing that sort of weekly long run after that and did a lot more in the 6-10 mile range.

Then during and immediately post-COVID shutdowns, I just started running every time I felt stressed about something, and I started to neglect all the other holistic movements that complement running.

This ended up leading to a weird twinge in my hip that 2 years of focused strength training hasn't eliminated. The doctor says there is nothing structural, but I don't run anymore, and I miss it often. There is a flow state I seem to get into somewhere between just under and just over an hour into a run.

The only other time I ever get into that wonderful flow state is every once in a while when playing guitar, but it's rare.

It does feel good to run, and I miss it.


> weird twinge in my hip

Could be arthritis.


I’ve noticed the inverse: the more I understand something, the less “simple” it looks.

Apparent simplicity usually comes from weak definitions and overconfident summaries, not from the underlying system being easy.

Complexity is often there from the start; we just don’t see it yet.


There's a great analog with this in chess as well.

~1200 - omg chess is so amazing and hard. this is great.

~1500 - i'm really starting to get it! i can beat most people i know easily. i love studying this complex game!

~1800 - this game really isn't that hard. i can beat most people at the club without trying. really I think the only thing separating me from Kasparov is just a lot of opening prep and study

~2300 - omg this game is so friggin hard. 2600s are on an entirely different plane, let alone a Kasparov or a Carlsen.

Magnus Carlsen - "Wow, I really have no understanding of chess." - Said without irony after playing some game and going over it with a computer on stream. A fairly frequent happening.


Funny how the start of your scale, 1200 Elo, is essentially what I have as a goal and am not even close yet, lol.


I think it's more of a curve from my point of view.

Beginner: I know nothing and this topic seems impossible to grasp.

Advanced beginner: I get it now. It's pretty simple.

Intermediate: Hmm, this thing is actually very complicated.

Expert: It's not that complicated. I can explain a simple core covering 80% of it. The other 20% is an ocean of complexity.


I suppose we'll just have to agree to disagree. Simplicity comes from strong definitions, and "infinite" complexity comes from weak ones.

If you're always chasing the next technicality, then maybe you didn't really know what question you were looking to answer at the outset.


>If you're always chasing the next technicality

This sounds like someone who has never studied physics.

"Oh wow, I figured out everything about physics... except this one little weird thing here"

[A lifetime of chasing why that one little weird thing occurs]

"I know nothing about physics, I am but a mote in an endless void"

---

Strong or weak definitions don't save you here; what you are looking for is error bars and acceptable ranges.


Your response along with others is proving my point in an unfortunate way.

If you think I'm saying that the world is not infinitely complex, you are missing the point.


IMO both perspectives have their place. Sometimes what's missing is the information, sometimes what's lacking is the ability to communicate it and/or the willingness to understand it. So in different circumstances either viewpoint may be appropriate.

What's missing more often than not, across fields of study as well as levels of education, is the overall commitment to conceptual integrity. From this we observe people's habitual inability or unwillingness to be definite about what their words mean, and their consequent fear of abstraction.

If one is in the habit of using one's set of concepts in the manner of bludgeons, one will find many ways and many reasons to bludgeon another with them - such as if a person turned out to be using concepts as something more akin to clockwork.


Yes, we're in complete agreement about conceptual integrity.

Reality is such that, without integrity, you can prove almost anything you want. As long as your bar for "prove" is at the very bottom.


Simple counterexample: chess. The rules are simple enough we regularly teach them to young children. There's basically no randomness involved. And yet, the rules taken together form a game complex enough that no human alive can fully comprehend their consequences.


This is actually insightful: we usually don't know the question we are trying to answer. The idea that you can "just" find the right question is naive.


> Simplicity comes from strong definitions

Sure, you can put it this way, with the caveat that reality at large isn't strongly definable.

You can sort of see this with good engineering: half of it is strongly defining a system simple enough to be reasoned about and built up, the other half is making damn sure that the rest of reality can't intrude, violate your assumptions and ruin it all.


Wisdom comes from knowing what you don't know.

