They are: the model has no inherent knowledge of its confidence levels; it just adds plausible-sounding numbers. Obviously they _can_ be plausible, but trusting these is just another level up from trusting the original output.
I read a comment here a few weeks back that LLMs always hallucinate, but we sometimes get lucky when the hallucinations match up with reality. I've been thinking about that a lot lately.
> the model has no inherent knowledge about its confidence levels
Kind of. See e.g. https://openreview.net/forum?id=mbu8EEnp3a, but I think it was already established a year ago that LLMs tend to have an identifiable internal confidence signal; the challenge around the time of the DeepSeek-R1 release was to connect that signal, through training, to tool-use activation, so the model does a search if it "feels unsure".
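To make the idea of an externally visible confidence signal concrete, here's a crude sketch using average token log-probability as a proxy. This is an illustration with made-up probabilities, not the internal-activation signal the paper studies; any API that exposes per-token probabilities could feed it real data.

```python
import math

def avg_logprob(token_probs):
    """Mean log-probability across tokens; closer to 0 means the model
    assigned higher probability to its own output, i.e. more 'confident'."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)

# Hypothetical per-token probabilities for two generations:
confident = [0.95, 0.90, 0.97, 0.93]
unsure = [0.40, 0.22, 0.55, 0.31]

print(avg_logprob(confident))  # near 0
print(avg_logprob(unsure))     # much more negative
```

A training setup like the one described would then gate tool use on something like this signal falling below a threshold.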
Wow, that's a really interesting paper. That's the kind of thing that makes me feel there's a lot more research to be done "around" LLMs and how they work, and that there's still a fair bit of improvement to be found.
In science, long before LLMs, there's a saying: all models are wrong, but some are useful. We model, say, gravity as 9.8 m/s² on Earth, knowing full well that it doesn't hold true across the universe, and we're able to build things on top of that foundation. Whether that foundation is made of bricks or of sand, for LLMs, is for us to decide.
G, the gravitational constant, is (as far as we know) universal. I don't think this is what they meant, but the use of "across the universe" in the parent comment is confusing.
g, the net acceleration from gravity and the Earth's rotation, is what's 9.8 m/s² at the surface, on average. It varies slightly with location and altitude (less than 1% anywhere on the surface, IIRC), so "it's 9.8 everywhere" is the model that's wrong but good enough a lot of the time.
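The altitude dependence is easy to see from the inverse-square law. A quick sketch (ignoring rotation, latitude, and local density variations, and using a spherical-Earth mean radius):

```python
def g_at_altitude(h_m, g0=9.80665, radius_m=6.371e6):
    """Surface gravity falls off with the inverse square of distance
    from Earth's center; h_m is altitude above the surface in metres."""
    return g0 * (radius_m / (radius_m + h_m)) ** 2

print(g_at_altitude(0))        # sea level: 9.80665
print(g_at_altitude(8_849))    # Everest summit: ~9.78
print(g_at_altitude(400_000))  # ISS altitude: ~8.7 (astronauts float
                               # because of free fall, not absent gravity)
```

Even at the top of Everest the deviation is well under 1%, which is why the flat 9.8 figure survives as an engineering model.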
It doesn't even hold true on Earth! Never mind other planets of different sizes changing that number; the 9.8 figure also doesn't account for the atmosphere and the air resistance it creates. Drop a feather that isn't crumpled up and it floats down gently, nowhere near what 9.8 m/s² of acceleration would predict. In sports, the air resistance on different balls changes how fast they drop, which is part of why peak athlete skills often don't transfer between sports. So, as a model, ignoring air resistance is good enough a lot of the time, but sometimes it's not a good model, because we do need to care about air resistance.
Gravity isn't 9.8m/s/s across the universe. If you're at higher or lower elevations (or outside the Earth's gravitational pull entirely), the acceleration will be different.
Their point was that the 9.8 model is good enough for most things on Earth; the model doesn't need to be perfect across the universe to be useful.
> Microsoft has a large team dedicated towards improving these languages constantly
… and the people working on these projects need to deliver, else their performance review won’t be good, and their financial rewards (merit increase, bonus, refresher) will be low. And here we are.
Edit: I realize I’m repeating what you said too, but I wanted to make it more clear what’s going on.
From what I've been told, all the nice bonuses and career opportunities are in Azure and other, more business-centric areas. You go to DevDiv to work on Roslyn (C#) or .NET itself because you can do so and care about either or both first and foremost.
I see a lot of complaints regarding ChatGPT 4's performance in coding tasks. My hypothesis is that Microsoft wants to launch Copilot X based on GPT-4 [0], and they can't have OpenAI's ChatGPT 4 as a strong competitor.
Outside of mating season, and when you don't appear threatening, typically 'just fine'. With young, or during mating season: avoid if you can. Seeing two bull moose crash into each other will give you all kinds of things to think about, such as what would happen to your car if one decided to plow into it. And they don't move slowly either; they are very agile, probably much more so than you'd give them credit for if you haven't seen them in action. It's more like swordfighters than sumo wrestlers.
Most moose encounters are entirely uneventful. Moose normally avoid / don't care about humans. That's why they are actually most dangerous to drivers - a high velocity moose crash is very dangerous for any vehicle smaller than a semi or van due to the anatomical mechanics of the moose body.
I.e., you will take out the moose's legs, but the rest of the injured, still-living moose will come through the windshield to join you in the passenger compartment.
FWIW, I agree with you, although I experienced the medical system only as a patient / outsider. I live in a former communist country in Eastern Europe.