Hacker News | drclau's comments

How do you know the confidence scores are not hallucinated as well?


They are. The model has no inherent knowledge about its confidence levels; it just adds plausible-sounding numbers. Obviously they _can_ be plausible, but trusting them is just one level up from trusting the original output.

I read a comment here a few weeks back that LLMs always hallucinate, but we sometimes get lucky when the hallucinations match up with reality. I've been thinking about that a lot lately.


> the model has no inherent knowledge about its confidence levels

Kind of. See e.g. https://openreview.net/forum?id=mbu8EEnp3a, but I think it was already established a year ago that LLMs tend to have an identifiable internal confidence signal; the challenge around the time of the DeepSeek-R1 release was to connect that signal, through training, to tool use activation, so the model does a search if it "feels unsure".


Wow, that's a really interesting paper. That's the kind of thing that makes me feel there's a lot more research to be done "around" LLMs and how they work, and that there's still a fair bit of improvement to be found.


In science, long before LLMs, there was this saying: all models are wrong, some are useful. We model, say, gravity as 9.8m/s² on Earth, knowing full well that it doesn't hold true across the universe, and we're able to build things on top of that foundation. For LLMs, whether that foundation is made of bricks or of sand is for us to decide.


It doesn't hold true across the universe? I thought this was one of the more universal things like the speed of light.


G, the gravitational constant, is (as far as we know) universal. I don't think this is what they meant, but the use of "across the universe" in the parent comment is confusing.

g, the net acceleration from gravity and the Earth's rotation, is what is 9.8m/s² at the surface, on average. It varies slightly with location and altitude (less than 1% for anywhere on the surface IIRC), so "it's 9.8 everywhere" is the model that's wrong but good enough a lot of the time.
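A rough sketch of how much g moves around on the surface. It assumes the 1980 International Gravity Formula for latitude and the standard free-air correction for altitude; neither comes from the comment above, and the function name is made up for illustration.

```python
import math

def surface_gravity(lat_deg: float, altitude_m: float = 0.0) -> float:
    """Approximate g in m/s^2 (IGF 1980 plus free-air correction)."""
    s = math.sin(math.radians(lat_deg))
    s2 = math.sin(math.radians(2 * lat_deg))
    # Sea-level gravity as a function of latitude (includes Earth's rotation).
    g_sea_level = 9.780327 * (1 + 0.0053024 * s**2 - 0.0000058 * s2**2)
    # Free-air correction: g drops by roughly 3.086e-6 m/s^2 per metre of height.
    return g_sea_level - 3.086e-6 * altitude_m

equator = surface_gravity(0)
pole = surface_gravity(90)
spread_percent = (pole - equator) / equator * 100
print(f"equator {equator:.4f}, pole {pole:.4f}, spread {spread_percent:.2f}%")
```

Under these assumptions the equator-to-pole spread comes out at roughly half a percent, which is consistent with the "less than 1%" recollection.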


It doesn't even hold true on Earth! Never mind other planets being of different sizes, which changes that number; the equation also doesn't account for the atmosphere and the air resistance that comes with it. If we drop a feather that isn't crumpled up, it'll float down gently at anything but 9.8m/s². In sports, the air resistance of different balls is enough that how fast something drops is also not exactly 9.8m/s², which is why peak athlete skills often don't transfer between sports. So, as a model, ignoring air resistance is good enough a lot of the time, but sometimes it's not a good model because we do need to care about air resistance.
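To make the air-resistance point concrete, here is a minimal sketch assuming simple quadratic drag, a = g - (k/m)·v². The masses and drag coefficients are made-up illustrative values, not measurements of a real feather or ball.

```python
import math

def net_acceleration(speed, mass, drag_coeff, g=9.8):
    """Downward acceleration with quadratic drag: a = g - (k/m) * v^2."""
    return g - (drag_coeff / mass) * speed**2

def terminal_speed(mass, drag_coeff, g=9.8):
    """Speed at which drag cancels gravity, so acceleration is zero."""
    return math.sqrt(mass * g / drag_coeff)

def fall_speed_after(seconds, mass, drag_coeff, dt=0.001):
    """Forward-Euler integration of the speed of an object dropped from rest."""
    speed = 0.0
    for _ in range(int(seconds / dt)):
        speed += net_acceleration(speed, mass, drag_coeff) * dt
    return speed

# Hypothetical objects: (mass in kg, drag coefficient k in kg/m).
feather = (0.001, 0.01)
ball = (0.45, 0.01)
print("feather after 1 s:", fall_speed_after(1.0, *feather))
print("ball after 1 s:   ", fall_speed_after(1.0, *ball))
print("vacuum after 1 s: ", 9.8 * 1.0)
```

With these numbers the light object settles near its terminal speed almost immediately, while the heavier one stays much closer to the drag-free 9.8 m/s after one second.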


Gravity isn't 9.8m/s/s across the universe. If you're at higher or lower elevations (or outside the Earth's gravitational pull entirely), the acceleration will be different.

Their point was that the 9.8 model is good enough for most things on Earth; the model doesn't need to be perfect across the universe to be useful.


g (lower case) is literally the gravitational acceleration of Earth at surface level. It's universally true, as there's only one Earth in this universe.

G is the gravitational constant, which is also universally true (erm... to the best of our knowledge); g is calculated using the gravitational constant.
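For reference, a small sketch of how g falls out of G via Newton's law, g = G·M/r². The constants are commonly quoted approximate values, the helper name is made up, and Earth's rotation and non-spherical shape are ignored.

```python
G = 6.674e-11           # gravitational constant, m^3 kg^-1 s^-2 (approximate)
EARTH_MASS = 5.972e24   # kg (approximate)
EARTH_RADIUS = 6.371e6  # mean radius in metres (approximate)

def gravity_at(distance_from_center_m, mass_kg=EARTH_MASS):
    """Gravitational acceleration at a given distance from a body's centre."""
    return G * mass_kg / distance_from_center_m**2

g_surface = gravity_at(EARTH_RADIUS)
g_400km_up = gravity_at(EARTH_RADIUS + 400e3)
print(f"surface: {g_surface:.2f} m/s^2, 400 km up: {g_400km_up:.2f} m/s^2")
```

G stays fixed while g depends on the mass and the distance plugged in, which is the distinction the thread keeps circling around.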


They 100% are, unless you provide a rubric, i.e. basically make the score ordinal.

"Return a score of 0.0 if ..., return a score of 0.5 if ..., return a score of 1.0 if ..."
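A minimal sketch of that rubric idea: build the prompt from a fixed set of levels and reject any reply that is not one of them. The criteria below are hypothetical placeholders (the comment elides the real ones), the helper names are made up, and no actual model call is shown.

```python
# Hypothetical rubric: the criteria are placeholders, not from the thread.
RUBRIC = {
    0.0: "the answer contradicts the reference",
    0.5: "the answer is partially supported by the reference",
    1.0: "the answer is fully supported by the reference",
}

def build_rubric_prompt(rubric):
    """Turn the rubric into explicit 'Return a score of X if ...' instructions."""
    lines = [
        f"Return a score of {score} if {criterion}."
        for score, criterion in sorted(rubric.items())
    ]
    lines.append("Return only the number.")
    return "\n".join(lines)

def parse_score(reply, rubric):
    """Accept a reply only if it is exactly one of the rubric's levels."""
    try:
        value = float(reply.strip())
    except ValueError:
        return None
    return value if value in rubric else None

print(build_rubric_prompt(RUBRIC))
print(parse_score(" 0.5 ", RUBRIC))   # a valid level
print(parse_score("0.73", RUBRIC))    # off-rubric number is rejected
```

Treating the score as one of a few named levels, rather than a free-floating number, is what makes an off-rubric reply detectable at all.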


According to the Google Maps "measure distance" tool, it's ~630 miles, or ~1000 km. I am very surprised it was felt so strongly at such a distance.


Not just felt; there were deaths too.


Not surprising. A 7.7 is absolutely massive. (In terms of energy, 10^23.35 erg, or about 5 megatons of TNT, if my math works.)
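The math can be checked with a few lines, assuming the Gutenberg-Richter energy relation log10(E in erg) = 11.8 + 1.5·M (which reproduces the 10^23.35 figure) and the conventional 4.184e15 J per megaton of TNT. Function names are made up for illustration.

```python
ERG_PER_JOULE = 1e7
JOULES_PER_MEGATON_TNT = 4.184e15  # conventional TNT equivalent

def quake_energy_log10_erg(magnitude):
    """Gutenberg-Richter estimate of radiated energy, as log10 of ergs."""
    return 11.8 + 1.5 * magnitude

def quake_energy_megatons(magnitude):
    """Same estimate, converted from ergs to megatons of TNT."""
    joules = 10 ** quake_energy_log10_erg(magnitude) / ERG_PER_JOULE
    return joules / JOULES_PER_MEGATON_TNT

print(f"log10(E/erg) = {quake_energy_log10_erg(7.7):.2f}")
print(f"about {quake_energy_megatons(7.7):.1f} megatons of TNT")
```

Under these assumptions a 7.7 lands a little above 5 megatons, so the estimate holds up.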


Mare Tranquillitatis pit solves these problems. And there are likely many more caves that haven't been discovered yet.

https://en.wikipedia.org/wiki/Mare_Tranquillitatis_pit


> Microsoft has a large team dedicated towards improving these languages constantly

… and the people working on these projects need to deliver, else their performance review won’t be good, and their financial rewards (merit increase, bonus, refresher) will be low. And here we are.

Edit: I realize I’m repeating what you said too, but I wanted to make it more clear what’s going on.


From what I've been told, all the nice bonuses and career opportunities are in Azure and other, more business-centric areas. You go to DevDiv to work on Roslyn (C#) or .NET itself because you can, and because you care about either or both first and foremost.


Well, they do tell you in the UI that chats are stored for 30 days even when you disable history. And then there's a link to this:

https://help.openai.com/en/articles/7730893-data-controls-fa...


I see a lot of complaints regarding ChatGPT 4's performance in coding tasks. My hypothesis is that Microsoft wants to launch Copilot X based on GPT-4 [0], and they can't have OpenAI's ChatGPT 4 as a strong competitor.

[0]: https://github.com/features/preview/copilot-x


How did the encounter go? Don’t leave us hanging here!


Outside of mating season, and when you don't appear threatening, typically 'just fine'. With young around, or during mating season: avoid if you can. Seeing two bull moose crash into each other will give you all kinds of things to think about, such as what would happen to your car if one decided to plow into it. And they don't move slowly either; they are very agile, probably much more so than you'd give them credit for if you haven't seen them in action. They're more like swordfighters than sumo wrestlers.

edit: this is a good sample:

https://www.youtube.com/watch?v=g-7imHBlguk


Most moose encounters are entirely uneventful. Moose normally avoid / don't care about humans. That's why they are actually most dangerous to drivers - a high-velocity moose crash is very dangerous for any vehicle smaller than a semi or van due to the anatomical mechanics of the moose body.


I.e. you will take out the moose's legs, but the rest of the injured, still-living moose will come through the windshield to join you in the passenger compartment.


I walked away quickly and so did it. I think it was just as surprised as me.

It is certainly the closest I have been to a moose, but living in that area, I encountered them fairly often. They are usually pretty docile.


Out of curiosity, where are you from?

FWIW, I agree with you, although I experienced the medical system only as a patient / outsider. I live in a former communist country in Eastern Europe.


How can you create a US account? Don't they ask for a credit/debit card and/or phone number?


I bought a $20 US gift card on eBay and redeemed that into a new US account.


> I'm sure it's because the number of people on 5G is drastically lower than the number of people on 4G/LTE.

5G has increased capacity over 4G, so even if all 4G users switched to 5G, you would still have better service.

