Hacker News | dudu24's comments

If this had a higher refresh rate I'd be making unwise financial decisions today

I'm tired of this pseudointellectual reductionist response. It's not "literally by accident" when they're trained to do something, as if we are not also machines that generate next actions based on learned neural weights and abstract (embedded) representations. Your issue is with semantics rather than content.

Obviously "hallucinate" and "lie" are metaphors. Get over it. These are still emergent structures that we have a lot to learn from by studying. But I suppose any attempt by researchers to do so should be disregarded because Person On The Internet has watched the 3blue1brown series on Neural Nets and knows better. We know the basic laws of physics, but spend lifetimes studying their emergent behaviors. This is really no different.


I just kind of wish the "hallucinations" didn't come wrapped in such confident language. Actual people are generally relatively forthcoming at the edge of their knowledge, or at least don't project as much confidence. I know LLMs are a bit different, but that's about the best comparison I can come up with.


Of course they hallucinate; we are training in random mode. Since you mentioned 3blue1brown, there is an excellent video on ANN interpretability, based on the work of well-known researchers who attempt to provide plausible explanations of how these (transformer-based) architectures store and retrieve information. Randomness and stochasticity are literally the most basic components that allow all these billions of parameters to represent better embedding spaces: almost Hilbertian in nature, and nearly orthogonal as training progresses.
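The "nearly orthogonal" point is easy to demonstrate: in high dimensions, independent random vectors have cosine similarity concentrated near zero, which is why a huge embedding space can hold far more roughly distinguishable directions than it has dimensions. A minimal sketch (my own illustration, not from the video; the dimension is an arbitrary choice):

```python
import math
import random

def cosine(u, v):
    # Cosine similarity: dot product over the product of norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

random.seed(0)
dim = 4096  # illustrative embedding width
u = [random.gauss(0, 1) for _ in range(dim)]
v = [random.gauss(0, 1) for _ in range(dim)]

# For independent Gaussian vectors the cosine has standard deviation
# about 1/sqrt(dim), so at dim=4096 it is within a few percent of zero.
print(abs(cosine(u, v)))
```

The same experiment at `dim = 3` gives wildly varying similarities, which is the intuition behind packing many concepts into high-dimensional embedding spaces.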

The "emergent structures" you mention are just the outcome of randomness guided by gradient descent over the data landscape. There is nothing to learn by studying these Franken-monsters. All these experiments were conducted in the past (decades past), multiple times, just not at this scale.

We are still missing basic theorems, not stupid papers about which tech bro paid the highest electricity bill to "train" on extremely inefficient gaming hardware.


This is a non-response.


Disagree, it's making a valid observation.

If someone is nominally trying to convince you of a point, but they shroud this point within a thicket of postmodern verbiage* that is so dense that most people could never even identify any kind of meaning, you should reasonably begin to question whether imparting any point at all is actually the goal here.

*Zizek would resist being cleanly described as a postmodernist - but when it comes to his communication style, his works are pretty much indistinguishable from Sokal affair-grade bullshit. He's usually just pandering to a slightly different crowd. (Or his own navel.)


I'm also losing my ability to tolerate prose without headings, but I think that's symptomatic of this bigger issue.


I usually scroll a page to see how many headings it has, but I'm looking for the opposite. Too many headings is one of the quickest aesthetic clues that I'm looking at slop, since it doesn't require me to read any of the text. (Emoji and overuse of bullet-point lists are the other clues I can think of in this category.)


I noticed something similar when working with Russian developers (non-Marxist ones, unlike the post's author, as far as I know) who had made the jump abroad to the EU.

When debating directions, some of them focused on simply never stopping talking. Instead of an interactive discussion (5-15 seconds per statement), they consistently went with monotone 5-10 minutes of slop. Combined with fairly crappy English, it is incredibly effective at shutting down discourse. I caught on after the second guy used the exact same technique.

This was a long time ago. I have since worked with some really smart and nice Russian developers escaping that insane regime. And some whom I wish had stayed there, after they made their political views on Russia known.


When you have a 30-minute meeting with busy people, a single 15-minute monologue might buy you another week to solve your problem.

Indeed, very effective; usually it takes somebody putting their foot down AND a consensus to de-escalate immediately. If you have an antidote, please let me know.


I cannot stand webpages that hijack scrolling like that.


That is not contrary to the token-at-a-time approach.


It's just an application of the chain rule. It's not interesting to ask who invented it.


From the article:

> Some ask: "Isn't backpropagation just the chain rule of Leibniz (1676) [LEI07-10] & L'Hopital (1696)?" No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (see Sec. XII of [T22][DLH]). (There are also many inefficient ways of doing this.) It was not published until 1970 [BP1].


The article says that, but it's overcomplicating to the point of being actually wrong. You could, I suppose, argue that the big innovation is applying vectorization to the chain rule (by virtue of the matmul-based architecture of the usual feedforward network), which is a genuine combination of two mathematical technologies. But it feels like this, and indeed most "innovations" in ML, is only considered an innovation due to the brainrot of trying to take maximal credit for minimal work (i.e., IP).
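For what "efficient application of the chain rule" means concretely: reverse-mode autodiff sweeps the computation graph once, applying the chain rule per edge and reusing each upstream gradient, rather than re-deriving a symbolic derivative per input. A toy sketch of my own (not the article's code; `Node`, `mul`, `add` are hypothetical names):

```python
class Node:
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this one depends on
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = 0.0

def mul(a, b):
    # d(a*b)/da = b, d(a*b)/db = a
    return Node(a.value * b.value, (a, b), (b.value, a.value))

def add(a, b):
    # d(a+b)/da = d(a+b)/db = 1
    return Node(a.value + b.value, (a, b), (1.0, 1.0))

def backward(out):
    # Topologically order the graph, then sweep once in reverse:
    # each edge contributes its chain-rule term exactly once.
    order, seen = [], set()
    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p in n.parents:
                visit(p)
            order.append(n)
    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, g in zip(node.parents, node.local_grads):
            parent.grad += node.grad * g

x = Node(3.0)
y = Node(4.0)
z = add(mul(x, y), x)   # z = x*y + x
backward(z)
print(x.grad, y.grad)   # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

The cost of the reverse sweep is proportional to the number of edges in the graph, independent of the number of inputs, which is the property that makes training big networks feasible.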


The real metric is whether anyone remembers it in 100 years. Any other discussion just comes off as petty.


You got it right: Leibniz!


This misses the point of isospin. Isospin is an approximate SU(2) symmetry arising because the up and down quarks (the "light" quarks) have very similar masses compared to the rest of the quarks, so they can be approximated as two different eigenstates of the same particle. It's mathematically identical to the SU(2) symmetry of a spin-half particle. The reason it doesn't include the other quarks is that they are so much more massive.
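In standard textbook notation (my own addition, not from the comment above), the analogy is that u and d play the roles of spin-up and spin-down in a single doublet:

```latex
% Up and down quarks as the two I_3 eigenstates of one isospin doublet,
% in direct analogy with the two S_z states of a spin-1/2 particle:
\begin{aligned}
|u\rangle &= \left|\,I=\tfrac{1}{2},\; I_3=+\tfrac{1}{2}\,\right\rangle,\\
|d\rangle &= \left|\,I=\tfrac{1}{2},\; I_3=-\tfrac{1}{2}\,\right\rangle,
\end{aligned}
% with SU(2) generators T_i = \sigma_i/2 (the Pauli matrices).
% The symmetry is only approximate because m_u \approx m_d is small
% compared to the masses of s, c, b, t, which is why those quarks
% are left out of the doublet.
```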


For better or worse, news flows through social media, so this approach basically amounts to ignoring all the bad stuff going on. If you read HN, chances are you can probably safely get through the next four years doing this. But as the saying goes, "first they came for the communists..."


> grand vector space

what.


In the language of "embeddings" of machine learning.

