- Even for billion-parameter theories, a small number of vectors might dominate the behaviour. A coordinate-shift approach (PCA) might surface new concepts that let us model that phenomenon. "A change in perspective is worth 80 IQ points," as Alan Kay said.
- There is an analogue here to how we come up with cognitive metaphors of the mind ("our models of the mind resemble our latest technology: abacus, mechanisms, computer, neural network"), which could be applied to other complicated areas of reality.
> Even for billion-parameter theories, a small number of vectors might dominate the behaviour.
We kinda-sorta already know this is true. The lottery-ticket hypothesis [0] says that every large network contains a randomly initialized small network that performs as well as the overall network, and over the past eight years or so researchers have indeed managed to find small networks inside large networks of many different architectures that demonstrate this phenomenon.
Nobody talks much about the lottery-ticket hypothesis these days because it isn’t practically useful at the moment. (With the pruning algorithms and hardware we have, pruning is more costly than just training a big network.) But the basic idea does suggest that there may be hope for interpretability, at least in the odd application here or there.
That is, the (strong) lottery-ticket hypothesis suggests that the training process is a search through a large parameter space for a small network that already (by random initialization) exhibits the desired overall network behavior; updating parameters during training is mostly about turning off the irrelevant parts of the network.
For some applications, one would think that the small sub-network hiding in there somewhere might be small enough to be interpretable. I won't be surprised if, some day not too far into the future, scientists investigating neural networks start to identify good interpretable models of phenomena of intermediate complexity (phenomena that are too complex to be amenable to classic scientific techniques, but simple enough that neural networks trained to exhibit them yield unusually small active sub-networks).
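For the curious, here is a minimal sketch of the iterative magnitude-pruning loop used in the lottery-ticket line of work. Everything concrete (the toy two-layer regression model, the random data, the hyperparameters, and the choice to prune biases along with weights for simplicity) is my own placeholder, not anything from [0].

```python
import copy
import torch
import torch.nn as nn

def train(model, data, targets, masks, steps=200, lr=1e-2):
    """Train while keeping pruned weights frozen at zero."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(data), targets).backward()
        opt.step()
        with torch.no_grad():  # re-apply masks so pruned weights stay exactly zero
            for p, m in zip(model.parameters(), masks):
                p.mul_(m)
    return model

def prune_by_magnitude(model, masks, frac=0.2):
    """Zero out the smallest-magnitude surviving entries of each tensor."""
    new_masks = []
    with torch.no_grad():
        for p, m in zip(model.parameters(), masks):
            alive = p[m.bool()].abs()
            k = int(frac * alive.numel())
            threshold = alive.sort().values[k] if k > 0 else 0.0
            new_masks.append((p.abs() > threshold).float() * m)
    return new_masks

torch.manual_seed(0)
data, targets = torch.randn(256, 16), torch.randn(256, 1)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
init_state = copy.deepcopy(model.state_dict())   # remember the "ticket" initialization
masks = [torch.ones_like(p) for p in model.parameters()]

for round_idx in range(5):
    train(model, data, targets, masks)
    masks = prune_by_magnitude(model, masks)
    # Rewind surviving weights to their original initialization, keep the rest at zero.
    model.load_state_dict(init_state)
    with torch.no_grad():
        for p, m in zip(model.parameters(), masks):
            p.mul_(m)
    kept = sum(m.sum().item() for m in masks) / sum(m.numel() for m in masks)
    print(f"round {round_idx}: {kept:.1%} of weights remain")
```

The "ticket" claim is that the surviving sparse network, rewound to its original initialization, can be retrained to roughly full performance; this sketch only shows the mechanics of the prune-and-rewind loop, not an evaluation of that claim.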
You literally can do a kind of model PCA: take the Hessian (the matrix of second derivatives of the loss function w/r/t the parameters, i.e. the local curvature of the loss landscape) and diagonalize it. The resulting eigenvectors and eigenvalues (the spectrum of the Hessian) tend to be power-law distributed in just about every deep NN you can think of [1].
That is, there are a few "really important" (highly curved) dimensions in parameter space (the top eigenvectors) that control the model's performance (the loss), and conversely very many "unimportant", low-curvature dimensions. There was a recent interesting paper showing that "deleting" these low-curvature dimensions appeared to correspond to removing "memorized" information in LLMs: reasoning performance was left unchanged, while the ability to answer questions that require memorized knowledge was reduced [2].
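As a concrete illustration of that "model PCA", here is a minimal sketch that forms the full Hessian of a tiny hand-rolled two-layer net and inspects its eigenvalue spectrum. The architecture, random data, and near-zero threshold are toy assumptions of mine; for anything LLM-sized you would estimate the top of the spectrum with iterative methods (e.g. Lanczos) rather than building the Hessian explicitly.

```python
import torch

torch.manual_seed(0)
x = torch.randn(128, 8)                  # toy inputs
y = torch.randn(128, 1)                  # toy targets

# Shapes of a tiny 2-layer MLP: 8 -> 16 -> 1 (weights only, no biases).
shapes = [(16, 8), (1, 16)]
sizes = [h * w for h, w in shapes]
n_params = sum(sizes)

def loss_fn(flat_params):
    """MSE loss of the MLP, written as a function of one flat parameter vector."""
    w1, w2 = flat_params.split(sizes)
    w1 = w1.view(shapes[0])
    w2 = w2.view(shapes[1])
    pred = torch.relu(x @ w1.T) @ w2.T
    return ((pred - y) ** 2).mean()

flat_params = torch.randn(n_params) * 0.1

# Full Hessian (n_params x n_params); only feasible because the model is tiny.
H = torch.autograd.functional.hessian(loss_fn, flat_params)
eigvals = torch.linalg.eigvalsh(H)       # real eigenvalues, ascending (H is symmetric)

top = eigvals.flip(0)[:5]
print("largest eigenvalues:", [f"{v:.4f}" for v in top.tolist()])
print("fraction of near-zero directions:",
      (eigvals.abs() < 1e-3 * eigvals.abs().max()).float().mean().item())
```

Even at this toy scale you should see a handful of dominant eigenvalues and a long tail of near-zero directions, which echoes the heavy-tailed picture described above.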
It appears that sometimes models undergo dramatic transitions from memorization to perfect generalization, which corresponds to the models becoming much more compressible [3].
I'm hopeful that we'll find a way to distill models down to their most useful core cognitive/reasoning capabilities, and that that core will be far simpler than the current scale of LLMs. But without all that memorized world knowledge, they might need to look stuff up like we do!
I don't disagree, but neither does the article. It's just talking about the fact that we previously considered anything that can't be easily and tersely written down as nearly or entirely intractable. But, as we have seen, the three-body problem is not really a humdinger as far as the universe goes; it's not even table stakes. We need to be able to do the same kind of energy arbitrage on n-body problems that we do on two. And now we have the beginnings of a place to toy with more complicated ideas, since these won't fit on a blackboard.
Problems with opaque stability boundaries that exhibit non-linear effects are always great. Chaos theory makes it even more fun, since your observation can change the outcome.
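To make the sensitivity point concrete, here is a minimal sketch (toy unit masses, a hand-picked planar configuration, a simple leapfrog integrator; none of it from the article) that integrates the same three-body system twice with a one-part-in-a-million perturbation and measures how far apart the trajectories end up.

```python
import numpy as np

G = 1.0
masses = np.array([1.0, 1.0, 1.0])

def accelerations(pos):
    """Pairwise Newtonian gravity for n bodies in the plane."""
    acc = np.zeros_like(pos)
    for i in range(len(masses)):
        for j in range(len(masses)):
            if i == j:
                continue
            d = pos[j] - pos[i]
            acc[i] += G * masses[j] * d / (np.linalg.norm(d) ** 3 + 1e-12)
    return acc

def integrate(pos, vel, dt=1e-3, steps=20000):
    """Leapfrog (velocity Verlet) integration; returns final positions."""
    acc = accelerations(pos)
    for _ in range(steps):
        vel = vel + 0.5 * dt * acc
        pos = pos + dt * vel
        acc = accelerations(pos)
        vel = vel + 0.5 * dt * acc
    return pos

pos0 = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 0.5]])
vel0 = np.array([[0.0, -0.4], [0.0, 0.4], [0.3, 0.0]])

# Perturb one coordinate by one part in a million.
pos1 = pos0.copy()
pos1[2, 0] += 1e-6

final_a = integrate(pos0.copy(), vel0.copy())
final_b = integrate(pos1, vel0.copy())
print("separation after integration:", np.linalg.norm(final_a - final_b))
```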
A fundamental thing this misses, I think, is that the reinforcement-learning approach of AlphaGo does generate that sensation of lack-of-narrative, everything-together-at-once alien thinking, whereas using an LLM as hypothesized would have a clear tree-like approach with an overarching thesis, so the approach would be more traditional / human-like.
I don't understand why the poster (who is the author) links us to a slop report of a test for their library. It would be much more effective to fold part of this info into the README, where we get the context of what they want to achieve (there is a very clear "Why?" section), and then link to the report from there. I have flagged it as AI slop.
I don't understand LISP or Clojure, but it seems to be some kind of library for making web services out of LISP, which has some separate components that are somehow well defined. And somehow it's all related to AI.
Again I don't know much about Clojure and I am too slow for functional programming in general.
The whole point of the framework is to see what an LLM-oriented framework would look like. My argument is that the way code is normally structured is not conducive to LLMs, because context grows in an unbounded way and they end up getting lost.
The whole point of the 'slop' report is to have the LLM try implementing the features using both the traditional approach and the framework and then reflect on how it fared with each approach.
Hey Michael, great project! If you don't mind me testing you: as a word game builder, what do you think about the latest developments in international policy?
Surprised that no comment has mentioned that there is a standard term (not a word :P) for a set of words that denotes a particular concept: nominal syntagm. Such as "boiling water", and also "that green parrot we saw yesterday over the left branch".
Also, the slider examples are abysmal. "I love you", "Go home" and "How are you" are not words by any stretch of the imagination. For someone who makes word games, I don't see a particularly deep love of words here.
Added a note: "'I love you' isn't opaque, but it's tight enough to put on a tile." The familiar end of the spectrum picks up collocations that are transparent but loaded — I'm not claiming they're words in the traditional sense, but they're useful vocabulary for word games, which is where I'm coming from.
> "'I love you' isn't opaque, but it's tight enough to put on a tile."
The problem with introducing phrases/sentences into a word game (let's take Scrabble) is that you'd spend half the night with your friends arguing over what is and is not acceptable, with the only litmus test being its... corpus frequency?
Dictionaries would be ruined if they could only use popular words. I guess you might be describing very early children's dictionaries, but I don't think that's your argument?
Funnily enough, "nominal syntagm" is, itself, not in the OED or Wiktionary. But Wiktionary has "syntagme nominal" as the French translation for "noun phrase".
You really have to love the human messiness of language!
A nominal syntagm is a somewhat overlapping concept, but deviates slightly from the direct discussion taking place. The more appropriate standard term here is: open compound word. Or, as one might say casually: word.
I'm mildly interested in keyboard ergonomics and efficiency, but every time I read threads like this I can't really see myself spending $200+ on a keyboard when there are quite functional versions at $10 (and I know that the vast majority of the world is typing on either those or a laptop keyboard). To me it reads too much like an audiophile discussion about whether the materials of the cable affect the sound, or just people "playing" (similar to cyberdecks). Not just trying to be provocative here, but those prices seem crazy, just Silicon Valley posturing. Is there a post here that is not partially signalling "I'm rich enough to splurge on an unreasonably expensive set of keys"? Or am I being too harsh? The most grounded take seems to be easterncalculus's (https://news.ycombinator.com/item?id=47083354), where he mentions that the best approach is to rest and exercise a little instead of buying LED-lit chording keyboards.
Mechanical and ergonomic keyboard prices suffer because they are niche and much lower volume. The switches are not that expensive if you use common ones, but if you want special keycaps, prepare to fork over some good money. Most of them are made in very small batches, and most of the expense is in the molds you have to make.
Your $10 keyboard is probably so mass-produced that literally millions have been manufactured, versus custom-ish keyboard designs that are made for at most a thousand buyers.
There are much cheaper Microsoft ergonomic keyboards, even if the two keyboard halves are not separate, so you cannot adjust the distance between them or the lateral tilt.
I used Microsoft ergonomic keyboards for many years, and they were still much more comfortable than a classic keyboard.
A few years ago I switched to a truly split keyboard (Kinesis Freestyle), which was an improvement over the Microsoft, but not as great an improvement as the Microsoft was over a standard keyboard.
Unfortunately, Microsoft first discontinued their cheapest ergonomic keyboards, which cost almost the same as standard keyboards. Then the remaining more expensive models were sold to Incase in 2024, so they can now be found as "Incase Designed by Microsoft" products, but at significantly higher prices than when they were made by Microsoft. Even so, they might be the cheapest ergonomic keyboards of decent quality. Microsoft still sells a "Microsoft Surface Ergonomic Keyboard".
Old stock cheaper Microsoft ergonomic keyboards may still be found at certain shops.
On eBay and the like a lot of old and very cheap Microsoft ergonomic keyboards can be found, but buying a "pre-owned" keyboard is risky, as you do not know how worn out it is. Moreover, the wireless MS keyboards used proprietary USB dongles paired with the keyboard, so if an old wireless keyboard is sold without the dongle, it cannot be used unless it also has a wired connection.
For someone who types all day, there is a great difference in comfort and fatigue between a classic keyboard and a good ergonomic keyboard. Young people typically do not care much about the quality of their keyboards, pointing devices and monitors, but after decades of using computers every day many of them regret that negligence, since avoiding it could have spared them unpleasant health problems.
It's very much a hobby, like audiophiles, but also like tricking out your car. It can be expensive. It's more than the average person needs. It's fun picking out parts and being part of a community.
There's also a perspective bias: most mechanical keyboard content you see is not made by people who found a keyboard that made their wrists stop hurting and then went on with their lives. The people enthusiastic enough to make content are also the people with a bunch of tricked-out keyboards.
I actually think it was quite prescient and still raises important topics to consider: irrespective of whether the weights are uploaded from an actual human, if you dig just a little under the surface details, you still get a story about the ethical concerns of a purely digital sentience. Not that modern LLMs have that, but what if future architectures enable them to develop an emergent sense of self? It's a fascinating text.