Are you experiencing the HN hug of death, or is your website down? Either way, I'm super interested in what you're building and I'd love to use it, but I cannot reach your link at the moment. Good luck with the endeavor, and thank you for building this.
If you guys like visual explanations that are a bit more intuitive -- I actually put together guides on both data structures and algorithms not too long ago, which you can find here:
It's extremely hard to understand how the visual relates to the description, given that it looks to be 1-indexed but the description only makes sense for 0-indexing (the 4th element stores the sum of the first 4 elements) -- not to mention that none of the binary indexes seem to be storing the correct sums (or they're storing sums of a tree that isn't being presented).
The visual is directly from here (page 97) -- I recommend giving it a read if you want to grasp the full intuition behind how a Fenwick tree works: https://cses.fi/book/book.pdf
I should have included the full set of visuals from the book, and I can see why people would get confused by that one image on its own -- I'll make a note to include the full image when I have more time. Thank you for the feedback.
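In the meantime, here's a minimal sketch of the 1-indexed convention the book uses (my own illustration, not code from the book): tree index i covers the last (i & -i) elements ending at i, so index 4 really does hold the sum of the first 4 elements.

    # Minimal 1-indexed Fenwick tree: tree[i] holds the sum of the last
    # (i & -i) original elements ending at position i.
    class Fenwick:
        def __init__(self, n):
            self.n = n
            self.tree = [0] * (n + 1)  # index 0 is unused

        def update(self, i, delta):
            # add delta to element i (1-indexed)
            while i <= self.n:
                self.tree[i] += delta
                i += i & -i

        def prefix_sum(self, i):
            # sum of elements 1..i
            s = 0
            while i > 0:
                s += self.tree[i]
                i -= i & -i
            return s

    fw = Fenwick(8)
    for pos, val in enumerate([1, 3, 4, 8, 6, 1, 4, 2], start=1):
        fw.update(pos, val)
    print(fw.prefix_sum(4))  # 1 + 3 + 4 + 8 = 16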
I agree with the principles discussed here, but I'm not in agreement with the rules-based approach. Great-looking design isn't about UI 'rules' -- it's about making the users of your application love what you're presenting, and about letting them easily use the features you provide without having to think. You can find out more about the principles I follow here: https://photonlines.substack.com/p/an-intuitive-guide-to-int...
I've read this before and it was very helpful. It's probably where most of my understanding comes from.
If I'm interpreting it correctly, it sort of validates my intuition that attention heads are "multi-threaded Markov chain models". In other words, if autocomplete just looks at level 1, a transformer looks at level 1 for every word in the input, plus many layers deeper for every word (or token) in the input, while bringing a huge pre-training dataset to bear.
If that's more or less correct, something that surprises me is how attention is often treated as some kind of "breakthrough" -- it seems obvious to me that improving a Markov chain recommendation would involve going deeper and dimensionalizing the context in a deeper way. The technique appears the same; just the amount of analysis is greater. I'm not sure what I'm missing here. Perhaps adding those extra layers was a hard problem that we hadn't figured out how to do efficiently yet (?)
So I posted this conversation between Ilya Sutskever (one of the creators of ChatGPT) and Lex Fridman within that blog post and I'll provide it again below because I think it does a good job of summarizing what exactly 'makes transformers work':
Ilya Sutskever: Yeah, so the thing is the transformer is a combination of multiple ideas simultaneously of which attention is one.
Lex Fridman: Do you think attention is the key?
Ilya Sutskever: No, it's a key, but it's not the key. The transformer is successful because it is the simultaneous combination of multiple ideas. And if you were to remove either idea, it would be much less successful. So the transformer uses a lot of attention, but attention existed for a few years. So that can't be the main innovation. The transformer is designed in such a way that it runs really fast on the GPU. And that makes a huge amount of difference. This is one thing. The second thing is that transformer is not recurrent. And that is really important too, because it is more shallow and therefore much easier to optimize. So in other words, it uses attention, it is a really great fit to the GPU and it is not recurrent, so therefore less deep and easier to optimize. And the combination of those factors make it successful.
I'm not sure if the above answers your question, but I tend to think of transformers more as 'associative' networks (similar to humans) -- they miss many of the components which actually make humans human (like imitation learning and consciousness (we still don't know what consciousness actually is)), but for the most part, the general architecture and the way they 'learn' I believe mimics a process similar to how regular humans learn: neurons that fire together, wire together (i.e. associative learning). This is what a huge large-language model is to me: a giant auto-associative network that can comprehend and organize information.
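To make the 'associative' framing a bit more concrete, here's a rough numpy sketch of what a single attention head computes -- purely illustrative (no masking, no multi-head machinery, not tied to any particular model): each token's output is a content-weighted mixture of every token's representation, which is why it behaves like a soft, associative lookup rather than a fixed Markov table, and why it can be computed for all positions at once on a GPU.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention_head(X, Wq, Wk, Wv):
        # X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projections.
        # Every position attends to every other position in one matrix product,
        # which is what makes this parallel (GPU-friendly) rather than recurrent.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) affinities
        weights = softmax(scores, axis=-1)        # soft, content-based lookup
        return weights @ V                        # each row: weighted mix of all values

    # toy example: 5 tokens, 16-dim embeddings, 8-dim head
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 16))
    Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
    print(attention_head(X, Wq, Wk, Wv).shape)  # (5, 8)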
I'm about to say goodbye to Windows forever -- and it has nothing to do with their security vulnerabilities. It has to do with 1) the incredibly shitty software they produce (Windows 11) and 2) their lack of ethics in regards to pretty much everything. I'm about to go all in on Linux - I've been using both OSes for years now but Linux to me today is the superior choice and I see 0 reason why anyone would want to touch Windows (and I hope to god many developers follow me) -- Windows is an absolute atrocity.
I'm typing this on a computer that came with Windows 11. I booted it a few times, and reserved a terabyte of my (8TB) SSD for it, but I never use it at all. I've got some Windows VMs, from XP through 10 that I rarely use, but I use them when needed for things that require those platforms.
I got an email from Intuit two weeks ago informing me that they weren't going to support Windows 10 anymore, so I guess I'll be deleting my Windows 10 VM soon. But I have no intention of ever creating or running a Windows 11 VM, or of installing anything important on the soon-to-be-deleted Windows 11 partition I have, so I guess I'll quit using TurboTax after 40 years of using it.
I looked into the possibility of running TurboTax with Wine (or CrossOver, for which I have an eternal license), but apparently it doesn't run at all there.
I use SimpleTax (I'm based in Canada), which is 100% web-based, so unfortunately I don't have a suggestion for you -- but I'm really surprised that Intuit doesn't provide a purely web-based interface for their software...
As an autodidact who never learned this stuff at school/uni, his lectures are what made linear algebra really click for me. I can only recommend them to anyone who wants to get a visual intuition on the fundamentals of LA.
What also helped me as a visual learner was to program/setup tiny experiments in Processing[1] and GeoGebra Classic[2].
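If it helps anyone, a tiny experiment along those lines can also be done in plain Python/matplotlib (sketched here instead of Processing or GeoGebra, purely as an illustration): apply a 2x2 matrix to a grid of points and look at where they land.

    import numpy as np
    import matplotlib.pyplot as plt

    # Tiny experiment: watch what a 2x2 matrix does to a grid of points.
    A = np.array([[2.0, 1.0],
                  [0.0, 1.0]])      # a shear plus a stretch

    xs, ys = np.meshgrid(np.linspace(-1, 1, 11), np.linspace(-1, 1, 11))
    points = np.vstack([xs.ravel(), ys.ravel()])   # shape (2, N)
    transformed = A @ points

    plt.scatter(points[0], points[1], s=10, label="original")
    plt.scatter(transformed[0], transformed[1], s=10, label="A @ points")
    plt.axis("equal")
    plt.legend()
    plt.show()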
Nitpick: No one is a visual learner, or more correctly everyone is. Multimodal is the way, so good teachers will express the same concepts in several ‘modes’ and that helps develop the intuition.
Each one has its own strengths, and I use each one for different tasks:
- DeepSeek: excellent at coming up with solutions and churning out prototypes / working solutions with Reasoning mode turned on.
- Claude Code: I use this with Cursor to quickly come up with overviews / READMEs for repos / new code I'm browsing, and for making quick changes to the codebase (I only use it for simple tasks and don't usually trust it for implementing more advanced features).
- QWEN Coder: similar to DeepSeek but much better at working with visual / image data sets.
- ChatGPT: I usually use it for simple answers / finding bugs in code / resolving issues.
- Google Gemini: it's catching up to other models when it comes to coding and more advanced tasks, but it still produces code that is a bit too verbose for my taste. Still, solid progress since the initial release, and it will most likely catch up to other models on most coding tasks soon.
Sorry -- I keep seeing this being used, but I'm not entirely sure how it differs from most human thinking. Most human 'reasoning' is probabilistic as well, and we rely on 'associative' networks to ingest information. LLMs use association in a similar manner -- and not only that, but they are capable of figuring out patterns based on examples (just like humans are) -- read this paper for context: https://arxiv.org/pdf/2005.14165. In other words, they are capable of grokking patterns from simple data (just like humans are). I've given various LLMs my requirements and they have produced working solutions for me by simply 1) including all of the requirements in my prompt and 2) asking them to think through and 'reason' through their suggestions, and the results have always been superior to what most humans have produced. The 'LLMs are probabilistic predictors' comments keep appearing on threads, though, and I'm not quite sure I understand them -- yes, LLMs don't have 'human context', i.e. the data needed to understand human beings, since they have not directly been fed human experiences, but for the most part LLMs are not the simple 'statistical predictors' everyone brands them to be. You can see a thorough write-up I did of what GPT is / was here if you're interested: https://photonlines.substack.com/p/intuitive-and-visual-guid...
I'm not sure I would say human reasoning is 'probabilistic', unless you are taking a very far step back and saying that, based on how a person has lived, they have ingrained biases (weights) that dictate how they reason. I don't know if LLMs have a built-in scepticism like humans do, which plays a significant role in reasoning.
Regardless of whether you believe LLMs are probabilistic or not, I think what we are both saying is that context is king, and what the LLM says is dictated by the context (either introduced through training or introduced by the user).
'I don't know if LLMs have a built in scepticism like humans do' -- humans don't have an 'in-built skepticism' either -- we learn it through experience and through being taught how to 'reason' in school (and it takes a very long time to do this). You believe that this is ingrained, but you may have forgotten having to slog through most of how the world works, and being tested on it, when you went to school and when your parents taught you these things. On the context component: yes, context is vitally important (just as it is with humans) -- you can't produce a great solution unless you understand the 'why' behind it and how the current solution works, so I 100% agree with that.
For me, the way humans finish each other's sentences and often think of quotes from the same movies at the same time in conversation (when there is no clear reason for that quote to be a part of the conversation), indicates that there is a probabilistic element to human thinking.
Is it entirely probabilistic? I don't think so. But, it does seem that a chunk of our speech generation and processing is similar to LLMs. (e.g. given the words I've heard so far, my brain is guessing words x y z should come next.)
I feel like the conscious, executive mind humans have exercises some active control over our underlying probabilistic element. And LLMs lack the conscious executive.
I.e. they have our probabilistic capabilities, without the additional governing layer that humans have.
I think the better way to look at it is that probabilistic models seem to be an accurate model for human thought. We don't really know how humans think, but we know that they probably aren't violating information theoretic principles, and we observe similar phenomena when we compare humans with LLMs.
Yep, just like looking at a bird's feather through a microscope explains the principles of flight…
Complexity theory doesn't have a mathematics (yet), but that doesn't mean we can't see that it exists. Studying the brain at the lowest levels hasn't led to any major insights into how cognition functions.
I personally believe that quantum effects play a role and we’ll learn more once we understand the brain at that level, but I recognize that is an intuition and may well be wrong.
You seem possibly more knowledgeable than me on the matter.
My impression is that LLMs predict the next token based on the prior context. They do that by having learned a probability distribution from tokens -> next-token.
Then, as I understand it, the models are never reasoning about the problem itself, but always about what the next token should be given the context.
The chain of thought just rewards them so that they aren't predicting the tokens of the final answer directly, but instead predicting the tokens of the reasoning that leads to the solution.
Since human language in the dataset contains text that describes many concepts and offers many solutions to problems, it turns out that predicting the text that describes the solution to a problem often ends up being the correct solution to the problem. That this was true was kind of a lucky accident, and it is where all the "intelligence" comes from.
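Here's the kind of loop I have in mind (a toy sketch: the "model" is just a hand-written lookup table, purely illustrative -- a real LLM learns the context -> next-token distribution with a neural network, but the sampling loop around it is the same, and chain of thought only changes which continuations get rewarded):

    import random

    # Toy next-token predictor: a lookup table from recent context to a
    # distribution over the next token.
    toy_model = {
        ("the", "capital", "of", "france", "is"): {"paris": 0.92, "lyon": 0.05, "the": 0.03},
        ("france", "is", "paris"): {".": 0.9, ",": 0.1},
    }

    def next_token(context):
        # fall back to end-of-text if the context is unknown to the toy table
        dist = toy_model.get(tuple(context[-5:])) or toy_model.get(tuple(context[-3:]))
        if dist is None:
            return "<eot>"
        tokens, probs = zip(*dist.items())
        return random.choices(tokens, weights=probs)[0]

    tokens = ["the", "capital", "of", "france", "is"]
    while tokens[-1] != "<eot>" and len(tokens) < 10:
        tokens.append(next_token(tokens))
    print(" ".join(tokens))  # most likely: the capital of france is paris . <eot>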
So -- in the pre-training step you are right: they are simple 'statistical' predictors, but there are more steps involved in their training which turn them from simple predictors into models that can capture patterns and reason. I tried to come up with an intuitive overview of how they do this in the write-up, and I'm not sure I can give you a simple explanation here, but I would recommend you play around with DeepSeek and other more advanced 'reasoning' or 'chain-of-thought' models and ask them to perform tasks for you: they are not simply statistically combining information together. Many times they are able to reason through and come up with extremely advanced working solutions. To me this indicates that they are not 'accidentally' stumbling upon solutions based on statistics -- they actually are able to 'understand' what you are asking them to do and to produce valid results.
If you observe the failure modes of current models, you see that they fail in ways that align with probabilistic token prediction.
I don't mean that the textual prediction is simple, it's very advanced and it learns all kinds of relationships, patterns and so on.
But it doesn't have a real model of, or a thinking process relating to, the actual problem. It thinks about what text could describe a solution that is linguistically and semantically probable.
Since human language embeds so much logic and so many ground truths, that's good enough to result in a textual description that approximates or nails the solution to the actual underlying problem.
And this is why we see them being able to solve quite advanced problems.
I admit that people are wondering now: what's different about human thinking? Maybe we do the same -- you invent a probable-sounding answer and then check if it was correct, rinse and repeat until you find one that works.
But this in itself is a big conjecture. We don't really know how human thinking works. We've found a method that works well for computers and now we wonder if maybe we're just the same but scaled even higher or with slight modifications.
I've heard from ML experts, though, that they don't think so. Most seem to believe a different architecture will be needed: world models, ensembles of various specialized models with different architectures working together, etc. That LLMs are fundamentally kind of limited by their nature as next-token predictors.
I think the intuitive leap (or at least, what I believe) is that meaning is encoded in the media. A given context and input encodes a particular meaning that the model is able to map to an output, and because the output is also in the same medium (tokens, text), it also has meaning. Even reasoning can fit in with this, because the model generates additional meaningful context that allows it to better map to an output.
How you find the function that does the mapping probably doesn't matter. We use probability theory and information theory, because they're the best tools for the job, but there's nothing to say you couldn't handcraft it from scratch if you were some transcendent creature.
The text of human natural language that it is trained on encodes the solutions to many problems as well as a lot of ground truths.
The way I think of it is this: first you have a random text generator. This generative "model" can, in theory, find the solution to any problem that text can describe.
If you had a way to check whether it had found the correct solution, you could run it and eventually it would generate the text that describes a working solution.
Obviously inefficient and not practical.
What if you made it skip generating all text that isn't valid, sensical English?
Well, now it would find the correct solution in far fewer iterations, but it's still too slow.
What if it generated only text that made sense to follow the context of the question?
Now you might start to see it 100-shot, 10-shot, maybe even 1-shot some problems.
What if you tuned that to the max? Well, you get our current crop of LLMs.
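In code, the caricature of that progression looks something like this (is_correct and the two generators are made-up stand-ins, obviously -- the point is only that the checker stays the same while the generator gets more constrained):

    import itertools
    import random
    import string

    def is_correct(answer):
        # stand-in verifier for the problem "what is 12 * 12?"
        return answer.strip() == "144"

    def random_text(length=3):
        # stage 1: generate arbitrary characters
        return "".join(random.choice(string.printable) for _ in range(length))

    def constrained_guess():
        # later stages: only generate plausible continuations of the question
        return random.choice(["142", "144", "146", "124"])

    def solve(generator, max_tries=100_000):
        for i in itertools.count(1):
            if i > max_tries:
                return None, i
            candidate = generator()
            if is_correct(candidate):
                return candidate, i

    print(solve(random_text))         # usually fails or takes a huge number of tries
    print(solve(constrained_guess))   # typically succeeds within a handful of tries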
What else can you do to make it better?
Tune the dataset: remove text that describes wrong answers to the prior context so it learns not to generate those; add more quality answers to prior context; add more problems/solutions; etc.
Instead of generating the answer to a mathematical equation the above way, generate the Python code to run to get the answer.
Instead of generating the answer to questions about current real-world events/facts (like the weather), have it generate the web search query to find it.
If you're asking a more complex question, instead of generating the answer directly, have it generate smaller logical steps towards the answer.
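In code, the "generate the program, not the answer" pattern looks roughly like this (fake_llm is a stand-in for a real model call, and in practice you'd sandbox the exec; the point is that running generated code beats guessing the number token by token):

    # fake_llm pretends the model responded with code rather than a direct answer.
    def fake_llm(prompt):
        return "result = sum(n * n for n in range(1, 101))"

    def answer_via_code(question):
        code = fake_llm(f"Write Python that computes: {question}. Store it in `result`.")
        scope = {}
        exec(code, scope)   # run the generated program instead of trusting a guessed number
        return scope["result"]

    print(answer_via_code("the sum of the squares of 1..100"))  # 338350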