wppick's comments

> It has come as a shock to some AI researchers that a large neural net that predicts next words seems to produce a system with general intelligence

When I write prompts, I've stopped thinking of LLMs as just predicting the next word, and instead think of them as a logical model built up by combining the logic of all the text they've seen. I think of the LLM as knowing that cats don't lay eggs, so when I ask it to finish the sentence "cats lay ...", it won't generate the word "eggs" even though "eggs" probably comes after "lay" frequently.


> It won't generate the word eggs even though eggs probably comes after lay frequently

Even a simple N-gram model won't predict "eggs". You're misunderstanding it by oversimplifying.

Next-token prediction is still context based. It doesn't depend on only the previous token, but on the previous (N-1) tokens. With "cats" still in the context, you should get words like "down" instead of "eggs" even from a 3-gram (trigram) model.
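
A toy sketch of the idea (made-up corpus and counts, just to show how conditioning on the previous two tokens rules out "eggs"):

  from collections import defaultdict
  
  # Toy trigram model: count continuations of each (w1, w2) context,
  # then predict the most frequent next word for a given context.
  corpus = (
      "cats lay down on the couch . "
      "cats lay around all day . "
      "hens lay eggs in the barn . "
      "hens lay eggs every morning ."
  ).split()
  
  counts = defaultdict(lambda: defaultdict(int))
  for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
      counts[(w1, w2)][w3] += 1
  
  def predict(w1, w2):
      # Most frequent continuation of the two-word context.
      continuations = counts[(w1, w2)]
      return max(continuations, key=continuations.get) if continuations else None
  
  print(predict("cats", "lay"))  # "down" or "around", never "eggs"
  print(predict("hens", "lay"))  # "eggs"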


No, your original understanding was the more correct one. There is absolutely zero logic to be found inside an LLM, other than coincidentally.

What you are seeing is a semi-randomized prediction engine. It does not "know" things, it only shows you an approximation of what a completion of its system prompt and your prompt combined would look like, when extrapolated from its training corpus.

What you've mistaken for a "logical model" is simply a large amount of repeated information. To show the difference between this and logic, you need only look at something like the "seahorse emoji" case.


No, their revised understanding is more accurate. The model has internal representations of concepts; the seahorse emoji fails because it uses those representations and stumbles: https://vgel.me/posts/seahorse/


Word2vec can/could also do the seahorse thing. It at least seems like there's more to what humans consider a concept than a direction in a vector space model (but maybe not).

https://www.analyticsvidhya.com/blog/2021/07/word2vec-for-wo...
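
For what it's worth, the "direction in a vector space" idea is easy to poke at with a handful of made-up vectors (the numbers below are invented for illustration, not real word2vec output):

  import numpy as np
  
  # Hypothetical 3-d "word vectors" -- real word2vec embeddings are
  # hundreds of dimensions and learned from co-occurrence statistics.
  vectors = {
      "seahorse": np.array([0.9, 0.1, 0.3]),
      "horse":    np.array([0.8, 0.2, 0.1]),
      "fish":     np.array([0.7, 0.0, 0.6]),
      "emoji":    np.array([0.1, 0.9, 0.2]),
  }
  
  def cosine(a, b):
      return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
  
  # Nearest neighbours of "seahorse" among the other toy words.
  for word, vec in vectors.items():
      if word != "seahorse":
          print(word, round(cosine(vectors["seahorse"], vec), 3))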


No, this is not a demonstration of logic or knowledge. It is a demonstration of a relational database.

A Markov chain presents the same representation in a lower-dimensional vector space.


If anything, the seahorse emoji case is exactly the type of thing you wouldn't expect to happen if LLMs just repeated information from their training corpus. It starts producing a weird dialogue that's completely unlike its training corpus, while trying to produce an emoji it's never seen during training. Why would it try to write an emoji that's not in its training data? This is totally different than its normal response when asked to produce a non-existent emoji. Normally, it just tells you the emoji doesn't exist.

So what is it repeating?

It's not enough to just point to an instance of LLMs producing weird or dumb output. You need to show how it fits with your theory that they're "just repeating information". This is like pointing out one of the millions of times a person has said something weird, dumb, or nonsensical and claiming it proves humans can't think and can only repeat information.


> It starts producing a weird dialogue that's completely unlike its training corpus

But it's not doing that. It's just replacing a relation in vector space with one that we would think is distant.

Of course you would view an LLM's behavior as mystifying and indicative of something deeper when you do not know what it is doing. You should seek to understand something before assigning mysterious capabilities to it.


You're not addressing the objection. What is it about your model of how you think LLMs work (that it's just "repeated information") that predicts they'd go haywire when asked about a seahorse emoji (and only the seahorse emoji)? Why does your model explain this better than the standard academic view of deep neural nets?

You just pointed out an example of LLMs screwing up and then skipped right to "therefore they're just repeating information" without showing this is what your explanation predicts.


If you want to have a conversation with me, please stop creating fake quotes and assigning them to me, and please stop lying.


"repeated information" was copied verbatim from your comment. Your full sentence was:

> What you've mistaken for a "logical model" is simply a large amount of repeated information.


If you copy two words from me and put them in a different sentence that means something else, that's a lie. If you want to argue with a strawman, that's something you can go rely on an LLM for instead of me.


I haven't lied. You're making accusations in bad faith. This was a faithful representation of your position as best as I can tell from your comment.

If you'd like to explain why "What you've mistaken for a 'logical model' is simply a large amount of repeated information." actually means something else, or why you think I've misinterpreted it, be my guest.


> There is absolutely zero logic to be found inside an LLM

Surely trained neural networks could never develop circuits that implement actual logic via computational graphs...

https://transformer-circuits.pub/2025/attribution-graphs/met...
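
Whether training actually produces such circuits is what the linked work investigates; as a minimal illustration that a network's computational graph can implement boolean logic at all, here's a tiny hand-weighted sketch (not a trained or extracted circuit) computing XOR:

  import numpy as np
  
  def step(x):
      # Hard threshold activation.
      return (x > 0).astype(float)
  
  # Hand-set weights: hidden unit 0 fires for OR(a, b),
  # hidden unit 1 fires for AND(a, b); output is OR AND NOT AND = XOR.
  W1 = np.array([[1.0, 1.0],   # OR detector
                 [1.0, 1.0]])  # AND detector
  b1 = np.array([-0.5, -1.5])
  W2 = np.array([1.0, -2.0])   # OR minus a strong AND penalty
  b2 = -0.5
  
  def xor_net(a, b):
      hidden = step(W1 @ np.array([a, b]) + b1)
      return int(step(W2 @ hidden + b2))
  
  for a in (0, 1):
      for b in (0, 1):
          print(a, b, xor_net(a, b))  # prints the XOR truth table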


You're both using two different definitions of the word "logic". Both are correct usages, but have different contexts.


Brute-force engineering solutions to make it appear like the computer is thinking, when we have no idea how we think ourselves. This will never generate true intelligence. It executes code, then it stops; it is a tool, nothing more.


I often wonder which is harder: doing neuroscience on LLMs or on humans.


It's called "prompt engineering", and there are lots of resources on the web about it if you're looking to go deep on it.


Also, upward accumulation of wealth can sometimes mean less tax revenue. Middle-class salary workers pay a lot of tax, so with more upward accumulation of wealth (maybe accelerating due to AI), what will happen to tax revenue? People getting laid off don't pay tax, and shifting that money toward corporate, tax-haven, and capital-gains types of taxes will probably end up yielding less overall.


> Middle class salary workers pay a lot of tax

No they don't. If we're talking about federal income tax, the vast majority of it is paid by the wealthy.


You are correct. But you are commenting in a place where mid-six-figure engineers consider themselves middle class and not wealthy.

Yes, the top 20% pay the vast majority of taxes and are taxed at the highest rate, until you get into the ownership classes, where income goes down and capital gains go up. Plus, that's when all the tax-deferral strategies come into play.

And yes, by all reasonable definitions, if you are in the top 20% of either income or wealth, you are categorically wealthy.


All tax. Income tax, payroll tax, consumption tax


Corp income tax (paid by shareholders from the rich to teacher pensions to anyone with a retirement account...), estate tax,...

When all is accounted for, the rich still pay a far larger share of taxes than the share of income they earn. It's why the OECD rates the US as having the most progressive tax system among member nations.


Whether or not the wealthy pay a greater share of their money in taxes, it doesn't matter. The money has to come from somewhere, and the middle and lower classes can't afford it. Also, the middle class can't pay more and continue buying the super wealthy's goods. We need to spend less and tax more. $1 trillion in interest per year is insane.


... which implies these taxes actually get paid, at least in the same proportions as lower-income taxes get paid.

There is no effective taxation when avoidance is easy and risk-free.


They are talking about the final numbers at the end of the day. Effective tax rates.

It’s the same whether you use effective or nominal rates. The numbers change, but relatively speaking they are roughly the same.


Property tax, sales tax, tariffs, ...


Funny how you all rebelled in the good old days over a tax of 3 pence per pound of tea.


IIRC it was mainly about the government giving the East India Company a monopoly on all tea imports, sidelining local merchants in Boston.

The tax was used by merchants/smugglers to rile up the mobs, because the monopoly tea was actually cheaper than what they could ship from Holland.


Debt is just a promise of future currency. One person's debt is another person's asset. But when you "over-promise" by accumulating too much debt, all the people holding that debt as an asset think they have X dollars; if there's more debt than the underlying resources can cover, they might all only end up getting something like 0.65X, or possibly even less. Kind of like a bank run.
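
A back-of-the-envelope version of that haircut, with made-up numbers:

  # Toy illustration: claims (debt held as assets) vs. what the
  # underlying resources can actually cover.
  total_claims = 1_000_000         # what creditors believe they hold
  recoverable_resources = 650_000  # what can actually be paid out
  
  recovery_rate = recoverable_resources / total_claims
  print(f"Each $1.00 of 'assets' is really worth ${recovery_rate:.2f}")  # $0.65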


> Some people are gifted in memorization of these things

Those are usually people who aren't changing languages or frameworks. Memory is mostly recency and repetition, so if you want better memory, narrowing scope is a good strategy. I'd rather go broad so that I can better make connections between things, but then I always have to look up the specifics, especially now with LLMs right there.


One of the reasons I hate interviewing for software jobs is that the logic seems to be the opposite of this: you're expected to have every possible esoteric concept or problem ready at the top of your head instantly. And the same idea applies now to not allowing LLMs in technical interviews.


I agree; I don't think interviews match development reality very well.


> An ounce of prevention is worth a pound of cure, after all.

  Don’t do what? Consider the primary cause of conflicts: simultaneous operations occurring on the same data on different nodes. That happens because data may not have distinct structural or regional boundaries, or because a single application instance is interacting with multiple nodes simultaneously without regard for transmission latency.

  Thus the simplest way to avoid conflicts is to control write targets.

  Use “sticky” sessions. Applications should only interact with a single write target at a time, and never “roam” within the cluster.

  Assign app servers to specific (regional) nodes. Nodes in Mumbai shouldn’t write to databases in Chicago, and vice versa. It’s faster to write locally anyway.

  Interact with specific (regional) data. Again, an account in Mumbai may physically exist in a globally distributed database, but multiple accessors increase the potential for conflicts.

  Avoid unnecessary cross-node activity. Regional interaction also applies on a more local scale. If applications can silo or prefer certain data segments on specific database nodes, they should.

  To solve the issue of updates on different database nodes modifying the same rows, there’s a solution for that too: use a ledger instead.

The best points are this summary near the end. IMO it's better to allow for slower writes and do something simpler than to attempt complex distributed stuff just so writes are quick. Users seem to have a pretty long tolerance for something they understand as a write, even if it takes many seconds.
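
A rough sketch of the "use a ledger instead" idea (table and column names made up): concurrent writers never update the same balance row, they only append immutable entries, and the balance is derived by summing them.

  import sqlite3
  
  # Hypothetical append-only ledger: writers on different nodes only
  # INSERT, never UPDATE the same row, so there is nothing to conflict over.
  conn = sqlite3.connect(":memory:")
  conn.execute("""
      CREATE TABLE ledger (
          account TEXT NOT NULL,
          amount  REAL NOT NULL,      -- positive = credit, negative = debit
          node    TEXT NOT NULL,      -- which regional node wrote the entry
          ts      TEXT DEFAULT CURRENT_TIMESTAMP
      )
  """)
  
  def record(account, amount, node):
      conn.execute("INSERT INTO ledger (account, amount, node) VALUES (?, ?, ?)",
                   (account, amount, node))
  
  def balance(account):
      # Balance is derived state: the sum of all entries for the account.
      (total,) = conn.execute(
          "SELECT COALESCE(SUM(amount), 0) FROM ledger WHERE account = ?",
          (account,)).fetchone()
      return total
  
  record("acct-42", 100.0, "mumbai")
  record("acct-42", -30.0, "chicago")  # would have conflicted as an UPDATE
  print(balance("acct-42"))            # 70.0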


The formula is usually more money and the ability to work on a special team isolated from the usual toxic orgs. I think A9 was probably somewhat like that, and AWS probably used to be at some point long ago.


One of the most interesting things to me when reading this was that it was treated as a bug even though it was that hard to reproduce. Most dev shops would not have the bandwidth and management buy-in to spend time digging into something like that unless it was high severity, and it also sounds like it was being caused by a modded version of the software.


Try experimenting with diet, like cutting out or cutting down on sugar or salt, and see if it makes any difference. There's no strong evidence that EMF can cause tinnitus, but it would be interesting to test that out somehow too (camping/cabin trip in a radio-free zone?).

