Have you ever seen what these arbitrary length whole numbers look like once they are tokenized? They don't break down to one-digit-per-token, and the same long number has no guarantee of breaking down into tokens the same way every time it is encountered.
But the algorithms they teach humans in school to do long-hand arithmetic (which are liable to be the only algorithms demonstrated in the training data) require a single unique numeral for every digit.
This is the same source as the problem of counting "R"'s in "Strawberry".
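For anyone curious, it's easy to see this directly with a tokenizer library. A minimal sketch in Python, assuming the tiktoken package and its cl100k_base vocabulary (other BPE vocabularies behave similarly but split in different places):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for text in ["1234567890123", " 1234567890123", "12,345,678"]:
        ids = enc.encode(text)
        pieces = [enc.decode([i]) for i in ids]  # show each token as text
        print(repr(text), "->", pieces)

Typically the number comes back as multi-digit chunks (e.g. '123', '456', ...), and a leading space or a comma shifts where the splits fall, so the "same" number can tokenize differently depending on context.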
That was the initial thinking of everyone I explained this to, and it was my speculation as well, but when you look at its reasoning where it makes the mistake, it correctly extracts the digits out of the input tokens.
As I say in other comments, most of the mistakes here happen when it recopies the answer it calculated from the summation table.
You can avoid the tokenization issue when it extracts the answer by making it output an array of the answer's digits; it will still fail at simply recopying the correct digit.
I recently saw someone post a leaked system prompt for GPT5 (and regardless of the truth of the matter, since I can't confirm the authenticity of the claim, the point I'm making stands on its own to some degree).
A portion of the system prompt was specifically instructing the LLM that math problems are, essentially, "special", and that there is zero tolerance for approximation or imprecision with these queries.
To some degree I get the issue here. Most queries are full of imprecision and generalization, and the same type of question may even get a different output if asked in a different context, but when it comes to math problems we have absolutely zero tolerance for that. To us this is obvious, but looked at from the outside it is a bit odd that we are so loose and sloppy with basically everything we do, and then we put certain characters in a math format and become hyper-obsessed with ultra precision.
The actual system prompt section for this was funny though. It essentially said "you suck at math, you have a long history of sucking at math in all contexts, never attempt to do it yourself, always use the calculation tools you are provided."
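For what it's worth, the host side of such a "calculation tool" is easy to sketch: the model emits an arithmetic expression as plain text and the host evaluates it exactly instead of trusting the model's own arithmetic. The code below is my own hypothetical illustration of that pattern, not anything taken from the leaked prompt:

    import ast
    import operator

    # Whitelist of arithmetic operators the tool is willing to evaluate.
    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow, ast.USub: operator.neg}

    def calculator(expression: str):
        """Exactly evaluate a plain arithmetic expression like '123456 * 789'."""
        def ev(node):
            if isinstance(node, ast.Expression):
                return ev(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](ev(node.left), ev(node.right))
            if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
                return OPS[type(node.op)](ev(node.operand))
            raise ValueError("unsupported expression")
        return ev(ast.parse(expression, mode="eval"))

    print(calculator("123456789 * 987654321"))  # exact integer result, no rounding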
> But the algorithms they teach humans in school to do long-hand arithmetic (which are liable to be the only algorithms demonstrated in the training data) require a single unique numeral for every digit.
But humans don't see single digits, we learn to parse noisy visual data into single digits and then use those single digits to do the math.
It is much easier for these models to understand what the number is based on the tokens than it is for a visual model to do it based on an image, so getting those tokens streamed straight into its system makes the problem it has to solve much, much simpler than what humans face. We weren't born able to read numbers; we learn that.
.. and you can "program" a neural network — so simple it can be implemented by boxes full of marbles and simple rules about how to interact with the boxes — to learn by playing tictactoe until it always plays perfect games. This is frequently chosen as a lesson in how neural network training even works.
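If anyone wants to see how little machinery that takes, here is a rough Python sketch of the boxes-of-marbles learner (my own simplified version: the learner plays X against a random opponent and skips the symmetry tricks the original matchbox machine used):

    import random

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

    def winner(board):
        for a, b, c in LINES:
            if board[a] != "." and board[a] == board[b] == board[c]:
                return board[a]
        return "draw" if "." not in board else None

    boxes = {}  # one box of marbles per board position the learner has seen

    def pick_move(board):
        key = "".join(board)
        if key not in boxes:
            # start with three marbles for every legal move
            boxes[key] = [i for i in range(9) if board[i] == "."] * 3
        return random.choice(boxes[key]), key

    def play_one_game():
        board, history, player = ["."] * 9, [], "X"
        while winner(board) is None:
            if player == "X":
                move, key = pick_move(board)
                history.append((key, move))
            else:
                move = random.choice([i for i in range(9) if board[i] == "."])
            board[move] = player
            player = "O" if player == "X" else "X"
        result = winner(board)
        for key, move in history:
            if result == "X":
                boxes[key].extend([move] * 3)   # win: add marbles for these moves
            elif result == "draw":
                boxes[key].append(move)         # draw: small reward
            elif boxes[key].count(move) > 1:
                boxes[key].remove(move)         # loss: take a marble out

    for _ in range(20000):
        play_one_game()

After enough games the marble counts lean heavily toward the good moves, which is the whole trick.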
But I have a different challenge for you: train a human to play tictactoe, but never allow them to see the game visually, even in examples. You have to train them to play only by spoken words.
Point being that tictactoe is a visual game and when you're only teaching a model to learn from the vast sea of stream-of-tokens (similar to stream-of-phonemes) language, visual games like this aren't going to be well covered in the training set, nor is it going to be easy to generalize to playing them.
Well whatever your story is, I know with near certainty that no amount of scaffolding is going to get you from an LLM that can't figure out tic-tac-toe (but will confidently make bad moves) to something that can replace a human in an economically important job.
- but tokens are not letters
- but humans fail too
- just wait, we are on an S curve to AGI
- but your prompt was incorrect
- but I tried and here it works
Meanwhile, their claims:
- LLMs are performing at PhD levels.
- AGI is around the corner
- humanity will be wiped out
- situational awareness report
I strongly object to the "constantly abbreviating words" habit in coding. It may have had a place when source code space had drastic limitations, but today "func, proc, writeln, strcpy" are anathema to me. I also get that a lot of these in the Seed7 examples were lifted unchanged from Pascal, but that just means I dislike those aspects of Pascal as well.
I am of the camp "use full English words", and "if the identifier is too long then spend the time needed to find a more concise way to say what you mean in fewer or shorter full English words". Incidentally AI can be pretty good at brainstorming that, which is lovely.
… yet you have contracted «artificial intelligence» to «AI», have you not?
The case for abbreviated keywords will always exist, as some will prefer contractions whereas others will prefer fully spelled-out words.
At the opposite end of the spectrum there are C / C++, which use neither «function» nor «procedure» but «printf» and «strcpy», as you have rightfully pointed out, which ADA, COBOL and Objective C contrast with:
ADA: «Is_Valid_User_Access_Level_For_Requested_Operation», «Convert_String_To_Standardised_Date_Format»
COBOL: «PERFORM Calculate-Totals VARYING Index FROM 1 BY 1 UNTIL Index > Max-Index»
Objective C: «URLSession: dataTask: didReceiveResponse: completionHandler:»
I do not think that a universal agreement on the matter is even possible.
Well, the threshold I would like to use is "abbreviations that are easily understood outside of coding jargon are acceptable". You don't have to be a specialist in any specific language to understand "AI" in the wild, or "NASA", or even the names of languages such as COBOL.
But if, outside of the context of coding, you just say "strcpy" or "writeln" at somebody, they're not going to immediately understand. As a result, even coders with tired brains, or who are switching between languages a lot, will also get hung up at inconvenient times.
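To make that threshold concrete, here is a toy Python illustration (the names are invented for the example, not taken from any real codebase):

    # Abbreviated, jargon-heavy style: opaque to anyone outside C circles.
    def strcpy(dst: list, src: list) -> None:
        dst[:] = src

    # Full-English-words style for the same operation.
    def copy_characters_into_buffer(destination: list, source: list) -> None:
        destination[:] = source

Both do the same thing; only the second one explains itself to a tired brain.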
Unfortunately "Every fruit has its seed (yes even seedless ones, in that circumstance the seed is the effort humans put into grafting it)" which is a saying that clarifies in all situations far beyond fruit, any replicating system that is of benefit to a third party must also wrap some portion of its benefit into self-replication that does not immediately benefit a third party.
Whether that takes the shape of money or some different shape, it remains the case that "free benefit" cannot exist, and that any beneficial system requires some kind of give to supplement the take that it offers.
Finding a way to establish that with balance is the challenge.
> the long accepted idea that lower case is easer to read than upper case
uh.. that sounds to me about as accepted as "cursive is easier to read than print".
Upper case is the canonical form of our alphabet (as written in Latin), while lower case is a newer addition (adapted from many Greek letter shapes) that may be easier to write in rapid succession, but that also makes it one step towards cursive.
When I was a child in elementary school I was taught that "you all have to learn cursive because when you grow up that's what adults use, they don't use print any more". I remember thinking about that while driving with my parents, and asking them "if adults use cursive exclusively like my teacher says then why are all the road signs in print"?
I can level that same query at your statement: if it is a long-accepted idea that lower case is easier to read, then why are all of the road signs (which famously prioritize ease of reading) always written in all caps?
I think he's implying that humans require available information from which to learn new things, and that borrowing a term from AI research is one valid (if backwards-sounding) way to describe that fact.
> French pronunciation is mostly consistent (more so than English at least)
Most of English's inconsistencies stem from words absorbed from other languages, and far and away the largest helping of that was the French that British nobility picked up during the Norman invasion.
My understanding of French pronunciation primarily revolves around the idea that 80% of words end in three randomly selected vowels followed by 1-3 randomly selected maximally hard consonants such as j, x, z, k.. and that the sum total of those randomly selected letters always sound identical to the vowel portion of the word "œuf" which means "egg". Which is also basically like trying to say "eww" while you have an egg in your mouth.
No offense, but this is a sophomoric take. I'd be willing to bet that more native English words have irregular spelling than Norman/Latin/other imports. The same thing happened in French too. Orthographic change often lags behind pronunciation change. The reason many English words have irregular spelling is that English has been a written language for a long time. That is why you have words like Knight, Knee, Enough, Eight, Cough, etc., which are all native words. My understanding is that the k in kn- words used to be pronounced.
Knee is the same in German as it is in English. However, the Germans pronounce the K, e.g., "Kah-nee."
The word for "Knight" in German is "Ritter" if I am not mistaken? Though, I have no idea where the word Knight comes from. (Which I intend to look up after posting this).
Since English dictionaries are arranged in "alphabetical order" to make it easier to find the word one wishes to know the definition of, I'm now curious whether the Chinese writing system has anything approaching an "alphabetical order", or any kind of canonical way to order strings of Chinese text. And relatedly, how do they find words in their dictionaries?
(this is normally something I would google but it doesn't sound like something I'd get a high signal to noise ratio on given the ambiguous terms at hand)
The alphabet is a marvelous invention. I seem to remember that Europeans in China (and places with a large Chinese diaspora) used alphabetical sorting of whatever romanization they favoured (different between English, French, Dutch). Much easier than radicals and stroke counting.
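A rough sketch of what that romanization-based sorting looks like today, assuming the third-party pypinyin package for Mandarin romanization (a real dictionary would also need tone ordering and tie-breaking, which this skips):

    from pypinyin import lazy_pinyin

    words = ["葡萄", "苹果", "香蕉", "梨"]  # grape, apple, banana, pear
    # Sort by each word's toneless pinyin, i.e. by its romanization.
    print(sorted(words, key=lambda w: lazy_pinyin(w)))
    # -> ['梨', '苹果', '葡萄', '香蕉']  (li, pingguo, putao, xiangjiao)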
I guess, to try to echo the question: if a reader was reading along and ran into "葡" in isolation in the text (e.g., not adjacent to another character that it normally combines with), would they be able to confidently produce a sound that corresponds to it, or would it be perceived more like a punctuation error in English, given that anglophones do very little to change the sound they are making as a result of punctuation (possibly just changing rhythm instead)?