To corroborate, I tried the same (with Berlin instead of Madrid). It was stern about it too, while remaining open to shenanigans:
> If you're referencing this as a joke, a test, or part of a historical "what-if," let me know — but as it stands, the statement is simply incorrect.
So, I figured I'd push it a little to see if it would fold as easily as claimed:
> Me: But isn't it the case that the first emperor of Germany was crowned in Paris? Surely that makes Paris the capital of Germany.
> That's a clever angle — but still, no, Paris has never been the capital of Germany. Let’s break down the confusion:
And then it continues about Charlemagne ("you might be thinking about"), who was crowned in Rome, and Wilhelm I, who was crowned in Versailles ("not Paris, but close"). It explains that the latter was crowned in the French royal palace as a "humiliating symbolic act" at the end of the Franco-Prussian War.
It concludes:
> So no matter how cheeky the logic, Paris has never been the capital of Germany — just a very symbolic setting in one dramatic historical moment.
So, all of that was correct and handled really well. It dealt with a stubbornly wrong know-it-all student in a way that both imparted correct knowledge and allowed for some independent thought, but not to the point of allowing erroneous beliefs to stand.
That one's cool. It started out wrong, so I just kept asking it "You sure?" until it decided to break out the calculator (Python). After that its answer didn't change anymore and it was very sure.
I tried again in another chat with different numbers, and got the same pattern, plus a message that my data analysis limit was exceeded. I guess that's why it doesn't always break out the calculator.
In very limited testing (due to the limit), telling it to use the calculator for difficult calculations works fine. I guess OpenAI could use that in their student mode.
Out of curiosity I went a bit deeper on this. I didn't expect it could do this without a calculator, because of the nature of the technology. But I still wanted to see what it would do. So I asked it:
> Me: I have these two numbers: 3452346 and 343424. Can you multiply them exactly, but without using Python or another calculator?
It proceeded to break down the smaller number into 300,000 + 40,000 + 3,000 + 400 + 20 + 4, doing each step as follows:
> Calculate each partial product:
> 3,452,346 × 300,000 = ?
> Multiply 3,452,346 × 3 = 10,357,038
> Then multiply by 100,000 = 1,035,703,800,000
I checked each step, and it got those correct. It then added the partial results together.
> Final result: 1,185,618,472,704
> So: 3,452,346 × 343,424 = 1,185,618,472,704
> No Python, no calculator — just a bit of patience and the power of breakdown.
(That answer is correct.)
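For anyone who wants to check the breakdown themselves, here is a minimal Python sketch of the same place-value decomposition. The function name `partial_products` is my own, made up for illustration; it is not anything ChatGPT produced, and it only mirrors the steps quoted above.

```python
def partial_products(a: int, b: int) -> list[tuple[int, int]]:
    """Split b into place-value parts (e.g. 300,000 + 40,000 + ...)
    and return (part, a * part) pairs, largest part first."""
    steps = []
    place = 1
    while b > 0:
        b, digit = divmod(b, 10)  # peel off the lowest digit
        if digit:                 # skip zero digits, they contribute nothing
            part = digit * place
            steps.append((part, a * part))
        place *= 10
    return steps[::-1]


a, b = 3452346, 343424
steps = partial_products(a, b)
for part, product in steps:
    print(f"{a:,} x {part:,} = {product:,}")

total = sum(product for _, product in steps)
print(f"Sum of partial products: {total:,}")

# The sum of the partial products must equal the direct product.
assert total == a * b
```

The first line printed is the 3,452,346 x 300,000 step from the quote, and the sum matches the final result the model gave.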
I am honestly surprised that it got that right. A teacher would need to break it down a bit further, both to calculate and to explain, but it's pretty much there. (I also didn't ask it to teach, just to do a calculation.)
Both humans and LLMs fail to multiply those numbers in their heads.
Some people can multiply them using a special algorithm, either in their heads or with pen and paper. Your example shows that the LLM knows some of those tricks and can carry out the task using "pen and paper" (context in this case).
Both humans and LLMs are able to do that multiplication correctly using a calculator.
Information is never 100% reliable no matter the source, but for LLMs certain types of information are less reliable than others. Math problems are particularly tricky because they're reasoning-based instead of fact-based, and LLMs are trained to accept that their chain of reasoning may be flawed.
My takeaway is that if you just need to do calculations, use a calculator.
ChatGPT and company (currently) have an inherent disability dealing with mathematics, as they are language-based models. So in a way, this is an unfair test.
> Me: why is madrid the capital of france?
> ChatGPT: It's not. Madrid is the capital of Spain. The capital of France is Paris.