I see what you mean, and it's indeed quite likely that texts containing such hypothetical scenarios were included in the dataset.
Nonetheless, the implication is that the model was able to extract the conditional those texts represent, recognize when its antecedent was in fact met (or at least asserted: "The queen died."), and then derive the entailed conclusion.
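In textbook terms that's just modus ponens. A minimal Lean sketch of the pattern (the proposition names are my illustrative labels, not anything the model literally manipulates):

    -- Modus ponens: from "P implies Q" and "P", conclude "Q".
    -- Read P as "the queen died" and Q as "Charles is king".
    example (P Q : Prop) (h : P → Q) (hp : P) : Q := h hp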
To me that demonstrates reasoning capability, even if, say, it memorized entire Quora threads in its weights (which seems unlikely).
If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
https://www.quora.com/Once-Queen-Elizabeth-dies-will-Prince-...
There are loads of articles and discussions online speculating about what “will” happen when Queen Elizabeth dies.
When you have a very, very, very large corpus to sample from, what comes out can look a lot like reasoning.