Where "etc." includes lots of engineers who are polishing up the frontend to the...

Where "etc." includes lots of engineers who are polishing up the frontend to the model which translates text to tokens and vice versa, so that well-known demonstrations of the lack of reasoning in the model cannot be surfaced as easily as before? What reason is there to believe that the generated output, while often impressive, is based on something that resembles understanding, something that can be relied on?