As with constraint solvers, there are tons of heuristics that have been applied over the years to make performance look like it scales roughly linearly on many or most problem instances. That doesn't mean P == NP.
Most of those problems have had practical decider functions.
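To make the constraint-solver point concrete, here is a minimal sketch (purely illustrative, not any real solver's API) of DPLL-style SAT search where a single heuristic, unit propagation, collapses most of the branching on easy instances; that is the kind of thing that makes worst-case-exponential search feel nearly linear in practice.

```python
# Tiny DPLL SAT sketch: clauses are lists of non-zero ints, -v means "variable v is False".

def unit_propagate(clauses, assignment):
    """Repeatedly assign variables forced by clauses with one unassigned literal."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                      # clause already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if not unassigned:
                return None                   # conflict: clause falsified
            if len(unassigned) == 1:          # unit clause forces a value
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment

def dpll(clauses, assignment=None):
    """Return a satisfying assignment dict, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    if unit_propagate(clauses, assignment) is None:
        return None
    variables = {abs(l) for c in clauses for l in c}
    free = variables - assignment.keys()
    if not free:
        return assignment
    v = next(iter(free))                      # branch on an arbitrary free variable
    for value in (True, False):
        result = dpll(clauses, {**assignment, v: value})
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(dpll([[1, 2], [-1, 3], [-2, -3]]))
```

On structured instances the propagation does almost all the work and the branching barely kicks in, which is exactly why empirical scaling can look far better than the worst case.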
If you consider NP-complete as the intersection of NP (a class of decision problems) and NP-hard (where an efficient decider isn't guaranteed, and some NP-hard problems aren't decidable at all), then look at the success of heuristics on problems in that intersection.
Existential quantifiers over second-order terms are the same thing: that is exactly what characterizes NP (Fagin's theorem).
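A standard worked example of that characterization, written out so the existential second-order quantifiers are visible, is 3-colorability of a graph with (symmetric) edge relation E: guess three color sets, then check them with a first-order formula.

```latex
% Fagin's theorem: NP = existential second-order logic.
% 3-colorability: guess unary relations R, G, B (the color classes),
% then verify every vertex is colored and no edge is monochromatic.
\[
\exists R\,\exists G\,\exists B\;\forall x\,
\Big[ \big(R(x) \lor G(x) \lor B(x)\big) \land
\forall y\,\big(E(x,y) \rightarrow
  \neg(R(x)\land R(y)) \land \neg(G(x)\land G(y)) \land \neg(B(x)\land B(y))\big) \Big]
\]
```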
Over-parameterization can be thought of as less lossy compression, and attention can be thought of as improved retrieval, or as added expressiveness, since a soft weighting doesn't miss out on as many outputs as a hard binary output would.
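A minimal sketch of the "attention as retrieval" reading (plain NumPy, all names mine): attention is a softmax-weighted dictionary lookup, so instead of a hard binary hit/miss against one key you get a blended readout over all values.

```python
import numpy as np

def soft_retrieval(query, keys, values, temperature=1.0):
    """Scaled dot-product attention for a single query: a soft key-value lookup."""
    scores = keys @ query / (np.sqrt(query.shape[-1]) * temperature)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()       # softmax over keys
    return weights @ values        # blend of values, not a single hard hit

# A hard lookup would return exactly one value; soft retrieval mixes them,
# so near-miss keys still contribute instead of being dropped.
rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 8))
values = rng.normal(size=(5, 4))
query = keys[2] + 0.1 * rng.normal(size=8)   # noisy version of key 2
print(soft_retrieval(query, keys, values))
```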
You can consider how LLMs tend to append tokens, resulting in a telescoping reduction, as one of those approximation reductions.
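The telescoping reading of appending tokens is just the chain-rule factorization an autoregressive model implements: each step conditions on everything appended so far, and the product telescopes over the sequence.

```latex
% Autoregressive factorization: appending one token at a time telescopes
% the joint distribution into a product of next-token conditionals.
\[
p(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} p\!\left(x_i \mid x_1, \dots, x_{i-1}\right)
\]
% Greedy or sampled decoding replaces a search over whole sequences with n
% local choices, which is the approximation being traded for tractability.
```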
But this is because the underlying problem is combinatorial optimization, which is far more difficult.
Another way to think about it is that PH is equal to the set of Boolean queries computable on a concurrent random access machine using exponentially many processors and constant time.
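That is the descriptive-complexity view (Stockmeyer, Immerman), summarized below as it is usually stated; treat the exact phrasing as a paraphrase rather than a citation.

```latex
% The polynomial hierarchy is exactly the class of properties definable in
% (full) second-order logic, and equivalently the Boolean queries a concurrent
% random access machine decides in constant parallel time when it is allowed
% exponentially many processors.
\[
\mathrm{PH} \;=\; \mathrm{SO} \;=\; \mathrm{CRAM}\big[\,O(1)\ \text{time},\ 2^{n^{O(1)}}\ \text{processors}\,\big]
\]
```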
If your datasets have the Markov property and are close to ergodic, there are options, but you probably wouldn't need to resort to attention and over-parameterization in that case.
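A minimal sketch of that alternative (a character-level bigram model, all names mine): under the Markov assumption the next state depends only on the current one, so counting transitions is enough, and nothing like attention or a huge parameter count is needed.

```python
import random
from collections import defaultdict

def fit_bigram(text):
    """Markov assumption: the next character depends only on the current one."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def sample(counts, start, length=40):
    """Walk the chain; an ergodic chain eventually visits its whole state space."""
    out = [start]
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = "the quick brown fox jumps over the lazy dog " * 20
model = fit_bigram(corpus)
print(sample(model, "t"))
```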
The word sense disambiguation problem is probably another lens. That is harder yet, but it may be a way to think about what I am trying to explain.