It is an unfortunately anthropomorphizing term for a transformer simply operating as designed, but the thing it's become a vernacular shorthand for, "outputting a sequence of tokens representing a claim that can be uncontroversially disproven," is still a useful concept.
There's definitely room for a better label, though. "Empirical mismatch" doesn't quite have the same ring as "hallucination," but it's probably a more accurate place to start from.
Regardless, I don't think there's much to write papers on, other than maybe an anthropological look at how it has affected people who put too much trust in LLMs for research, decision-making, etc.
If someone wants to make their model more reliable for a specific domain, that information is already in the existing papers on model training.