The real lesson is that these are essentially random results: every model fails at all kinds of things some of the time, and gets all kinds of questions right at other times.
The problem is that the models have no idea whether they are right or wrong, and they always believe they are right. That makes them useful only where you don't care if the answer is actually right, or where the right answer is hard to come up with but easy to verify, and pretty much useless for everything else.
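To make the "hard to produce, easy to verify" case concrete, here is a minimal C++ sketch (a hypothetical example, not from anyone's project in this thread): checking a model-proposed factorization of a number takes one multiplication loop, even though finding the factors in the first place can be hard.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Verifying a claimed factorization is trivial even when producing it is hard:
    // multiply the candidate factors back together and compare with the target.
    // (Overflow checking is omitted to keep the sketch short.)
    bool isValidFactorization(uint64_t n, const std::vector<uint64_t>& factors) {
        uint64_t product = 1;
        for (uint64_t f : factors) {
            if (f < 2) return false;   // reject trivial "factors" like 0 or 1
            product *= f;
        }
        return product == n;
    }

    int main() {
        // Suppose a model claims 221 = 13 * 17; checking that claim is instant.
        std::vector<uint64_t> claimed = {13, 17};
        std::cout << (isValidFactorization(221, claimed) ? "correct" : "wrong") << "\n";
        return 0;
    }

When the verification step is that cheap, a wrong model answer costs you almost nothing; when there is no cheap check, you are back to trusting a system that cannot tell when it is wrong.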
I have something that both Gemini (via GCA) and Copilot (Claude) analyzed, and they came up with the same diagnosis. Each of them proposed the exact same wrong solution, and when I pointed that out, went further wrong.
I haven't tried ChatGPT on it yet, but I'm hoping to do so soon.
I used Cursor and ChatGPT 5 last night for the first time. Before I could even ask it about my issue, it had scanned the .cpp file in question (because it was open in the editor) and flagged some possible issues, one of which was the actual bug. I confirmed that and gave it a fuller description of the error behavior. It identified the problem in the code and suggested two different CORRECT solutions (one simple, one more complex but "perfect"). I opted for the simple one, and it implemented it. One tiny problem remained; I pointed it out, and it fixed it.
This was much better than Gemini or Copilot on the exact same issue, at the exact same commit pointer in my repo. Both of them suggested the same wrong solution and got further and further off track as they went.
GPT-5 correctly diagnosed the problem - which Gemini had failed to solve - but then failed six times in a row to write the code to fix it.
I then gave ChatGPT-5's problem analysis to Google Gemini and it immediately implemented the correct fix.
The lesson - ChatGPT is good at analysis and code reviews, not so good at actually writing the code.