I agree. There are some corner cases that GPT-4o reliably fails and Claude handles well, and vice versa. GPT-4 and GPT-4o consistently generate very poor cv2 Python code for human face/bounding box work; it's a strange, reproducible failure in my experience.