Yah, pretty sure it's the same feature that's been in Bing Chat for 2 months now. It really feels like there's only one pass of feature extraction from the image, preventing any detailed analysis beyond a coarse "what do you see". (Follow-up questions about things it likely didn't parse are highly hallucinated.)
This is why they can't extract the seat post information directly from the bike when the user asks. There's no "going back and looking at the image".
Edit: nope, it's a better image analyzer than Bing