As someone in CV/ML with a background in cognitive science, and partial to the work of Lakoff and others on the role of embodiment, I do get irked by how loosely vision papers use the term. In this case specifically, it's rather hard to justify. I'm guessing it's influenced by other vision papers that use agents in a simulated 3D environment. In those papers, claiming some kind of embodiment is iffy, but depending on the quality of the simulation, maybe defensible.