First: the image recognition model is unlikely to have seen very many depth maps. Seeing one alongside a photo probably won't help it recognize the image any better.
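To make that first point concrete, here's a minimal PyTorch sketch of what "showing the model a depth map" would even mean mechanically. The model choice (torchvision's resnet18) and the zero-initialization are my own illustrative assumptions, not how any real app works: the point is just that an RGB-pretrained network has no weights for a depth channel, so a bolted-on one starts out knowing nothing until you retrain.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# An ImageNet-pretrained classifier; its stem conv expects 3 (RGB) channels.
model = resnet18(weights="IMAGENET1K_V1")

# Swap the 3-channel stem for a 4-channel one (RGB + depth).
old_conv = model.conv1
new_conv = nn.Conv2d(4, old_conv.out_channels,
                     kernel_size=old_conv.kernel_size,
                     stride=old_conv.stride,
                     padding=old_conv.padding,
                     bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight  # reuse the learned RGB filters
    new_conv.weight[:, 3:].zero_()            # depth filters: entirely untrained
model.conv1 = new_conv

rgb = torch.rand(1, 3, 224, 224)    # the photo
depth = torch.rand(1, 1, 224, 224)  # the depth map
logits = model(torch.cat([rgb, depth], dim=1))
```

With zeroed depth filters the depth map contributes literally nothing to the prediction; getting any signal out of it would require retraining on a large corpus of food-photo-plus-depth pairs, which (per the point above) the model almost certainly never saw.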
Second: even if the model knew what to do with a depth map, there's no reason to suspect it would help in this application. The inaccuracy of an image-to-calorie-count app doesn't come from questions a depth map can answer, like "is this plate sitting on the table or raised above it"; it comes from questions that can't be answered visually at all, like "is this a glass of whole milk or non-fat" or "are these vegetables glossy because they're damp or because they're covered in butter".