Motion planning is also an Open Problem for Humans!

Let's say we are standing on a high hill and I point to another hill and say: "Walk over there". Do you expect any human to find a reasonably good path by themselves? I would personally try to use a map. How do military robots solve this? They use satellite images.

And in general, this article seems very pessimistic to me. My home-built computer vision pipeline can do localization and mapping, loop detection, object segmentation and depth estimation at levels that are "good enough" for indoor drone flight. So I would assume that someone with a generous serving of financial resources would be able to solve most of those problems, except for 2 issues:

1. You need to memorize how the environment works. That's why newborn kids make poor decisions: they lack the stochastic priors. Luckily for us, memorization is AI's strong point.

2. You need mechanics that are more forgiving. If I position my hand the wrong way, I might accidentally squeeze someone a bit, but I won't crush their bones, because the mechanics of my arm are flexible. We'll need better actuators and elastic casings.

And just for the sake of discussion, here are my replies to each problem category.

Simultaneous Localization and Mapping: There are libraries that work well enough for your robot to localize itself in a building-sized environment with just a single camera. I'd consider this solved. https://github.com/raulmur/ORB_SLAM2
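
For the curious, the feature front-end that systems like ORB-SLAM2 build on boils down to something like this rough OpenCV sketch (file names are placeholders; this is only the matching step, not the full SLAM back-end):

    # Minimal ORB feature-matching sketch, the kind of front-end that
    # feature-based SLAM systems are built on. File names are placeholders.
    import cv2

    orb = cv2.ORB_create(nfeatures=2000)

    img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Binary descriptors -> Hamming distance; cross-check keeps symmetric matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    print(len(matches), "tentative correspondences between the two frames")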

As for obstacles: for humans, too, it is mostly guesswork whether you want to step on that blanket or whether there might be something fragile or slippery underneath it.

Lost Robot Problem: Most SLAM solutions are good enough that you could just regenerate the map from scratch every time there is a gap in your perception. ORB-SLAM2 also has a loop-detection and relocalization module, so if you reset its tracking and the robot then walks into a known environment, it can merge the old data into its new state.
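
A brute-force caricature of that relocalization idea: match the lost frame's descriptors against every stored keyframe and keep the best one. ORB-SLAM2 does this properly with a DBoW2 bag-of-words vocabulary; this sketch only illustrates the principle (file names are placeholders):

    # Crude relocalization sketch: compare the current frame's ORB descriptors
    # against stored keyframes and pick the one with the most good matches.
    # Real systems use a bag-of-words vocabulary instead of brute force.
    import cv2

    orb = cv2.ORB_create(nfeatures=1000)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    def descriptors(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, des = orb.detectAndCompute(img, None)
        return des

    keyframes = {name: descriptors(name)
                 for name in ["kf_000.png", "kf_001.png", "kf_002.png"]}
    current = descriptors("lost_frame.png")

    def score(des_a, des_b, max_dist=40):
        return sum(1 for m in matcher.match(des_a, des_b) if m.distance < max_dist)

    best = max(keyframes, key=lambda name: score(current, keyframes[name]))
    print("most likely previously-seen place:", best)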

Depth estimation: It works well enough in practice. https://www.stereolabs.com/zed-2/
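
The principle behind such stereo cameras can be reproduced with plain OpenCV, although the ZED's on-device pipeline is considerably more refined than this sketch (the focal length and baseline below are made-up placeholder values):

    # Stereo depth sketch with OpenCV's semi-global block matcher.
    # depth = focal_length * baseline / disparity
    import cv2
    import numpy as np

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # fixed-point output

    focal_px = 700.0    # placeholder focal length in pixels
    baseline_m = 0.12   # placeholder stereo baseline in meters

    depth_m = np.zeros_like(disparity)
    valid = disparity > 0
    depth_m[valid] = focal_px * baseline_m / disparity[valid]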

Scene understanding: I don't know about you, but when I drive on the highway, I sometimes have dead flies on my windshield. Apparently, they aren't that clever after all.

Position estimation: It works exceptionally well for VR markers. In general, those solutions tend to be called "Visual Odometry" https://www.youtube.com/watch?v=fh5dLF3dmr0
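
The core of a two-frame visual-odometry step is small: match features, estimate the essential matrix, recover the relative pose. A rough sketch, assuming a calibrated pinhole camera (the intrinsics are placeholders, and with a single camera the translation only comes out up to scale):

    # Two-frame visual odometry sketch: matches -> essential matrix -> relative pose.
    import cv2
    import numpy as np

    K = np.array([[700.0,   0.0, 320.0],
                  [  0.0, 700.0, 240.0],
                  [  0.0,   0.0,   1.0]])  # placeholder camera intrinsics

    orb = cv2.ORB_create(nfeatures=2000)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    print("relative rotation:\n", R, "\ntranslation direction:", t.ravel())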

Affordance discovery: This is mostly a memorization problem, so a perfect candidate for AI.



Alas, your problem #1 is not really about memorization, it is about understanding. Take a human to a house they have never been in before and tell them: "make me some tea". Now try that with any robot you want.

It is you who is being optimistic, not the article that is being pessimistic.


Give a human kid a piece of paper and an envelope and say "make it fit" and roughly 1 out of 5 will fail because they have not yet had the opportunity to watch their parents fold a letter.

For your example, I see an upfront memorization component, which is that your request only works with humans who have previously seen how tea is made. That would be an unsupervised AI which watches a YouTube tutorial and then reduces the task to "get hot water + get tea bag + get container + combine".

Note how cultural memorization is again implicitly added here. I might use a trash can as the container, but due to our shared culture we'll agree that a mug works best. So this gets reduced further with more unsupervised AI resolving "container" to a list of tolerable objects.

Next comes an exploration phase where human and robot alike just randomly open cupboards to see what is inside. YOLO should be good enough to recognize the kettle, the tea bags, and the mug.
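
Roughly along these lines, with an off-the-shelf detector (a sketch using the ultralytics package; note that a stock COCO-trained model knows "cup" but not "kettle" or "tea bag", so those classes would need fine-tuning on your own data):

    # Sketch: run an off-the-shelf YOLO model on a photo of an opened cupboard.
    # Stock COCO classes cover "cup" but not "kettle" or "tea bag"; a real
    # tea-making robot would need a fine-tuned model. Image path is a placeholder.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    results = model("cupboard.jpg")

    for box in results[0].boxes:
        label = model.names[int(box.cls)]
        print(f"{label}: {float(box.conf):.2f} at {box.xyxy.tolist()}")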

Next up comes memorization again. Kids cannot reliably turn on a machine they have not seen before, so intelligence is probably of little use here. Instead, they learn by imitating. An AI would probably again crawl random YouTube videos, check that the kettle looks similar, then try to imitate what it sees.

I hope I have illustrated that a lot of what we think of as understanding is not much more than repeating a similar situation which we have previously experienced.

That would also be my theory as to why meditating and thinking about an action can actually improve our skill at doing it. We're memorizing a fantasy simulation.


> Depth estimation: It works well enough in practice. https://www.stereolabs.com/zed-2/

I have no experience with the ZED 2, but the ZED 1 did not convince me, and it had similarly fancy marketing visualizations to the ones featured on the page you linked.

Don't trust anyone who shows you:

- depth estimations from the viewpoint of the camera

- color-coded depth estimations

In neither case can you judge the accuracy.

Vision-based depth estimation is hard in the general case, where you need to cover a non-trivial depth range, work both indoors and outdoors, and cannot rely on good texture. All vision-based methods I know of break under circumstances that are not uncommon for robots.

Laser works much better, but the information density is not that great.

Radar works well in some applications, badly in others.

So I'd consider it "solved" when you throw a very costly combination of sensors at the problem.



