Reminds me of NY Cerebro, semantic search across New York City's hundreds of public street cameras: https://nycerebro.vercel.app/ (e.g. search for "scaffolding")
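The site doesn't say what's under the hood, but a minimal sketch of how semantic search over camera frames can work with CLIP (the model choice and scoring here are my own assumptions, not NY Cerebro's actual stack):

    # Hypothetical sketch: rank camera frames by similarity to a text query using CLIP.
    # NY Cerebro's real pipeline is unknown; the model name and scoring are assumptions.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def rank_frames(query, frame_paths):
        images = [Image.open(p).convert("RGB") for p in frame_paths]
        inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
        with torch.no_grad():
            out = model(**inputs)
        # logits_per_text: (1, num_images) similarity between the query and each frame
        scores = out.logits_per_text[0].tolist()
        return sorted(zip(frame_paths, scores), key=lambda x: x[1], reverse=True)

    # e.g. rank_frames("scaffolding", ["cam_001.jpg", "cam_002.jpg"])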
both the endeavor and the site are super cool - congrats on 10 years. making the graphics interactive so you can click into a specific run would be a nice touch. went looking for the code on your GH! https://github.com/friggeri
And to be clear, this is the second time a Tesla failed to see a tractor trailer crossing the road and killed someone by decapitation; the first was in 2016:
Notably, Tesla's response to the 2016 accident was to remove radar sensors from the Model 3. Had they instead taken that incident seriously and added sensors such as LiDAR that could have detected these obstacles (something any engineer would have done), the 2019 accident and this "cartoon road" failure would have been avoided. For this reason I believe the 2019 accident should be considered gross professional negligence on the part of Tesla and Elon Musk.
I wonder how long until techniques like Depth Anything (https://depth-anything-v2.github.io/) reach parity with human depth perception. That said, in Mark Rober's tests I'm not sure even a human would have passed the fog scenario.
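For reference, running a monocular depth model like Depth Anything V2 on a single frame is close to a one-liner with the Hugging Face depth-estimation pipeline; a rough sketch (the exact checkpoint id is my assumption, check the model hub for the published sizes):

    # Rough sketch: monocular depth estimation with a Depth Anything V2 checkpoint.
    # The checkpoint id below is an assumption; use whichever size is published on the Hub.
    from PIL import Image
    from transformers import pipeline

    depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

    result = depth(Image.open("dashcam_frame.jpg"))
    # result["depth"] is a PIL image of relative depth; "predicted_depth" is the raw tensor.
    result["depth"].save("dashcam_depth.png")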
Meta clearly understands the lesson of GPT-3 vs ChatGPT: the model is a starting point, and the UX built around it is what showcases the intelligence. This is especially pronounced in visual models. Telling me SAM 2 can "see anything" is neat. Clicking the soccer ball and watching the model track it seamlessly across the video, even when occluded, is incredible.
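That click-to-track flow maps pretty directly onto the sam2 video predictor; a rough sketch of the idea (checkpoint/config paths, the frame directory, and the click coordinates are placeholders):

    # Rough sketch of SAM 2's click-to-track flow using the sam2 video predictor.
    # Checkpoint/config paths, the frame directory, and the click point are placeholders.
    import numpy as np
    import torch
    from sam2.build_sam import build_sam2_video_predictor

    predictor = build_sam2_video_predictor(
        "configs/sam2.1/sam2.1_hiera_l.yaml", "./checkpoints/sam2.1_hiera_large.pt"
    )

    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        state = predictor.init_state(video_path="./soccer_clip")  # directory of JPEG frames

        # One positive click (label=1) on the soccer ball in frame 0.
        predictor.add_new_points_or_box(
            inference_state=state,
            frame_idx=0,
            obj_id=1,
            points=np.array([[420, 310]], dtype=np.float32),
            labels=np.array([1], dtype=np.int32),
        )

        # Propagate that single prompt through the video to get a mask per frame.
        for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
            masks = (mask_logits > 0.0).cpu().numpy()  # boolean masks for the tracked object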
SAM 2's key contribution is adding time-based segmentation so it applies to videos. Even on images alone, the authors note [0] that SAM 2 exceeds SAM 1 on image segmentation benchmarks. Some weaknesses of SAM 2 relative to SAM 1 have surfaced in specific domains, potentially medical images [1]. EfficientSAM trades some of SAM 1's accuracy for a ~40x speedup; I suspect we will soon see an EfficientSAM 2.
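For the image-only case, SAM 2 keeps the same point-prompt interface as SAM 1; a minimal sketch with the sam2 image predictor (paths and the prompt point are placeholders):

    # Minimal sketch: single-image segmentation with SAM 2's image predictor.
    # Checkpoint/config paths, the image, and the point prompt are placeholders.
    import numpy as np
    import torch
    from PIL import Image
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    predictor = SAM2ImagePredictor(
        build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml", "./checkpoints/sam2.1_hiera_large.pt")
    )

    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        predictor.set_image(np.array(Image.open("frame.jpg").convert("RGB")))
        masks, scores, _ = predictor.predict(
            point_coords=np.array([[420, 310]]),  # one positive click
            point_labels=np.array([1]),
            multimask_output=True,  # return several candidate masks with quality scores
        )

    best_mask = masks[np.argmax(scores)]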
One thing it's enabled is automated annotation for segmentation, even on out-of-distribution examples. E.g. in the first 7 months of SAM, users on Roboflow used SAM-powered labeling to label over 13 million images, saving an estimated ~21 years [0] of labeling time. That doesn't include labeling done by self-hosting autodistill [1] for automated annotation either.
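For context, the autodistill flow is roughly: give a large foundation model a caption ontology and a folder of raw images, and it writes out a labeled dataset you can train a smaller model on. A hedged sketch (the GroundedSAM base model, the ontology, and the paths are just example choices):

    # Hedged sketch of autodistill-style automated annotation. The base model
    # (GroundedSAM), the ontology, and the folder paths are example choices.
    from autodistill.detection import CaptionOntology
    from autodistill_grounded_sam import GroundedSAM

    base_model = GroundedSAM(
        ontology=CaptionOntology({"soccer ball": "ball", "person": "player"})
    )

    # Auto-label every image in the input folder and write the annotations out as a dataset.
    dataset = base_model.label(
        input_folder="./raw_frames",
        extension=".jpg",
        output_folder="./labeled_dataset",
    )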
i work on roboflow. seeing all the creative ways people use computer vision is motivating for us. let me know (email in bio) if there are things you'd like us to do better.