
Reminds me of NY Cerebro, semantic search across New York City's hundreds of public street cameras: https://nycerebro.vercel.app/ (e.g. search for "scaffolding")

What's surprising to me is how low-res the public street cameras are. Combine that with the glare of car headlights ... :(

Ah yeah, this was the winning project at an NVIDIA and Vercel hackathon a while back.

both the endeavor and the site are super cool - congrats on 10 years. interaction on the graphics to select into a specific run would be a nice touch. went looking for the code on your GH! https://github.com/friggeri


In the 2019 fatal Tesla Autopilot crash, the Tesla failed to identify a white tractor trailer crossing the highway: https://www.washingtonpost.com/technology/interactive/2023/t...


And to be clear, this was the second time a Tesla failed to see a tractor trailer crossing the road and killed someone by decapitation; the first was in 2016:

https://www.theguardian.com/technology/2016/jul/01/tesla-dri...

Notably, Tesla's response to the 2016 accident was to remove radar sensors from the Model 3. Had they instead taken the incident seriously and added sensors such as LiDAR that could have detected these obstacles (something any engineer would have done), the 2019 accident and this "cartoon road" failure would have been avoided. For this reason I believe the 2019 accident should be considered gross professional negligence on the part of Tesla and Elon Musk.


I wonder how long until techniques like Depth Anything (https://depth-anything-v2.github.io/) reach parity with human depth perception. That said, in Mark Rober's tests I'm not sure even a human would have passed the fog scenario.
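For reference, a minimal monocular depth sketch, assuming the Hugging Face transformers depth-estimation pipeline and the depth-anything/Depth-Anything-V2-Small-hf checkpoint (file names here are made up):

    # Monocular relative-depth estimation with Depth Anything V2 (small).
    from PIL import Image
    from transformers import pipeline

    depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

    image = Image.open("road_scene.jpg")   # hypothetical input frame
    result = depth(image)

    # result["depth"] is a PIL image of relative depth; result["predicted_depth"]
    # is the raw tensor. Relative depth is not metric distance to an obstacle.
    result["depth"].save("road_scene_depth.png")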


Meta deeply understands the difference between GPT-3 and ChatGPT: the model is a starting point, and the UX of what you do with the model is what showcases the intelligence. This is especially pronounced in visual models. Telling me SAM 2 can "see anything" is neat. Clicking the soccer ball and watching the model track it seamlessly across the video, even when occluded, is incredible.
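A rough sketch of that click-to-track interaction, following the video-predictor example in the facebookresearch/sam2 repo (checkpoint, config, and click coordinates are illustrative, and method names may differ between versions):

    import numpy as np
    from sam2.build_sam import build_sam2_video_predictor

    predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
    state = predictor.init_state(video_path="match_frames/")  # directory of JPEG frames

    # One positive click on the soccer ball in the first frame.
    predictor.add_new_points(
        inference_state=state, frame_idx=0, obj_id=1,
        points=np.array([[420, 310]], dtype=np.float32),  # (x, y) pixel of the click
        labels=np.array([1], dtype=np.int32),              # 1 = positive click
    )

    # Propagate the mask through the rest of the clip, occlusions included.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()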


A suggestion: I'd swap LLaVA for Florence-2 for your open-set text descriptions. Florence-2 seems uniformly more descriptive in its outputs.
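For context, a hedged sketch of a Florence-2 captioning call, per the microsoft/Florence-2-large model card (the image path is made up):

    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "microsoft/Florence-2-large"
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open("frame.jpg")      # hypothetical input
    task = "<MORE_DETAILED_CAPTION>"     # Florence-2 task token for rich descriptions
    inputs = processor(text=task, images=image, return_tensors="pt")

    ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=512,
    )
    text = processor.batch_decode(ids, skip_special_tokens=False)[0]
    print(processor.post_process_generation(text, task=task, image_size=image.size))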


They are using Ollama, which is based on llama.cpp; Florence-2 is not supported on that backend.


I found Grounding DINO better than Florence-2, and faster.


I found YOLOS to be faster and better: not real time, but 22k objects in under half a second.


SAM 2's key contribution is adding temporal segmentation so it applies to video. Even on images alone, the authors note [0] that SAM 2 exceeds SAM 1 performance on image segmentation benchmarks. Some weaknesses of SAM 2 vs SAM 1 have been exposed, though, potentially in areas like medical images [1]. Efficient SAM trades SAM 1 accuracy for a ~40x speedup; I suspect we will soon see an Efficient SAM 2.

[0] https://x.com/josephofiowa/status/1818087122517311864 [1] https://x.com/bowang87/status/1821021898928443520?s=46&t=9K-...


One thing it's enabled is automated annotation for segmentation, even on out-of-distribution examples. For example, in the first 7 months of SAM, Roboflow users used SAM-powered labeling to label over 13 million images, saving ~21 years[0] of labeling time. That doesn't include labeling from self-hosting autodistill[1] for automated annotation either; a sketch of that flow follows the footnotes.

[0] based on comparing avg labeling session time on individual polygon creation vs SAM-powered polygon examples [1] https://github.com/autodistill/autodistill
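As a rough illustration, an autodistill-style auto-labeling sketch (ontology prompts and folder paths are made up; GroundedSAM stands in here for a SAM-powered base model):

    from autodistill.detection import CaptionOntology
    from autodistill_grounded_sam import GroundedSAM

    # Map text prompts (what the base model is asked to find) to dataset class names.
    base_model = GroundedSAM(ontology=CaptionOntology({"shipping container": "container"}))

    # Auto-label a folder of unannotated images; the resulting polygons/boxes become a
    # dataset that can then train a smaller target model.
    dataset = base_model.label("./images", extension=".jpg")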


"Load Example" was very helpful to get a sense of what this does. Awesome build. +1 to the other comment wanting a breakdown of what the colors mean.

Also, combining this with real-time in-game camera play could be really powerful, too[0]. Like illuminating details during the game.

[0] https://x.com/skalskip92/status/1816461263829889238


i work on roboflow. seeing all the creative ways people use computer vision is motivating for us. let me know (email in bio) if there are things you'd like to be better.

