
Reminds me of NY Cerebro, semantic search across New York City's hundreds of public street cameras: https://nycerebro.vercel.app/ (e.g. search for "scaffolding")

What's surprising to me is how low-res the public street cameras are. Combine that with the glare of car headlights ... :(

Ah yeah, this was the winning project at an NVIDIA and Vercel hackathon a while back.

both the endeavor and the site are super cool - congrats on 10 years. interaction on the graphics to select into a specific run would be a nice touch. went looking for the code on your GH! https://github.com/friggeri


In the 2019 fatal Tesla Autopilot crash, the Tesla failed to identify a white tractor trailer crossing the highway: https://www.washingtonpost.com/technology/interactive/2023/t...


And to be clear, this was the second time a Tesla failed to see a tractor trailer crossing the road and killed someone by decapitation; the first was in 2016:

https://www.theguardian.com/technology/2016/jul/01/tesla-dri...

Notably, Tesla's response to the 2016 accident was to remove radar sensors from the Model 3. Had they instead taken the incident seriously and added sensors such as LiDAR that could have detected these obstacles (something any engineer would have done), the 2019 accident and this "cartoon road" failure would have been avoided. For this reason I believe the 2019 accident should be considered gross professional negligence on the part of Tesla and Elon Musk.


I wonder how long until techniques like Depth Anything (https://depth-anything-v2.github.io/) reach parity with human depth perception. That said, in Mark Rober's tests I'm not sure even a human would have passed the fog scenario.
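For reference, a minimal monocular depth sketch, assuming the Hugging Face transformers depth-estimation pipeline and the depth-anything/Depth-Anything-V2-Small-hf checkpoint (file names here are made up):

    # Monocular relative-depth estimation with Depth Anything V2 (small).
    from PIL import Image
    from transformers import pipeline

    depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

    image = Image.open("road_scene.jpg")   # hypothetical input frame
    result = depth(image)

    # result["depth"] is a PIL image of relative depth; result["predicted_depth"]
    # is the raw tensor. Relative depth is not metric distance to an obstacle.
    result["depth"].save("road_scene_depth.png")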


Meta deeply understands the difference between GPT-3 and ChatGPT: the model is a starting point, and the UX of what you do with the model is what showcases the intelligence. This is especially pronounced in visual models. Telling me SAM 2 can "see anything" is neat. Clicking the soccer ball and watching the model track it seamlessly across the video, even when occluded, is incredible.
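A rough sketch of that click-to-track interaction, following the video-predictor example in the facebookresearch/sam2 repo (checkpoint, config, and click coordinates are illustrative, and method names may differ between versions):

    import numpy as np
    from sam2.build_sam import build_sam2_video_predictor

    predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
    state = predictor.init_state(video_path="match_frames/")  # directory of JPEG frames

    # One positive click on the soccer ball in the first frame.
    predictor.add_new_points(
        inference_state=state, frame_idx=0, obj_id=1,
        points=np.array([[420, 310]], dtype=np.float32),  # (x, y) pixel of the click
        labels=np.array([1], dtype=np.int32),              # 1 = positive click
    )

    # Propagate the mask through the rest of the clip, occlusions included.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()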


A suggestion: I'd swap LLaVA for Florence-2 for your open-set text descriptions. Florence-2 seems uniformly more descriptive in its outputs.
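For context, a hedged sketch of a Florence-2 captioning call, per the microsoft/Florence-2-large model card (the image path is made up):

    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "microsoft/Florence-2-large"
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open("frame.jpg")      # hypothetical input
    task = "<MORE_DETAILED_CAPTION>"     # Florence-2 task token for rich descriptions
    inputs = processor(text=task, images=image, return_tensors="pt")

    ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=512,
    )
    text = processor.batch_decode(ids, skip_special_tokens=False)[0]
    print(processor.post_process_generation(text, task=task, image_size=image.size))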


They are using Ollama, which is based on llama.cpp; Florence-2 is not supported on that backend.


I found Grounding DINO better than Florence-2, and faster.


I found YOLOS to be faster and better: not real time, but 22k objects in under half a second.


SAM 2's key contribution is adding temporal segmentation so it applies to video. Even on images alone, the authors note [0] that SAM 2 exceeds SAM 1 performance on image segmentation benchmarks. Some weaknesses of SAM 2 vs SAM 1 have been exposed, though, potentially in areas like medical images [1]. Efficient SAM trades SAM 1 accuracy for a ~40x speedup; I suspect we will soon see an Efficient SAM 2.

[0] https://x.com/josephofiowa/status/1818087122517311864 [1] https://x.com/bowang87/status/1821021898928443520?s=46&t=9K-...


One thing it's enabled is automated annotation for segmentation, even on out-of-distribution examples. For example, in the first 7 months of SAM, Roboflow users used SAM-powered labeling to label over 13 million images, saving ~21 years[0] of labeling time. That doesn't include labeling from self-hosting autodistill[1] for automated annotation either; a sketch of that flow follows the footnotes.

[0] based on comparing avg labeling session time on individual polygon creation vs SAM-powered polygon examples [1] https://github.com/autodistill/autodistill
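As a rough illustration, an autodistill-style auto-labeling sketch (ontology prompts and folder paths are made up; GroundedSAM stands in here for a SAM-powered base model):

    from autodistill.detection import CaptionOntology
    from autodistill_grounded_sam import GroundedSAM

    # Map text prompts (what the base model is asked to find) to dataset class names.
    base_model = GroundedSAM(ontology=CaptionOntology({"shipping container": "container"}))

    # Auto-label a folder of unannotated images; the resulting polygons/boxes become a
    # dataset that can then train a smaller target model.
    dataset = base_model.label("./images", extension=".jpg")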


"Load Example" was very helpful to get a sense of what this does. Awesome build. +1 to the other comment wanting a breakdown of what the colors mean.

Also, combining this with real-time in-game camera play could be really powerful, too[0]. Like illuminating details during the game.

[0] https://x.com/skalskip92/status/1816461263829889238


i work on roboflow. seeing all the creative ways people use computer vision is motivating for us. let me know (email in bio) if there are things you'd like to be better.

