I'd check out the OpenCV documentation and examples. This is basically what I us...

matsemann · 2025-08-05T09:17:53

Thanks. Also just kinda wondering if there have been any leaps lately, as I guess this is the same way one would have done it a few years ago as well. But now that one can upload images to multimodal LLMs and chat about them, I'm wondering if there are easier ways now (but preferably without uploading a million images to the ChatGPT API and paying the cost).

Like, could I avoid training, specifying much, or becoming very knowledgeable in this domain? Are we there yet?

Could I say "detect the frame where each car passes position X in the video, and then grab the frame where the same car passes position Y", and then calculate the frame difference to get the speeds? Or would I still have to write loads of code and do training for something like this?
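
To be clear, the speed arithmetic itself is the easy part. A rough sketch, assuming I already have the two frame indices for a car, the video's frame rate, and the real-world distance between X and Y (the function and parameter names are made up for illustration, not from any library):

```python
def speed_kmh(frame_at_x: int, frame_at_y: int, fps: float, distance_m: float) -> float:
    """Estimate speed from the frames where a car crosses two known positions.

    frame_at_x / frame_at_y: frame indices where the car passes X and Y (hypothetical inputs).
    fps: frame rate of the video, assumed constant.
    distance_m: measured real-world distance between X and Y, in meters (assumed known).
    """
    frame_diff = frame_at_y - frame_at_x
    if frame_diff <= 0:
        raise ValueError("the car must pass Y after it passes X")
    if fps <= 0:
        raise ValueError("fps must be positive")

    elapsed_s = frame_diff / fps           # frames -> seconds
    speed_ms = distance_m / elapsed_s      # meters per second
    return speed_ms * 3.6                  # m/s -> km/h


# Example: 30 frames apart at 30 fps over 25 m is one second, i.e. 25 m/s.
print(round(speed_kmh(100, 130, fps=30.0, distance_m=25.0), 1))
```

The hard part is getting reliable frame indices for the same car at both positions, which is the bit I'm hoping newer tools can help with.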

(I know I'm asking for a lot here, just curious what the SOTA is for this right now)