Canny edge detection is optimal in some settings. Often, though, edge detection is in service of another goal, such as segmentation. Nowadays, computers are big/fast enough (GPUs) that building models that learn edge detection implicitly and do segmentation or the end task directly is feasible.
Still fun to learn about Hough transforms, canny edge detection, and all that 1990s computer vision though!
> Nowadays, computers are big/fast enough (GPUs) that building models that learn edge detection implicitly and do segmentation or the end task directly is feasible.
Wouldn't re-using Meta's SAM (Segment Anything Model) be sufficient here? (It's freely available and, I think, free to use?) Or do you think you need to build your own model specifically for detecting bills/sheets of paper?
Yeah, where licensing allows it, reusing existing model weights (possibly continuing to train on your specific task) is reasonable. I was just pointing out that these methods aren’t SOTA anymore.
To be fair, I have no idea how I could use modern techniques instead of the Hough Transform. One use case is recognizing car speed and RPM dials. Using the Hough Transform, it's trivial to reliably detect the needle's slope with a high degree of accuracy.
Have vision models started offering better alternatives for such use cases? It's a genuine question; it's been a while since I last looked.
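To make the dial use case concrete, here is a minimal, numpy-only sketch of line-Hough voting (the function name and the synthetic "needle" image are made up for illustration; a real dial reader would run this on a Canny edge map of the gauge):

```python
import numpy as np

def hough_dominant_angle(edge_img, n_thetas=180):
    """Vote each edge pixel into a (theta, rho) accumulator and return
    the theta (degrees) of the strongest line's normal."""
    ys, xs = np.nonzero(edge_img)
    thetas = np.deg2rad(np.arange(n_thetas))              # 0..179 degrees
    diag = int(np.ceil(np.hypot(*edge_img.shape)))
    acc = np.zeros((n_thetas, 2 * diag + 1), dtype=int)
    # rho = x*cos(theta) + y*sin(theta), shifted so indices are >= 0
    rhos = np.round(np.outer(xs, np.cos(thetas))
                    + np.outer(ys, np.sin(thetas))).astype(int) + diag
    for t in range(n_thetas):
        np.add.at(acc[t], rhos[:, t], 1)                  # one vote per pixel
    t_best, _ = np.unravel_index(np.argmax(acc), acc.shape)
    return t_best

# synthetic "needle": a vertical line of edge pixels at x = 10
img = np.zeros((50, 50), dtype=bool)
img[5:45, 10] = True
print(hough_dominant_angle(img))  # → 0 (the normal of a vertical line is horizontal)
```

OpenCV's `cv2.HoughLines` / `cv2.HoughLinesP` implement the same voting scheme much faster, and return the line parameters directly.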
You can pretty much solve this using modern DL models. There are options depending on how accurate you want your model to be and how much compute you have.
There is an entire spectrum of models, from something like Mask-RCNN or the U-Net family up to something like Meta's SAM, which you can use without even training.
In cases like this I'd probably use the Hough-based algorithm as ground truth to see if you can indeed fine-tune a DNN on that regression task. If it works with reasonable accuracy, you have a baseline that could be improved in multiple ways to surpass the original.
That said, there are not that many shapes of speedometers and wheels, and the viewpoint is likely controlled, so your old-school method is probably the better way ;)
For the purpose of learning, would you recommend some tutorials, articles or videos that help achieve that? Accuracy aside, this would make a great learning experience!
Is it better to look in the PyTorch community, or is this where some TensorFlow approaches shine? (CUDA is ok)
PyTorch is much nicer to play with in my opinion. Maybe start with their official tutorial, I've also heard good things about Karpathy's YouTube channel from beginners.
What if these filters are explicitly used as preprocessing steps when training a segmentation model? Would that at least save some epochs, if not increase accuracy?
I'm suspecting it could be similar to the learned vs. predefined positional embeddings in GPTs. That is, the learned version is a "warped and distorted" version of the exact predefined pattern, and yet somehow it performs a bit better, and no one knows exactly why.
> Still fun to learn about Hough transforms, canny edge detection, and all that 1990s computer vision though!
In my domain of metrology, this part isn't acceptable:
> you will get a model that can predict the exact edges of every input image.
Prediction is a nice way to limit the space I have to consider for a proper edge detection, but usually it's better to just go straight to something more deterministic.
The accuracy is probably quite bad when you look at individual manipulations, but the overall result is jaw-dropping - even if it's only a demonstration.
Not really related, but I personally dislike GaussianBlur as the default noise-removal method in all CV articles. In a lot of cases it's more useful to filter by color (like BLACK letters on white paper); otherwise a bilateral filter (also available out of the box in OpenCV) usually works better, especially if you want to do edge detection (Gaussian blur also blurs edges). Yes, GaussianBlur may be better performance-wise, but that's only something to consider if your real-time app is short on CPU power.
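As a toy 1D illustration of that point (hand-rolled kernels with made-up parameters, not OpenCV's implementations): a bilateral filter weights neighbors by both spatial distance and intensity difference, so it smooths noise within each flat region but refuses to average across a strong edge, while a Gaussian smears the edge along with the noise:

```python
import numpy as np

def gaussian_blur_1d(signal, sigma=2.0, radius=6):
    """Plain Gaussian smoothing: every neighbor contributes by distance only."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    return np.convolve(signal, k, mode='same')

def bilateral_1d(signal, sigma_s=2.0, sigma_r=20.0, radius=6):
    """Bilateral smoothing: neighbors across a big intensity jump get ~0 weight."""
    out = np.empty_like(signal, dtype=float)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        window = signal[lo:hi]
        spatial = np.exp(-((np.arange(lo, hi) - i) ** 2) / (2 * sigma_s**2))
        range_w = np.exp(-((window - signal[i]) ** 2) / (2 * sigma_r**2))
        w = spatial * range_w
        out[i] = (w * window).sum() / w.sum()
    return out

# noisy step edge: roughly "black letter (0) meeting white paper (255)"
rng = np.random.default_rng(0)
signal = np.concatenate([np.full(50, 255.0), np.full(50, 0.0)]) + rng.normal(0, 5, 100)

g = gaussian_blur_1d(signal)
b = bilateral_1d(signal)
# gradient magnitude at the edge: the bilateral output keeps the step far sharper
print(abs(np.diff(g)).max(), abs(np.diff(b)).max())
```

In OpenCV the 2D equivalents are `cv2.GaussianBlur(img, ksize, sigma)` versus `cv2.bilateralFilter(img, d, sigmaColor, sigmaSpace)`.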
> As for our case, we need to apply edge detection in order to find the edges of documents in order for our Document Scanner SDK to detect the document in real time and scan it.
I've seen some kind of edge detection used in automated KYC/AML pipelines, to detect obvious cut/paste in utility bills, passport scans, etc.
Forgot the name of the (commercial) software but it's a thing.
It's of course possible to fool these but a great many dumb copy/paste which do look perfect to a human will actually become obvious after applying simple edge detection.
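A toy illustration of why that works (numpy-only Sobel; the synthetic "document" values are made up): a pasted patch just 3 grey levels off the background is essentially invisible to the eye, but its boundary lights up in the gradient magnitude:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via the 3x3 Sobel operators."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img.astype(float), 1, mode='edge')
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):                       # correlate with both kernels
        for dx in range(3):
            patch = pad[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * patch
            gy += ky[dy, dx] * patch
    return np.hypot(gx, gy)

# "scan": background grey 200; the pasted patch is 203 -- about a 1%
# brightness difference that a human reviewer would likely never notice
doc = np.full((40, 40), 200.0)
doc[10:20, 10:20] = 203.0

mag = sobel_magnitude(doc)
print(mag.max())  # strong response only along the paste boundary
```

GIMP's Filters → Edge-Detect menu and OpenCV's `cv2.Sobel` give the same kind of picture on a real scan.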
FWIW, GIMP has Sobel edge detection for sure (don't know about the others), so you can "play" with edge detection if you're on Linux.
Are there good / standard pre-trained models that exist for doing edge detection? Surely the only good answer isn't "train one yourself from scratch" or "use the Docutain SDK" -- what good pre-trained models have people used for this task?
I can't say for sure now, but one time I was helping with a similar project for bank agents working "in the field." The core requirement was that the data was strictly confidential and could not be processed anywhere the bank hadn't approved. These were scanned documents of a credit application form. The value added by this software was that internal processes could start right away after sending the pictures. Using this application, the bank assumed the chain of trust was maintained; without it, the credit application waited for manual form verification.
EDIT: I remembered one anecdote. At some point, we had poor accuracy with the white paper on a white background. I really liked how it was solved: agents were equipped with a branded black silk cloth to use as a background.