Though I have a question: in order to calculate C you need a way to associate each proposal with a ground-truth box. It's trivial when there's only one instance of each class in the image.
But how does it work when you're dealing with a set of same-class objects? For example, detecting each car in traffic.
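For what it's worth, my understanding is that the attribution is usually done by IoU overlap: sort predictions by confidence and greedily match each one to the unmatched ground-truth box it overlaps best, above some threshold. A minimal Python sketch — the function names and the 0.5 threshold are my own illustration, not from the paper:

    # Greedy IoU matching between predictions and ground truth.
    # Boxes are (x1, y1, x2, y2); preds are (box, confidence) pairs.
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def match_detections(preds, gts, thresh=0.5):
        taken, assignments = set(), {}
        for i in sorted(range(len(preds)), key=lambda i: -preds[i][1]):
            box = preds[i][0]
            best_j, best_iou = None, thresh
            for j, gt in enumerate(gts):
                if j in taken:
                    continue
                v = iou(box, gt)
                if v >= best_iou:
                    best_j, best_iou = j, v
            if best_j is not None:
                taken.add(best_j)
                assignments[i] = best_j
        return assignments  # unmatched predictions count as false positives

With several cars, each prediction can only consume one ground-truth car, so duplicate detections on the same car become false positives.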
Surprised to see this here since YOLO has been out for a while now. Shameless plug: I wrote an article on how to use transfer learning on your custom dataset with the pretrained weights [1]. One of the downsides of YOLO is that it uses its own deep learning library, darknet. I find the TensorFlow port Darkflow easier to use, but I haven't seen a v3 port of it yet.
There is a PyTorch port from Ultralytics (https://github.com/ultralytics/yolov3). Nobody seems to have figured out how to match darknet's training performance, though — and darknet is entirely uncommented C. The source is all there, but the loss function changed between v2 and v3, and it's not documented in the paper. I think it's been fixed in that PyTorch port now, though. The only frustrating thing is that every commit in the repo is called "update"...
Alternatively... you can train in darknet and then run inference in a framework of your choice.
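For example, OpenCV's DNN module can load darknet cfg/weights files directly. A rough sketch of inference — file paths, the 416x416 input size, and the confidence threshold are placeholders:

    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

    img = cv2.imread("image.jpg")
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    h, w = img.shape[:2]
    for out in outputs:
        # Each row: [cx, cy, bw, bh, objectness, per-class scores...]
        for det in out:
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = float(scores[class_id])
            if conf > 0.5:
                cx, cy = det[0] * w, det[1] * h
                bw, bh = det[2] * w, det[3] * h
                print(class_id, conf, (cx - bw / 2, cy - bh / 2, bw, bh))

(You'd still want non-maximum suppression on top, e.g. cv2.dnn.NMSBoxes.)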
Yeah, I don't remember where I read it, but it took them a couple of weeks to train the model from scratch. I tried training my own weights from scratch and it was practically impossible on a Tesla K80. But it's pleasantly surprising how good the transfer learning results are on a custom dataset: you can get some "state of the art" results after training for just a couple of hours. It's really impressive that he came up with YOLO and wrote his own deep learning library from scratch.
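To make the pattern concrete, the usual recipe is: load the pretrained weights, freeze the backbone, and fine-tune only the detection head on your own classes. A toy PyTorch sketch — TinyDetector is invented for illustration and is far smaller than the real network, and the checkpoint path is hypothetical:

    import torch
    import torch.nn as nn

    class TinyDetector(nn.Module):
        def __init__(self, num_classes):
            super().__init__()
            # Feature extractor (the role Darknet plays in YOLO).
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            )
            # Head: per grid cell, a box (x, y, w, h, objectness)
            # plus one score per class.
            self.head = nn.Conv2d(64, 5 + num_classes, 1)

        def forward(self, x):
            return self.head(self.backbone(x))

    model = TinyDetector(num_classes=20)
    # model.load_state_dict(torch.load("pretrained.pt"))  # hypothetical file

    # Freeze the pretrained backbone; train only the head.
    for p in model.backbone.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3)

Since only the small head gets gradients, a couple of hours of training can go a long way.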
Thank you for the links! I'm going to check both out. I want to see if the PyTorch port works with the new deployment feature from 1.0.
For another example of such honesty, see the paper “HonestNN: An Honest Neural Network ‘Accelerator’”, from the SIGBOVIK 2019 proceedings: http://sigbovik.org/2019/proceedings.pdf#page=107. I love that paper.
Like executives that shun computers as a symbol of their power and status... having a resume that emphasizes my-little-pony suggests his dance card is full (i.e. he has his pick of top job opportunities).
That's hilarious, and definitely makes me want to learn more about what the authors are talking about. What's a good place to refer to the various acronyms this paper uses, for those who aren't familiar with the field?
With that particular example, a citizen could very well also be an armed insurgent. Whether that citizen/insurgent is an ally or neutral or enemy is the distinction worth solving (even if it's significantly harder for an AI).
Of course, that matters far less when Skynet decides that every human is a hostile armed insurgent...
I am assuming that would be solved by having the AI also take in inputs of where your troops and allies are located. Perhaps with something like the Blue Force Tracker [0].
One of the first priorities of an operation is not knowing where your enemy is, but where you are.
And in general, thanks to its detection head, YOLO is WAY more readable in PyTorch than in TensorFlow — to the point that I use it as an example in my Keras vs PyTorch comparison: https://deepsense.ai/keras-or-pytorch/ (which was on HN at some point).
It still seems to use only the single frame, with no temporal context from past frames. E.g., a dog is sometimes recognized as a teddy bear for a split second.
Are there any "continuous" models for that? It sounds like simple Bayesian post-processing would go a long way (e.g., encoding the probability of a dog mutating into a teddy bear as very low).
Yeah, it's easy to fix those with a filter on the predictions. You could use a Bayesian approach or just smooth with a majority vote over a rolling window of, say, 3 frames...
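Something like this, as a minimal sketch (one label per tracked object per frame is an assumption — you need some way to associate detections across frames first):

    from collections import Counter, deque

    def smooth_labels(frame_labels, window=3):
        # Majority vote over a rolling window of per-frame labels.
        recent = deque(maxlen=window)
        out = []
        for label in frame_labels:
            recent.append(label)
            out.append(Counter(recent).most_common(1)[0][0])
        return out

    # A one-frame "teddy bear" blip inside a run of "dog" gets voted away:
    smooth_labels(["dog", "dog", "teddy bear", "dog", "dog"])
    # -> ['dog', 'dog', 'dog', 'dog', 'dog']

The cost is a frame or two of latency before a genuine class change is believed.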
AFAIK, the 'Look Once' part refers to other systems that re-ran a section of the frame at a time through an object detector, resulting in a lot of reprocessing.
You could still look only once, but have that look include multiple sequential frames. Or do something like an LSTM of frames.
For each frame, it returns a list of candidate detections with confidence values if I remember correctly. Should be pretty straightforward to make it smooth using that.
Sounds too good to be true. It also reads like that. :) A gem from this paper:
But maybe a better question is: “What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook. I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to.... wait, you’re saying that’s exactly what it will be used for?? Oh. Well the other people heavily funding vision research are the military and they’ve never done anything horrible like killing lots of people with new technology oh wait..... [1]
[1] The author is funded by the Office of Naval Research and Google.
YOLO is a combination of a backbone encoder and a detection head. The backbone (Darknet) is unique to YOLO; that would be the main difference from SSD, if I'm not mistaken.
YOLO is a very good and approachable object detection technique. I recently re-read the paper for the original YOLO [1] from 2015 and loved the apparent simplicity of this technique.
Xnor's founding team developed YOLO, a leading open source object detection model used in real world applications. We use a proprietary, high performance, binarized version of YOLO in our models for enterprise customers.
Too good to be true? Seems that they're running YOLO on conventional multi-core CPUs. On ARM even.
This guy gave a talk at my university a few weeks ago. He did some live demonstrations and I was really impressed. With a video camera he did live detection in the room and was classifying dozens of objects. Like the screen was filled with identification boxes. He also did a demo where he used his cell phone. Not as many classifications, but still about a dozen.
Everyone was pretty impressed. I'm always impressed when I see live demos go (almost) flawlessly.
What's the best route to deploy a Python YOLO system as a desktop app? E.g., a .zip file you extract, install, then run — everything included (TensorFlow/Keras libs, etc.), with no need for the user to set up an environment with conda, yadda yadda.
At the risk of incurring HN's wrath: Docker is an option. Another is to use C/C++ instead of Python and statically link it. Either way, if you want to use the GPU you'll have a world of pain with NVidia stuff.
Darknet is a framework for neural networks; YOLO is an algorithm focused on object detection. With Darknet, I think it should be relatively easy to get object detection up and running.
I cannot get YOLO to detect at 30 fps, even on GPU machines. This was true when I tried Keras YOLO as well as when following the instructions for C compilation on this page.
YOLOv3 is about a year old and is still state of the art for all meaningful purposes. It's fast and works well. You might get "better" results with a Faster R-CNN variant, but it's slow and the difference will likely be imperceptible. And as pjreddie points out, mAP@50 isn't a great metric for object detection anyway.
I've ignored Mask R-CNN because labeling your data for it is significantly more time-consuming.
The main candidates are all found in Facebook’s Detectron package, but they didn't feel it necessary to document anything in any significant level of detail: https://github.com/facebookresearch/Detectron
Also notable in G-Darknet are some tools useful for training (called darkboard), see https://github.com/generalized-iou/g-darknet/tree/master/dar...