Ha, didn't expect this to end up here. If anyone is interested, I'm working on a blogpost explaining how we built the app in detail… It uses embedded TensorFlow on device (better availability, better speed, better privacy, $0 cost), with a custom neural net inspired by last month's MobileNets paper, built & trained with Keras. It was loads of fun to build :)
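If you're curious what "MobileNets-inspired" means in practice, the core trick is depthwise-separable convolutions. Here's a bare-bones Keras sketch of that block pattern (the widths, strides and input size are placeholders for illustration, not the actual net we shipped):

    # Bare-bones sketch of a MobileNets-style depthwise-separable block in Keras.
    # Layer widths, strides and input size are placeholders, not the shipped net.
    from keras.layers import (Input, Conv2D, DepthwiseConv2D, BatchNormalization,
                              Activation, GlobalAveragePooling2D, Dense)
    from keras.models import Model

    def separable_block(x, filters, stride=1):
        # A depthwise 3x3 conv filters each input channel independently...
        x = DepthwiseConv2D(3, strides=stride, padding='same', use_bias=False)(x)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        # ...then a 1x1 "pointwise" conv mixes channels; this is where most of
        # the (much smaller) parameter budget lives compared to a full 3x3 conv.
        x = Conv2D(filters, 1, padding='same', use_bias=False)(x)
        x = BatchNormalization()(x)
        return Activation('relu')(x)

    inputs = Input(shape=(224, 224, 3))
    x = Conv2D(32, 3, strides=2, padding='same', use_bias=False)(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    for filters, stride in [(64, 1), (128, 2), (128, 1), (256, 2)]:
        x = separable_block(x, filters, stride)
    x = GlobalAveragePooling2D()(x)
    outputs = Dense(1, activation='sigmoid')(x)  # hotdog vs. not hotdog
    model = Model(inputs, outputs)

Stacking blocks like these keeps the CPU and memory cost low enough to run inference comfortably on a phone, which is the whole point of the MobileNets design.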
How large is the training set / how did you get a sufficient number of hot dog pics for a custom model, especially since you did not have a class of naive Stanford CS students at your disposal?
About 150k total images, 3k of which were hotdogs. The results are far from perfect (there are a ton of subtle — or hilarious — ways to trick the app), but it was better than using a pre-trained model or doing transfer learning, accuracy-wise (honestly it was even better than using Cloud APIs). As for the difficulty in preparing the training set, I'll just say I definitely empathize with Dinesh and Jian Yang’s feelings in episode 4 :D
Can you please elaborate a bit more on the model architecture and what you tried with respect to transfer learning?
Did you use an ImageNet architecture, e.g. VGG, and retrain from scratch, or a custom architecture? Did you try chopping off the last 1/2/3 layers of a pretrained model and fine-tuning?
Bonus points:
1. How much better were your results trained from scratch vs fine-tuned?
2. How long did it take to train your model and on what hardware?
Hey, so I actually tried VGG, Inception and SqueezeNet: out of the box, chopped, and trained from scratch (SqueezeNet only for the latter due to resource constraints).
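For anyone unfamiliar with the "chopped" approach: it's the usual Keras recipe of loading a pretrained ImageNet model without its classification head, freezing the convolutional base, and training a small new head on top. A bare-bones sketch, with placeholder layer sizes and hyperparameters rather than anything I actually ran:

    # Bare-bones "chop and fine-tune" baseline; sizes and hyperparameters
    # here are placeholders for illustration.
    from keras.applications import VGG16
    from keras.layers import GlobalAveragePooling2D, Dense
    from keras.models import Model

    base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False          # freeze the pretrained convolutional base

    x = GlobalAveragePooling2D()(base.output)
    x = Dense(128, activation='relu')(x)
    out = Dense(1, activation='sigmoid')(x)  # hotdog / not hotdog
    model = Model(base.input, out)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    # To "chop deeper", unfreeze the last conv block afterwards and
    # fine-tune it at a lower learning rate.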
We ended up with a custom architecture trained from scratch due to runtime constraints more so than accuracy reasons (the inference runs on phones, so we have to be efficient with CPU + memory), but that model also ended up being the most accurate model we could build in the time we had. (With more time/resources I have no doubt I could have achieved better accuracy with a heavier model!)
Training the final model took about 80 hours on a single Nvidia GTX 980 Ti (the best thing I could hook up to my MacBook Pro at the time). That's for 240 epochs (150k images per epoch), run in 3 learning-rate annealing phases, each consisting of a handful of CLR (cyclical learning rate) cycles.
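For those who haven't run into CLR: the idea is to cycle the learning rate between a lower and an upper bound instead of only decaying it. A bare-bones, epoch-level triangular version as a plain Keras callback (the bounds and cycle length below are made up for illustration; the real thing is usually stepped per batch):

    # Bare-bones triangular cyclical learning rate as a Keras callback.
    # base_lr, max_lr and the cycle length are placeholders, not my actual schedule.
    from keras.callbacks import LearningRateScheduler

    def make_clr(base_lr=1e-4, max_lr=1e-2, epochs_per_cycle=10):
        half = epochs_per_cycle / 2.0
        def schedule(epoch, lr=None):
            # Rise linearly to max_lr over half a cycle, then fall back down.
            pos = epoch % epochs_per_cycle
            scale = 1.0 - abs(pos - half) / half
            return base_lr + (max_lr - base_lr) * scale
        return LearningRateScheduler(schedule)

    # model.fit(..., epochs=240, callbacks=[make_clr()])
    # An annealing "phase" is then just rerunning this with smaller bounds.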
I'll answer in more detail in the full blogpost, it's a bit complicated to explain in a comment. I'll have charts & figures for y'all :)
As someone currently writing an app which uses a retrained Inception model, I watched the show pointing and laughing (and then crying) at the same issues and frustrations. The accuracy of the show and especially this episode has been just brilliant.
Thanks for sharing all the tech details too, it's been great to read. I'm even more amazed to see it as a real app, that I didn't expect!
It was random; I was already working on the show as what Hollywood calls a (technical) “consultant”: advising on storylines, dialogue, background assets, etc. When this idea popped up, someone suggested we build the app for real. We gave it a try and ended up building the entire thing in-house with the crew, as opposed to hiring an external agency to do it for us.
I should say a lot of people made the app possible, including the show's awesome producers, writers, designers, and a lot of kind folks at HBO. To answer your question, I was the only dev on the project, and I've been working on it since last Summer, on a very part-time basis (some nights and weekends). A lot of time was spent learning Deep Learning to be honest. The last revision of the neural net was designed & trained in less than a month of nights/weekend work but obviously couldn't have been achieved without the preceding months of work — but if I was starting today knowing what I know now yeah it'd probably be about a month of work. The React Native shell around the neural net was just a few weekends worth of work — mostly it was about finding the right extensions, tuning a few things for rendering/performance, and like a whole weekend dealing with the UX around iOS permissions to access the camera & photos (lol it's seriously so complicated).
In the early 90's, I had the honor of corresponding with "Uncle Frank" Webster [1], the curator of the Hot Dog Hall of Fame [2]! It boasts more than 2,500 frankfurter items, including the Lamborweenie, a dog on wheels that "Uncle Frank" hopes some day to race against the Oscar Mayer Wienermobile [3]. Unfortunately the hot dog museum is currently closed and in (hopefully refrigerated) storage [4], otherwise the museum, gallery and gift shop would be a great place to train your app.
He asked for permission to publish in his newsletter a gif [5] of a photo I'd taken and put on my web site of the Doggie Diner head [6] on Sloat Boulevard in San Francisco [7]. (This was years before John Law acquired all the Doggie Diner heads he could find for his restoration project [8], so there weren't so many photos of them on the internet at the time.)
Of course I gave him permission because he asked so politely, and although at first he seemed a little creepy, I could tell he was authentic and sincere since he signed his correspondence: "With Relish, Uncle Frank." [9] And he even delivered on his promise to send me copies of his newsletter!
He enthusiastically informed me that the highest quality hot dogs he's ever found are from Top Dog in Berkeley [10]. He admitted that he went through their dumpster to find out where they sourced them from because they wouldn't tell him, and he vouches that they are made from the finest possible ingredients. I agree with Uncle Frank that Top Dog has really excellent hot dogs (ask for them cooked butterfly style), and I take him at his word that they come from a reputable source. You can now actually order them online [11] if you would prefer not to go through their dumpster.
I honestly thought the app itself would come across as too limited — and I wasn't quite sure how HN felt about the show it's attached to. I was preparing that technical blogpost specifically for HN because I thought that would be a more hacker-centric way of looking at the same thing.
I was REALLY depressed after watching these last two episodes. HungryBot was easily the coolest thing I've done in my unfunded nonprofit. But it was too buggy to fly. Kicked it out of the nest too early, but to the best of my knowledge it was the first deep learning food recognition app.
Looking for philanthropists if anyone wants to help me study diabetes with apps like this.
It's actually written in React Native with a fair bit of C++ (TensorFlow), and some Objective-C++ to glue the two. One cool thing we added on top of React Native was a hack to let us inject new versions of our deep learning model on the fly without going through an App Store review. If you thought injecting JavaScript to change the behavior of your app was cool, you need to try injecting neural nets, it's quite a feeling :D
This looks great, haha! Shame I can't access it from the UK.
It would be really interesting to read more about your thoughts on working with RN and C++ and perhaps how you did some of it. I'm currently doing the same (but with a C++ audio engine rather than image processing stuff) and I think it's an incredibly powerful combination - but I do feel like I'm making up some interop patterns as I go and there might be better ways, so would love to hear how other people use it!
Broadly, I've created a "repository" singleton that stores a reference to both the React Native module instance (which gets set when I call its setup method from JS) and the C++ main class instance (which gets set when it starts up), so they can get a handle on each other (I bet there are better ways to do this, but I'm new to C++/ObjC and couldn't work out a better way to get a reference to the RN module).
I'm then using RCT_EXPORT_METHOD to provide hooks for JS to call into C++ via an ObjC bridge (in an RCT_EXPORT_MODULE class), and using the event emitter to communicate back to JS (so the C++ can get the RN module instance from the singleton and call methods which emit events).
I've not done anything that really pushes the bridge performance to a point where I've seen any noticeable latency/slow down caused by the interop - have you had any issues here?
Like I say, I'm finding it a really cool way to build apps that need the power of native code but still want the ease of RN for the GUI and some logic, and I actually quite like the separation it enforces with the communication boundary.
Sounds like you're further ahead than I was with the React Native part! Not Hotdog is very simple so I just wrote a simple Native module around my TensorFlow code and let the chips fall where they may performance-wise. The snap/analyze/display sequence is slow enough that I don't need to worry about fps or anything like that. As much as I enjoyed using RN for this app, I would probably move to native code if I needed to be able to tune performance.
Can you explain to a noob how you wrote the Native module around TensorFlow? My main area of focus is Python, but I feel out of my depth whenever I think I'm ready to start developing mobile apps. I'm looking into RN, but I'm still not sure how that plays with TF and other Python modules.
It was honestly just maybe 10 lines of code, but I was very confused about it before I got it done. The message passing is a bit counterintuitive at first. I'll try to share example code in my blogpost!
When using React Native (and also storing your network in the JS bundle), you can push updates and fixes to the JS part of your code without going through the store again.
Yup, that's basically it. The hack was just in getting TensorFlow to accept/load its neural network definition from the JS bundle (what CodePush distributes for you) rather than from the main Cocoa bundle.
Just as a note, people/developers have had messages from Apple telling them that they need to remove code which allows them to update their app outside of the app update/review process.
Really cool stuff! I didn't even realise you've switched jobs, but that explains why I didn't see you when we popped in at Townsend the other day! Keep up the good work.