This is obviously a talent acquisition in more ways than one (the Kaggle team, but also their ability to source machine learning talent). I wonder to what degree it's also a Tensorflow promotion move? It seems like Google is very interested in growing a community around it.
For example: some friends who run a seed-stage biotech deep learning startup were offered a considerable discount by the Google Cloud folks. Their ask? That the company switch to Google Cloud, rewrite some proprietary software in Tensorflow, and heavily publicize both moves.
I wonder if we'll see Kaggle gain a specific bent towards that ecosystem.
Not clear to me why this is a talent acquisition. The Kaggle team (Ben in particular) have some talents in ML, but I'd be surprised if they have anyone there working day to day on ML tasks.
It seems to me more like an old school product-and-media acquisition: Google like the product, and love the audience. This is a good way to get both.
Kaggle can bring out unknown or underprivileged gems into the spotlight. I remember reading an article about a top performer on Kaggle who was a school teacher somewhere in SE Asia (Singapore?).
1) Why do you believe that the hypothetical Singaporean isn't going to apply to Google? Google has no shortage of applicants. And if the applicant believes that Kaggle could help them, why not simply put the score on the application / resume?
2) If Google is trying to recruit people from Kaggle accounts, why not simply index the accounts?
Neither approach requires purchasing Kaggle at all.
Singapore is a bad example. How about a (hypothetical) guy/gal learning ML from Coursera and living in a remote village in Indonesia? No way to go to college because it's simply too far and he/she has to support their family. The person stumbled upon Kaggle, and started to compete with the best in the world.
Only Kaggle has the full data to be able to make an accurate decision. I don't really think indexing account pages is even remotely enough to find the really talented people among the noise.
I think Google acquired Kaggle for one of the following two reasons: 1) they wanted to expand their talent acquisition reach[1], or 2) they wanted to build a platform like Kaggle aimed at Google Cloud, but figured out that it was just easier to acquire Kaggle itself.
[1]: Google will NEVER be satisfied with its talent pool given their size and rate of expansion. The company is prepared to do a ton -- perhaps even acquiring Kaggle -- to get the best of the best, wherever they are.
I can confirm that Kaggle runs on Azure because I block all Microsoft IPs (to avoid the ninja Windows 10 upgrade) and must disable the blocker in order to go on the site.
As skrebbel said, don't they charge for the upgrade now? That said, Never10[1] was (still is?) a great tool to prevent the Windows 10 auto-upgrade. Also, according to the Never10 page, Microsoft now has an optional update to get rid of the GWX stuff.[2]
What ninja upgrade? You always had to opt in. Yes, they were pushing the offer annoyingly hard, but I had no problem whatsoever keeping one of my machines on Windows 7.
Anyway, you can stop doing so now, the time for a free upgrade is over.
This is incorrect. There was an opt-out phase where the Windows 10 install started automatically in the middle of work. I've experienced this myself: at some point Windows 7 just shut down and started installing Windows 10, and I had to wait 30 minutes until I could press "I disagree" on the EULA, after which it started rolling back the Windows 10 it had just installed.
At this point presumably a system not running Windows 10 is not getting updates anymore. Unless it's an enterprise install, in which case the ninja update is irrelevant.
It's really not a great idea. Either you don't run Windows, and it's not an issue, or you just blocked Windows Update and other important services Microsoft provide that work in tandem to keep your systems safe.
> Either you don't run Windows, and it's not an issue,
Not a solution for those of us who run Windows boxes for various reasons...
And to clarify, I plan on occasionally letting updates through (I'm already on Windows 10) but this is a great way to prevent data collection / backdoor activation, which I hadn't considered. Seems like the simplest way to add a lot of privacy to Windows.
Yet that's not what the parent and its parent were talking about/implying. It clearly said "blocking all Microsoft IPs".
And considering the Windows 10 upgrade was being pushed through Windows Update I'm not sure how you'd want to prevent that specific update by blocking an IP and not interfere with Windows Update as a whole.
Makes sense. Azure LBs do not support ICMP and all ping packets are dropped. You can't ping any Azure-hosted services. Kaggle.com fits the description.
I'm pretty sure it supports ICMP, as TCP/IP cannot work properly without it. I guess you mean ICMP echo. Also, there are something like four kinds of Azure load balancers, and this is only true for some of them.
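To make the echo-vs-everything-else distinction concrete, here is a rough sketch (Python; the function name `tcp_reachable` and the default port 443 are my own choices, not anything Azure-specific). A host that silently drops ping can still accept TCP connections, so a failed ping by itself says little about whether the service is up:

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`.

    This does not send ICMP echo at all, so it still works against hosts
    or load balancers that drop ping packets.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling something like `tcp_reachable("kaggle.com")` would then tell you whether the HTTPS port answers, independent of what happens to ping.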
They are also known to have used F#, and even provided a testimonial to this effect: http://fsharp.org/testimonials/. Can't say if it's still used, though. That's two recent high-profile acquisitions (with Jet.com) for F# shops.
> At Kaggle we initially chose F# for our core data analysis algorithms because of its expressiveness. We’ve been so happy with the choice that we’ve found ourselves moving more and more of our application out of C# and into F#. The F# code is consistently shorter, easier to read, easier to refactor, and, because of the strong typing, contains far fewer bugs.
> As our data analysis tools have developed, we’ve seen domain-specific constructs emerge very naturally; as our codebase gets larger, we become more productive.
> The fact that F# targets the CLR was also critical - even though we have a large existing code base in C#, getting started with F# was an easy decision because we knew we could use new modules right away.
None whatsoever, unless they're heavily bought into Azure-specific services.
The idea that if you do C# you must be on Azure (or the other way around) has been outdated since Azure started. The first startup I ran tech at hosted C# on Mono in Docker containers on DigitalOcean and had devs on all 3 major OSes.
I'd be interested if anyone knows anything about this. Especially given the recent updates for running .NET Core on Linux/Mac, a company like Google could make great use of C# without needing to shell out for Windows licenses.
Don't know whether this still holds, but it sounds like there was a time, at least, when anything outside of C++, JVM languages and Python was off limits.
That's really interesting to hear. I wouldn't read too much into it, I was mostly just speculating. It's quite likely that they mostly scooped them up for the rolodex that is their user database.
I think this may have something to do with Jeremy Howard's time as president there - I remember watching a few of his tutorials a couple of years ago when he was still at Kaggle and he was really into C#.
Likely to avoid their mistake with MapReduce, where by around 2011 candidates were coming in to interviews and saying "MapReduce? That's sorta like Hadoop, right?"
There's value in controlling mindshare; keep everything proprietary too long, and people just use open-source clones that may be inferior but can actually be used by the majority of the talent pool.
EMR beat Google Cloud MapReduce to market, but you're forgetting that before there was such a thing as cloud services, we relied on open-source frameworks and set up our own clusters. EMR is based on an open-source framework called Hadoop, which itself was modeled on a closed-source Google framework called MapReduce that Google released a paper about. MapReduce came out internally in 2003 (the paper followed in 2004), Hadoop in 2006, Amazon EMR in 2009, and Cloud MapReduce in 2015.
...which is sorta my point. People remember the version of the technology that makes it accessible to them, not the first one that comes out. When Google keeps things proprietary forever and only releases academic papers, people quickly forget just how far ahead they were.
That's all true, but what may matter more to Google was the missed business opportunity of being first to market with a relatively easy distributed computing paradigm.
That's exactly backwards - the MapReduce paper was intentionally released as vaporware to make the rest of the industry spin its gears trying to replicate an imaginary result. And that's why we have Hadoop.
You realize you're arguing with an ex-Googler who has worked on production MapReduces that were first written around 2005 and has read the initial MapReduce commit?
I thought the MR paper described an actual working implementation. It had performance test results, descriptions of issues they encountered and solved, and some sample source code of how MR is used. It seems like a lot of effort was put in for it to be a hoax.
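For anyone who hasn't seen the programming model, here is a rough single-process sketch of the word-count example (Python, with my own naming; this is not the paper's C++ API, and nothing here is distributed). The user supplies a mapper and a reducer, and the framework handles grouping values by key in between:

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Toy in-memory MapReduce: map, group by key ("shuffle"), then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for item in inputs:
        # Map phase: each input record emits zero or more (key, value) pairs.
        for key, value in mapper(item):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    """Sum the per-occurrence counts for one word."""
    return sum(counts)
```

Running `map_reduce(lines, word_count_mapper, word_count_reducer)` over a list of strings gives a dict of word counts. The real system's contribution was doing those three phases across thousands of machines with fault tolerance, which this sketch ignores entirely.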
I imagine part of it is that businesses built on TensorFlow play nice with Google Cloud and their TPUs, but mostly I suspect it's just a mindshare thing. If Google becomes the place that all the top data scientists want to work – such that they don't even have to be poached – that's a Very Good Thing for them. It probably doesn't hurt if those data scientists come in already familiar with a tool Google uses internally.
Kind of reminds me of the genius move by Tesla to crowdsource collection of self-driving car information. Experts want to get where they have the data to train their models, and if Tesla propels itself ahead of the pack for number of miles of real-world training data, then that makes them very attractive to talent.
If all machine learning experts use TensorFlow, all the machine learning chips coming out will be highly optimized for TensorFlow. Higher competition among TensorFlow chips = better acquisition prices for Google. They also don't have to go around convincing chip makers to support TensorFlow (like they did, for instance, with the VP8/VP9 codec).
I am curious to see what will happen to TensorFlow. I hope the code will get cleaned up... I also hope they will eventually pay somebody to do it, as the open source route clearly produces a heterogeneous nightmare.
The rewrite in TensorFlow is somewhat worrying though, since TensorFlow is open source, meaning that there's no real benefit to Google if it's written in TensorFlow (except for recruitment purposes).
It's worrying since it suggests that Google might be planning to make it, or at least parts of it, proprietary in the future...
For the record, I don't think that Google will, but I'm still worried about the possibility...
They made Angular and they didn't somehow make it proprietary.
The more worrisome stuff is when they close shop on services or completely change a framework.
TensorFlow isn't a service so we don't need to worry. And I doubt they would change TensorFlow as drastically as the Angular 1 to 2 to 3 kind of deal. If it does happen, the Keras library abstracts over it, iirc.
I think their goal is to get people to use their cloud services, imo. They do the same with their Nexus devices, shipping them without an SD card to push people to the cloud.
Also I think it's almost like the idea of controlling a framework instead of being at the whim of some other company. I'm looking at Oracle and Java here.
Facebook have their own NN. Google have theirs. So they don't have politics to deal with.
Recruitment is an important purpose, though. Having a steady supply of pre-Tensorflow-trained engineers available is presumably why they opened up Tensorflow to begin with. They're not going to benefit more than that anytime soon by closing it off again.
I think their play has been building specialized hardware that executes TensorFlow better than anyone else. "You could use a GPU to do this, but check out our custom ASIC that does it 400x faster for 1/5th the cost..."
The 400x was clearly hyperbolic – and sure. But you probably don't have one-click integration with an existing battle-tested IaaS platform.
Vertical integration is powerful, and by open-sourcing Tensorflow Google is achieving useful synergies in sales and recruiting. At their vast scale, even small ROIs (as a percentage) can be massive.