I've been doing Rails, Python, Node, and you-name-it frontend JS web app development for the last 12 years. I think I've gotten too bored with the same challenges that app development presents. Has anyone made the official career pivot to the ML/AI field? What did you have to do? Did you have to start from a lower-level entry point into the field?
I just made the transition in the other direction. I did a Masters in computer vision, then worked at a startup doing computer vision and machine learning work for 2 years. I recently transitioned into app dev.
There are 2 levels to ML/AI, being a researcher and being an engineer. The researcher actually creates new models, architectures, etc. You're going to need to be talented at math, as well as pursue a PhD to have enough time to absorb some subset of the material to have a good understanding. (A masters was good but not enough time for me personally).
Then there is engineering which is leveraging the creations of the very smart PhDs. At least in my experience, the shallow level is basically fine-tuning models to your use case, which does require an understanding of some things like loss functions, train/validation/test sets, but it's not too complicated.
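For anyone coming from web dev, the train/validation/test idea fits in a few lines. This is a minimal plain-Python sketch, not any library's API; the 70/15/15 proportions and the fixed seed are illustrative assumptions, and ML frameworks ship their own split helpers you'd normally use instead.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(
    items: Sequence[T],
    val_frac: float = 0.15,
    test_frac: float = 0.15,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then carve off test and validation slices."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over is training data
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Hyperparameters get tuned against `val`; `test` is touched once, at the end, so the reported number isn't one you've been optimizing against.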
Everyone who asks me how to learn machine learning, I advise to read Hands-On Machine Learning by Aurélien Géron cover to cover. When I first started my masters I did this, and it helped immensely because it was easy to understand, was broad, and usually approached things from an application perspective.
From there, I would suggest learning PyTorch (starting w/ Keras is ok too, but don't stay there too long, and avoid TensorFlow), as it's much easier to develop with. I always learn best with a personal project, so maybe see if there is a real-life "problem" you'd be interested in solving, like classifying different pets from each other or something like that.
It'll take a while to build up your skills, so going to school is of course an option, but with dedication I think you can also accomplish this solely with side projects and learning on your own. Best of luck!
1) Did your Masters cover non-deep-learning vision (classical vision?) in sufficient detail? There is a ton of math in there. Going from being a shallow user of OpenCV to a deep one seems a big jump. I'm not sure a Masters focused solely on classical vision would get someone there (let alone one covering other things like ML, DL, etc.).
2) Did you end up training large models from scratch or is it all just fine-tuning? I am trying to do the former and I realize getting things to scale for from-scratch training is a whole other topic. I suspect getting things ready for inference would be similar.
1) I didn't want to be someone who only learned machine learning/deep learning solely because it was the sexy thing at the time (still is). I took a class that covered classical computer vision, as well as signal processing classes. So while I wouldn't say I'm an expert, I'm familiar with traditional approaches (enough to be dangerous). I would agree a Masters is not enough to become a deep expert. Unfortunately, I do think you just have to do a PhD to give yourself enough time to explore and absorb all of the complexities of this space.
2) I did a thesis for my Masters program, using generative adversarial networks (GANs) for image compression. It was by no means novel or a breakthrough, but what I did learn (and this is so obvious it's painful to write) is that you should pretty much never train from scratch and should always use transfer learning. As far as what I did at my last company, it was basically taking state-of-the-art models from the MMDetection Python package, fine-tuning them to our use case, and then deploying them. So I wasn't really doing anything from scratch.
Happy to chat more about your specific use case if you're interested! You can email me at zbellay at gmail dot com.
From my experience, there are plenty of teams in FANG that will hire you as a backend developer in a ML team assuming you can pass their interviews. 90% of the work in these teams is not core ML and is more mundane work supporting these models, such as data piping, cleaning, feature generation, experimentation, and real-time serving. You'll get plenty of experience in working directly with ML systems.
The jump to core ML is a bit trickier. Competing with people with PhDs is a drag. I wish someone could give me some tips there too.
FWIW I worked at Amazon on one of these teams and it's not something I found very interesting. You get thrown a random binary that some ML researcher compiled, with instructions to host it. No sense of ownership over the product; you're just an Ops frontend for a researcher so they can do the fun stuff building models while you deal with the pagers.
It's about the same tier as BI dashboards. Huge breadths of data you don't produce but are nevertheless accountable for. All your stakeholders can easily stream-of-consciousness rattle off dozens of new metric/feature ideas or questions about the feature/metric values they're seeing over the course of 45 seconds, each one taking you tens of hours eyeballs deep in SQL to even begin to answer.
As a sort-of counterpoint I work at a non-amazon FANG where I am much more involved in model training and evaluation. I still have never needed to define a model myself, but it's much more complicated and far more ownership than someone throwing me a binary. I think that speaks much more to what's going on in Amazon *grins*.
ML guys build the fun stuff, go to conferences, etc. We maintain their stuff and work the 60 hour weeks answering pages.
It's mind-numbing. Arguably some of the most boring work I've ever done, especially knowing that you're just a CI/CD robot. Nothing has motivated me more to look into starting a business than watching other people have fun while you clean up their messes.
>90% of the work in these teams is not core ML and is more mundane work supporting these models, such as data piping, cleaning, feature generation, experimentation, and real-time serving. You'll get plenty of experience in working directly with ML systems.
MLOps is what you're describing, and it's probably the number one field I'd recommend someone to go down right now as a backend dev.
To be fair, it's not as funny as automating data cleaning, on the principle that data scientists don't want to do it.
And yeah, lots of people dislike it, but you can't build models without an understanding of the data, so even if automated data cleaning became possible (unlikely) you'd still need to spend a load of time doing work on the dataset before building anything useful.
Some people seem to think they will be able to type "clean my data" into ChatGPT or similar and get a beautiful clean dataset. They are probably descendants of the people who said "COBOL means we don't need programmers any more".
Data cleaning requires a lot of judgement and domain knowledge. Imagine if an AI did clean your dataset. Are you just going to trust it (Hell no!)? Or are you going to spend ages trying to work out what it did, which doesn't seem much of an improvement.
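One way to keep cleaning judgement reviewable, whoever or whatever applies it, is to make each rule explicit and log every row it touches. A minimal sketch; the `age` field and the 0 to 120 plausibility range are hypothetical domain choices, which is exactly the kind of call that needs domain knowledge.

```python
from typing import Dict, List, Tuple


def clean_ages(records: List[Dict], max_age: int = 120) -> Tuple[List[Dict], List[str]]:
    """Drop rows with missing or implausible ages, recording why each row was dropped."""
    cleaned: List[Dict] = []
    log: List[str] = []
    for i, rec in enumerate(records):
        age = rec.get("age")
        if age is None:
            log.append(f"row {i}: dropped, missing age")
            continue
        if not 0 <= age <= max_age:  # hypothetical plausibility rule: a domain decision, not a law
            log.append(f"row {i}: dropped, implausible age {age}")
            continue
        cleaned.append(rec)
    return cleaned, log


records = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 430}]
cleaned, log = clean_ages(records)
for line in log:
    print(line)
```

Whether 430 is a typo for 43 or pure garbage is still a human judgement; the log at least tells a reviewer which rows to look at.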
I write data cleaning/ETL software and I'm confident that the need for my product is going to go up between now and when I retire.
Because 90% of the industry work is MLOps.
The pipeline usually goes:
1. Make a POC inside a Jupyter Notebook with some scrappy data and an off-the-shelf model; define metrics and train a baseline to see if the whole ML endeavour might even be worth it
2. Do error analysis, find better data, tune parameters, re-train to see if you can improve upon the baseline
3. Make the first deployment and set up data collection
4. Automate 2 as much as possible because data is ever changing and you want to try many more off-the-shelf models
5. Deploy new models and collect ever more feedback
4 and 5 are basically a while loop that never ends and that's mostly MLOps
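Steps 4 and 5 as a loop, in a deliberately simplified sketch: `train_candidate`, `evaluate`, and `deploy` are hypothetical hooks standing in for real retraining, offline evaluation, and deployment, and `max_rounds` exists only so the example terminates. Real systems also gate on things like online metrics and rollback, which this skips.

```python
from typing import Callable, List


def retrain_loop(
    train_candidate: Callable[[int], str],
    evaluate: Callable[[str], float],
    deploy: Callable[[str], None],
    baseline_score: float,
    max_rounds: int,
) -> float:
    """Retrain, compare against the current best, deploy only on improvement."""
    best = baseline_score
    for round_idx in range(max_rounds):  # in production: while True, or a scheduler
        candidate = train_candidate(round_idx)  # hypothetical: retrain on the latest data
        score = evaluate(candidate)  # hypothetical: held-out metric, higher is better
        if score > best:
            deploy(candidate)  # hypothetical: push to serving
            best = score
    return best


deployed: List[str] = []
scores = {"model-0": 0.71, "model-1": 0.69, "model-2": 0.74}
final = retrain_loop(
    train_candidate=lambda i: f"model-{i}",
    evaluate=lambda m: scores[m],
    deploy=deployed.append,
    baseline_score=0.70,
    max_rounds=3,
)
print(final, deployed)  # 0.74 ['model-0', 'model-2']
```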
It still requires proper ML expertise, especially when things break tho
We've got 4 and 5 pretty automated... the real issue (as you're likely alluding to) is when #1/2/3 draw in new data that's completely infeasible to get at scale, and then you're asked to re-train daily, 2x a day, every 4 hours, continuously. Oh, and your costs go through the roof and likely aren't worth the returns anymore when you're chasing that .001%.
Can't speak for FAANG, but you can do well at some companies with "ML Developer" or even "Data Engineer" if you have database admin experience and/or are willing to spend time studying up.
Just wanted to add: DL engineer != MLOps engineer. DL engineers who deal with scientists have a much harder job IMHO, as the scientists ask for 10 different things, each taking a lot of effort to implement. The current DL stacks (e.g. TensorFlow/Keras) make it very easy to implement the standard stuff. But if you want to go off the beaten path, it is really very hard (I wonder if PyTorch makes that significantly easier, if anyone cares to chime in).
Good point in general about working with "scientist code". You definitely will want to at least understand some of the math in order to be effective at implementing it.
I don't think Torch is any easier than Keras if you don't know what you're doing.
I think data engineering is probably the easiest transition for software devs with backend web (read: database and Python) experience. Free learning resources abound nowadays, as well as decent-quality books and online courses. It's largely the same set of engineering "macro" skills (project planning, cost-time-correctness tradeoffs, levels of abstraction, deploying stuff to the cloud), and the specific engineering "micro" skills (database performance tuning, indexing, storage and compute cost management, setting up ETL pipelines, integrating with dashboard systems like Tableau) should be well within the capabilities of a mid-senior backend web developer.
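The "setting up ETL pipelines" skill in miniature: a sketch using only Python's standard library (`csv`, `sqlite3`) as a stand-in for real sources and a real warehouse. The table, columns, and cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

RAW_CSV = """order_id,amount,country
1, 19.99 ,us
2,,de
3,5.00,US
"""


def extract(text: str) -> List[Dict[str, str]]:
    """Extract: parse raw CSV (stand-in for an API dump or object-store file)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: List[Dict[str, str]]) -> List[Tuple[int, float, str]]:
    """Transform: trim whitespace, normalize country codes, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # a real pipeline might quarantine these instead of dropping them
        cleaned.append((int(row["order_id"]), float(amount), row["country"].strip().upper()))
    return cleaned


def load(conn: sqlite3.Connection, rows: List[Tuple[int, float, str]]) -> None:
    """Load: upsert on the primary key so re-running the job doesn't duplicate rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = conn.execute("SELECT country, SUM(amount) FROM orders GROUP BY country").fetchall()
print(totals)
```

Idempotent loads are the design choice worth copying: jobs get retried, and a rerun should leave the table in the same state.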
The hardest part IMO would be getting up to speed to the stats and data analysis basics enough to be conversant with the analysts and data scientists who would be your primary stakeholders and users. But that too can be done with help from the multitude of free learning resources and chatrooms/forums.
IMO it's worth the effort if you're looking to transition. There is a shortage of good, conscientious data engineers right now. Hiring pipelines are clogged with aspiring junior data scientists who are willing to put up with data engineering for a couple of years to try and get some experience on their resume. Data engineers' skillsets are highly portable across companies and industries, and will remain in demand for many years to come, even as companies find that they can't justify hiring a data scientist, so pay and hiring potential will be accordingly high. And as a data engineer you will quickly establish yourself as an essential team member in all but the most dysfunctional organizations, and your value will be immediately apparent whenever someone wants to build a dashboard or pull a report and it all "just works".
The only thing to beware of is the usual caveat of companies trying to short-sightedly hire inexperienced juniors in mid-senior roles to save a buck, setting both you and them up for failure. But that's a caveat in any field where you're starting from the bottom. If you're already an experienced software engineer, you should hopefully be able to either see through that bullshit, or to be able to handle it somewhat more gracefully than a true junior could.
As long as you're interacting with it at a decently high level it's honestly really easy code.
I watched a friend of mine go from maybe a few Python courses to writing some really impressive ML stuff as her first project within a few months, with some help from people who know their stuff. I think as long as you're building on top of what's already out there, you can get a lot of utility out of all the solutions that have come out in the past few years without a ton of effort. Try dipping your toes into something simple like object recognition and you'll find it's pretty easy.
If you're talking about getting into the field at a level where you're actually developing these technologies yourself, then I hope your math is around college level. Reading deeper into the docs of the tools I'm using, I keep running into calculus and linear algebra, and I don't pretend to understand it very well.
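A small taste of where the calculus shows up: training usually means minimizing a loss with gradient descent, and the gradient is a derivative. This sketch fits a one-parameter model y ≈ w·x by hand; the learning rate and step count are arbitrary illustrative choices, and real libraries compute gradients automatically.

```python
from typing import Sequence


def fit_slope(
    xs: Sequence[float], ys: Sequence[float], lr: float = 0.01, steps: int = 1000
) -> float:
    """Gradient descent on mean squared error for the model y ~ w * x."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # loss(w) = (1/n) * sum((w*x - y)^2), so d loss/dw = (2/n) * sum(x * (w*x - y))
        grad = (2.0 / n) * sum(x * (w * x - y) for x, y in zip(xs, ys))
        w -= lr * grad  # step downhill
    return w


print(round(fit_slope([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # 2.0
```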
The 'field' is rather big, and as with most tech fields, 80% of the work is the same (sometimes mundane) stuff. Infrastructure, CRUD, business logic, the whole 12-factor thing.
Where it would start to get tricky is if you have to do more than 'consume' ML libraries. Everyone can learn how to use a library or API, and getting some training going isn't all that hard either. But if you have to build said library, or come up with a new modelling method, that's where it's a real transition and gets really hard to simply 'switch'. It's also one of those areas where a PhD really helps, not from a "certification-as-entrypass" perspective, but because this gets down to hard science. For most companies, however, that's a point they never reach.
Quit my job in Fintech to work on my own AI startup.
I built https://FakeYou.com as a side project, and it blew up. I quit my job after I realized the potential, added monetization, and started to broaden what we do.
I've been working on https://storyteller.ai for a year and plan to launch our platform soon.
Both of these tool sets reinforce one another.
I'm hiring folks that were engineers that want to do AI instead. Please reach out! Our stack is Rust / Unreal / Pytorch / k8s.
"blew up" can be positive or negative depending on context. Generally when talking about a physical object it's negative, when talking about a personal endeavor it's generally positive.
"My server blew up" is likely negative.
"My youtube channel blew up" is likely positive, it means it went viral.
If a business blew up, it's almost certainly positive.
> Has anyone made the official career pivot to the ML/AI field?
If you're talking about using ML/AI-related tools and algorithms and taking advantage of what they can do, maybe for data processing etc., then that's not really too big an ask; in fact, these days, depending on your role, there could be a natural progression into these areas.
The problem comes with the core of this type of work: creating the algorithms, building new models, processing the raw data into something useful, which can even involve working really close to the hardware. I find that it's hugely academic and mathematically focused, for obvious reasons.
It's certainly stuff that flies over my head and for me personally no matter how interested I am in it, I don't think it will ever 'click' for me.
So I did a lot of ML work in grad school and have also built 4 iOS apps and two web apps. I can tell you building statistical models is not a very interesting grind; it’s mostly cleaning data and fiddling with parameters. It’s also unlikely anyone would hire someone without a stats/math degree to do that stuff. On the other hand, you’re probably good at turning models into a usable product. Here’s a recent library for doing that in Node:
https://github.com/transitive-bullshit/scikit-learn-ts
Not exactly AI/ML, but I had some years of experience in Android, then RoR and engineering management. Then I got an MS in Bioinformatics, and now I’m doing a PhD in Medicine, which is really data analysis of sequencing data. I use some “AI” models as well. I’m doing the PhD at a company, away from the academic institution where I’m formally enrolled, so it’s kind of like a regular job.
But I’ve also seen colleagues pivot into data engineering. They’ve done it within the same company by simply asking, I guess? When there’s a role available and you do your homework there’s a chance to change the field.
There’re some collaborations between companies and academics. EU runs several programs where a company can get a grant to hire a PhD student. This is supposed to advance the research in member states and give local companies an edge. These industrial PhDs are less common, but they exist.
I've been doing "full stack" for many years. Until the new GPT came out I thought that I needed to become an expert in ML and had taken some Coursera classes etc.
But now with the general purpose power of the ChatGPT API / OpenAI Embeddings, things like Stable Diffusion, and Eleven Labs, etc., and the expectation of new models coming out that have visual understanding integrated with the large language model, and quite possibly even more intelligence, I don't feel that ML is a good path for me. It makes more sense for me to just leverage the APIs to build applications.
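When building on an embeddings API, a common pattern is to embed your documents and a query, then rank documents by cosine similarity. A sketch in plain Python; the 3-dimensional vectors are made up for illustration (real embeddings come back from the API with far more dimensions), and no provider-specific call is shown.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def nearest(query: Sequence[float], docs: Dict[str, Sequence[float]]) -> str:
    """Return the name of the document whose embedding is closest to the query."""
    return max(docs, key=lambda name: cosine_similarity(query, docs[name]))


# Toy vectors standing in for real embeddings returned by an API
docs = {"refund policy": [0.9, 0.1, 0.0], "shipping times": [0.1, 0.9, 0.2]}
print(nearest([0.8, 0.2, 0.1], docs))  # refund policy
```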
I get the impression that optimized (multimodal) transformer models are going to be readily adaptable to most tasks, and so it's much less important going forward to do "real research" in order to get results.
As soon as the GPT3 API came out I started experimenting and moving towards launching https://aidev.codes. So now I have quite a bit of experience with prompt engineering for GPT, and a few other AI-related APIs. I am looking to raise money for marketing aidev.codes. If anyone wants to hire me, see the email in my profile.
How different is the compensation between a typical backend engineer and a backend ML engineer? Not including designing the model itself, as that seems to be the domain of PhDs.
I have yet to make the transition in a paid role, but I quit my backend job to start a startup developing realistic text-to-speech for long-form content.
My approach has been to start at a high-level, with a specific goal in mind, and to progressively go deeper and deeper. The specific goal part has been really helpful IMO. It prevents sort of aimless shuffling about and provides a good metric to see if you're making progress. When I started I was basically just focusing on producing training data and treating the models, which were open-source on GitHub, as a black-box. At this point I've made a lot of modifications to the actual model code itself and I'm learning a ton. There's of course a bunch of adjacent skills that are similar to traditional backend skills, but slightly different. Like autoscaling for example, there aren't many autoscaling solutions for GPU VMs yet, there are some startups working on this space, but IMO it's good to have a rock-solid hosting solution that you don't have to worry about too much.
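As an illustration of the kind of adjacent backend logic involved, here's a queue-depth scaling rule you might write for GPU workers. It's a hypothetical sketch, not any provider's autoscaler: `jobs_per_worker` and the min/max bounds are assumptions you'd tune, and defaulting `min_workers` to 1 assumes you'd rather pay for one warm GPU than wait on a cold start.

```python
import math


def desired_gpu_workers(
    queued_jobs: int,
    jobs_per_worker: int,
    min_workers: int = 1,
    max_workers: int = 8,
) -> int:
    """Enough workers to drain the queue, clamped between a floor and a budget cap."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queued_jobs / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


print(desired_gpu_workers(queued_jobs=0, jobs_per_worker=4))  # 1
print(desired_gpu_workers(queued_jobs=10, jobs_per_worker=4))  # 3
print(desired_gpu_workers(queued_jobs=100, jobs_per_worker=4))  # 8
```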
I work at a large university affiliated research center and this is extremely common in my department. Our software devs have to be at least familiar with bleeding edge ML processes and several just in my group of about 60 have gone on to shift to more of an applied research role. We have a bunch of PhDs in math and CS, but no one cares what degree you have if you can produce.
I made the jump and pivoted a bit. My first gig was in enterprise applications and then I made the jump to mobile apps that I developed for 5 years.
I got somewhat lucky, as the feature my team owned was powered by ML. After gaining credibility on the mobile side, I worked with my manager to make the transition to backend. I did backend for about a year and was fortunate with the timing: my team was launching a new product with a model it owned. I got to work closely with ML engineers on it, and eventually I became the DRI of the feature along with the model. After 2 more years I came to the realization that ML was moving faster than I could keep up with by reading white papers, and decided to pivot to ML Ops. This let me leverage the strengths in distributed computing I had developed and stay very close to ML without having to study math in my spare time.
I sort of fell backwards into it about 5 years ago -- in the "Give it to Mike, he'll try anything" sense -- I inherited some mangled Jupyter notebook filled with health data, and nobody could make the model work. Once I figured out the white paper and algorithm they were trying to rip off, it was easy enough to implement and serve from a PKL file with FastAPI.
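Roughly what "a PKL file behind an API" looks like, as a sketch: pickle the trained artifact once, load it at server startup, and have the route handler call it. Here the pickled object is a plain dict of hypothetical linear-model parameters so the example needs only the standard library; in practice it's usually the fitted model object itself (e.g. a scikit-learn estimator). The FastAPI route is omitted, and `handle_request` is a hypothetical stand-in for it. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile
from typing import List

model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Training side: persist the fitted parameters (hypothetical values)
with open(model_path, "wb") as f:
    pickle.dump({"weights": [0.5, -1.0], "bias": 2.0}, f)

# Serving side: load once at startup, reuse for every request
with open(model_path, "rb") as f:
    PARAMS = pickle.load(f)


def predict(features: List[float]) -> float:
    """Linear model: weighted sum of features plus bias."""
    return sum(w * x for w, x in zip(PARAMS["weights"], features)) + PARAMS["bias"]


def handle_request(payload: dict) -> dict:
    """What a web route (e.g. a FastAPI POST handler) would delegate to."""
    return {"prediction": predict(payload["features"])}


print(handle_request({"features": [4.0, 1.0]}))  # {'prediction': 3.0}
```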
That's sort of what I've been doing since. It's much more interesting than solving botched up React Hooks, but there is about the same ratio of tedium:interesting work. I happen to like math, someone who does not like math... they're gonna go a little batty I think.
I haven't even raised my rates. I'm having enough fun with it.
So I think the answer you're after is either "Luck" or "Masochistic streak"? :)
Anything and everything is AI/ML today. A few years ago we had a product that kept track of inventory on the display shelf (basically how many items are on the shelf) and inventory in the backroom, and based on that, simple math would generate an alert to the store owner if the number of items in the backroom fell below a threshold. Basic math. That same product is being sold today, without a single line of code changed, as an AI/ML product.
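To underline how little is involved, the rule as described is a single comparison per item. A sketch with hypothetical SKU names and threshold:

```python
from typing import Dict, List


def low_stock_alerts(backroom: Dict[str, int], threshold: int) -> List[str]:
    """Return the SKUs whose backroom count has dropped below the threshold."""
    return sorted(sku for sku, count in backroom.items() if count < threshold)


print(low_stock_alerts({"cereal": 3, "milk": 12, "soap": 0}, threshold=5))  # ['cereal', 'soap']
```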
In short, all this AI/ML stuff is just buzzwords; ultimately the work you will do at almost all of these companies is regular run-of-the-mill work unrelated to ML or AI.
This sounds like plain “false advertising” to me. Besides, wouldn’t an inventory tracking platform that has genuine AI/ML capabilities eventually replace its “unintelligent” equivalents, since it’ll most likely be a better, hence more valuable, product?
I'm a full stack contractor. Been following generative AI since late last year. Picked up an AI startup gig in early January. Launched a couple weeks ago. Building out the client, API, server, and orchestrating the AI image generation pipeline plus Dreambooth. Choosing an AI GPU provider (or 3), solving prompt issues, figuring out model settings, making sure it can scale. A lot of the models are commoditized on providers like Replicate, which makes it like any other API-based project. AI knowledge is still very useful for knowing what settings to use with the models, though.
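Using more than one GPU provider usually implies some failover logic. A minimal sketch, assuming each provider is wrapped in a function with the same signature that raises `ProviderError` on failure; the wrappers and the error type are hypothetical, and no real provider's client API is shown.

```python
from typing import Callable, List


class ProviderError(Exception):
    """Hypothetical error raised by a provider wrapper (timeout, no capacity, etc.)."""


def generate_with_fallback(prompt: str, providers: List[Callable[[str], str]]) -> str:
    """Try each provider in order and return the first successful result."""
    errors: List[ProviderError] = []
    for provider in providers:
        try:
            return provider(prompt)
        except ProviderError as exc:
            errors.append(exc)  # remember why it failed, then move on to the next provider
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


def flaky_provider(prompt: str) -> str:
    raise ProviderError("no GPU capacity")


def backup_provider(prompt: str) -> str:
    return f"image-for:{prompt}"


print(generate_with_fallback("a red fox", [flaky_provider, backup_provider]))  # image-for:a red fox
```

Ordering the list by price or latency is one simple way to encode which provider you prefer when all are healthy.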
One concrete example we're hiring in: We're looking for a cleared security / SIEM engineer in Australia to help build out GPU/AI SOC tech. As long as they love Python and modern bits like scaling detection engineering & automation, they'll be learning the GPU & AI side pretty darn fast :)
You can do anything if you're able to convince your interviewer that hiring you will work out. I transitioned from web dev to data science by getting a VP of data science to work with me as lead engineer on their projects, then studying in my free time and doing well on data science interviews to land a lead data scientist position. I went back to software engineering, and now I'm thinking the same thing you are. I'll leverage my data science background, study, and do some ML projects. That should make me competitive in interviews, and I expect to transition by the end of the year.
I am in the process of making this transition now.
I joined Grab.com on their Safety team and started working on their face recognition technologies. This got my feet wet in ML. Now I am leading their content moderation efforts.
TL;DR: Find an "ML adjacent" engineering role and take on ML/AI work.
"ML adjacent" roles could be, content moderation, safety, ads, and search.
I joined 2 years ago. I wish our stock price were better.
I’m happy with the work I do. The company culture requires some thoughtful navigation. I’m also happy with the benefits (like travel and working from Asia).