
Someone who can engineer infrastructure and pipelines and firefight production issues is hard to find, but that’s not the point I was making.

My apologies; it is real work. The point I was making is that it’s not ML work, any more than writing a yaml file is ML work.

If you want to write yaml files, any number of possibilities exist.

If you want to work with machine learning, then don’t become a data engineer. The skills are, mostly, not related to ML, and are more closely aligned with SRE / devops.

It’s not infrastructure plus helping build models as you mature and advance: it’s almost literally just infrastructure and firefighting… in my limited 3 years of experience as such.




What?

A lot of the hard part isn't the model, especially in a world where BERT, XGBoost, Optuna, PyTorch, etc. have solved much of the classic problem and forced 'real' DS to specialize on either the business consulting side (not math/engineering) or the theory side (barely implemented). The rebrand of 'data analyst' (SQL, Power BI, ...) to 'data scientist' by even top tech companies underscores this. It's not yet where web dev has gotten in terms of global $20/hr Fiverr contractors, but it's already at, say, $40/hr for someone who can build real production models for more boring scenarios.

The result is that the vast bulk of data scientists (PhD, self-trained, consulting, ...) we interview are weak engineers, so going from a make-believe notebook to a trickier production scenario requires the data engineer / MLOps / etc. to solve a lot that a typical DS doesn't really understand in practice: scale, latency, distributed systems, testing, etc. Likewise, the part the DS solves has little to do with the latest neuroips paper, and is more about lifecycle tasks like getting better data, which the other folks on the team will often be involved with as well.

So 2 natural high-paying paths here:

data engineer / MLOps -> MLEngineer -> DS

data engineer -> all-in-one data analyst/scientist -> ML/AI data scientist


I agree with this. In my experience, most of the data scientists I have worked with never left the world of Jupyter notebooks. For them, code management, CI/CD, dev/stage/prod separation, etc. is a world of its own that they are not very comfortable with. Heck, they even used SageMaker to create a git repo for their Jupyter notebooks.

It doesn't mean that there aren't data scientists who have some engineering experience as well, but this seems to be rare. For that reason, getting those ML models that they painstakingly build to where they'll generate some real value is super hard. They just don't know where to start. Working across multiple teams and multiple functions is very challenging and it often creates friction. Therefore, creating tools and systems that will enable those data scientists to see the actual value of their labor is paramount.

That's why we're seeing a huge resurgence of so-called MLOps tools and platforms that aim to solve all or some of the problems of the entire stack. We are very, very early in this journey, but I believe the 2020s will be for ML and AI what the 2010s were for the cloud and data, i.e. new Snowflakes and Databricks but for the actual ML apps. It's exciting.


Definitely agree with your first two paragraphs, but am confused by the pay paths. Can you expand on what the paths mean?


It's useful to work backwards from the knowledge a DS needs to be worth their weight. Imagine a small team of $400K/yr DS + $400K/yr DE + ... and whatever hw/sw. So say a $2-3M/yr project driving $3M+ of new, growing revenue or $6-12M of annual savings. At bigger companies, even bigger magnitudes & pressure :)

The DS will likely:

- be close to the business case & business stakeholders to ask questions a normal lead can't

- know the relevant math + ML algorithms, and build up specializations pairing DS niches ("time series forecasting") with industry niches ("supply chains in manufacturing")

- have enough engineering & performance understanding to work with a DE on going from small data sets to big ones

- have an intuitive feel for all of the above - how data/usecases/etc. go right/wrong

That's a lot!!

One path is jumping in as a low-paid intern or new grad and doing your time. But a pivot is different, esp. to get paid along the way. Most CS grads had little math ("intros to stats, combinatorics, & algs; dropped linear algebra"), weak ML ("did algs; intro to ML only covered kmeans & bayes; tried running a BERT model on some data"), and little intuition for how ML typically goes wrong ("what's class imbalance?"). So if they do get hired directly as a mid-level DS, it's probably on a team of the blind-leading-the-blind. Oops.

BUT SQL/Spark/K8S/pandas/regex are real skills. Doing the data engineering, ML operations, etc., needed to turn an ML pipeline into more than a fanciful notebook that wouldn't last a minute in production is real work. That stuff does pay well, and by working with the ML folks, you'd naturally get pulled into the ML tasks as well. DS write all sorts of bugs that surface as production evolves and that the full team works on together, and new features need a team to make them real. So taking a job that mixes engineering specialties with ML specialties is a smoother pivot path for the typical CS backgrounds I've seen. Over time, drift toward the more ML-y aspects of the projects until you can do the full hop. (Nit: That won't teach the math & deeper intuition, so I'd still do courses + projects on the side.)


In general, does a DE have a higher salary than a DS?

Do I understand correctly that there is much more demand for DE than for DS?


I wish I had real numbers. So instinct from what I've seen:

- a data analyst role rebranded as a DS role will be lower paid than a DE role, maybe 50% diff

- an actual DS role is probably higher paid than a DE role, but really depends on the job+co

- a great DS role and a great DE role are both super well compensated. Though maybe again DS is higher than DE in most cases, just b/c of the ability to more directly drive $. Unless it's something like an infra company, the DS will be inherently closer to the business & outcomes. ("I did this clever thing that netted a 2% revenue spike that adds up to $40M/yr in new revenue, what did you do?")


NeurIPS paper, not neuroips paper


still not used to the new name ;-)


With the right mindset it can be insanely fun building infrastructure, automating things, and engineering solutions such that improvements ratchet forward and fires get put out before they have a chance to grow. While the ML people, bless them, are chasing a 0.001 improvement in the metric of the day, data engineers can be having huge impact and changing the game. Meanwhile, ML is becoming commoditized in its most common use cases.


I first had the DE title 7 years ago (going into it having never heard of DE), and have been doing MLE/platform work for the past 5. You’re projecting your limited experience onto a poorly defined role that varies wildly from company to company. My experience is much different from yours: little firefighting, lots of actual building. Yes there is infrastructure, but any good programmer these days should be able to stand up some basic infrastructure.

Yes, don’t get into it if you want to do ML research or apply ML, but if you are interested a bit in it and find building models the least creative, most boring shit ever like I do, and prefer traditional coding, it’s a nice spot to be in.


What is the average salary for a DE currently in the US?


Great comments. I agree with your take on what being an ML Eng actually means. Of course this will vary to a degree from team to team and company to company, but I think you still capture it well.

I absolutely think MLEng is important and much needed, but too often underappreciated. Being this half-breed, part engineer and part ML, often leaves you on a lonely island in many orgs. The ML managers don't really understand what you do and neither do the engineering managers. It is kind of thankless unless your management really understands your role and appropriately advocates for you.

MLEng is often an engineer who wanted to get into the sexy ML space and since it is in the title it feels cool. Then you realize you're more an Ops engineer who deals with the inane code of many "true" DS/ML scientists. Thankless, indeed.


Especially in the edge / embedded space, MLEng will imply more than just doing ops.

Stuff to do could include:

- Getting a network architecture to run.

- Applying optimization depending on target arch (pruning, quantisation, custom CUDA kernels, etc).

- Integrating models (rule of thumb: a product is 95% ordinary code, 5% is ML related).

- Constructing benchmarks, monitoring


Sure - fair clarification. But conversely, you could make some awesome automation that you rule with a light touch as an engineer, or as a DS you could come up with that ML Amazon has that recommends you endless TVs as soon as you've bought a TV :)



