
In smaller teams/companies one gets to wear multiple hats. However, the term "Data engineer" was specifically created by/for ML folks to get rid of unpleasant repetitive work that has to be done but nobody looks forward to it.



Sorry, no. The term "ML scientist" was specifically created by/for data folks to get rid of unpleasant repetitive work with math equations that has to be done but nobody looks forward to it.

If you've ever crafted a pipeline and tuned it to hum along, then watched it break with new/more/messier data, then figured out creative ways to fix it or replace parts of it with more robust parts, iterating on that and scaling it up, you would know some of the fantastic pleasure of whatever you call it, data engineering.


Sounds like what the media entertainment companies call a "pipeline engineer". Hmmm...


> However, the term "Data engineer" was specifically created by/for ML folks to get rid of unpleasant repetitive work that has to be done but nobody looks forward to it.

This may indeed be how the term "data engineer" is used sometimes, but I have my doubts that it was originally created with this meaning. Not really sure where/when the term "data engineer" was actually created, but ICDE started in 1984 [1] and the Data Engineering Bulletin was renamed in 1987 [2] (from "Database Engineering"). It seems likely that the term "data engineer" has also been used since at least then.

Of course ML already existed then too, but that was certainly a while before the current "big data" / "deep learning" era. And regarding the topics considered "data engineering" at that time, this is from the foreword of the December 1987 issue of the Data Engineering Bulletin:

> The reasons for the recent surge of interest in the area of Databases and Logic go beyond the theoretical foundations that were explored by early work [...] and include the following three motivations:

> a) The projected future demand for Knowledge Management Systems. These will have to combine inference mechanisms from Logic with the efficient and secure management of large sets of information from Database Systems.

Which sounds just as relevant today as it did back then. It also does sound like a rather challenging task, and not exactly like "unpleasant repetitive work". Or at least not any more repetitive than: change some model parameters / retrain model / evaluate results / repeat ;)

[1]: https://ieeexplore.ieee.org/xpl/conhome/1000178/all-proceedi...

[2]: http://sites.computer.org/debull/bull_issues.html


Data engineering jobs named as such started to pop up only in the past few years, coinciding with MapReduce/Spark availability. I wouldn't be surprised if the term was re-introduced by one of the companies developing those systems (like Databricks, Cloudera etc.) to distinguish themselves, as a sort of marketing. In the past we had DBAs; now DBA + DevOps + unspecified everything has morphed into data engineering.

I used to be a member of SIGMOD and the "data engineering" you mentioned was just an academic term.


Data engineers exist at organisations without any ML work.


Yes, but they are basically what DBAs were before with the addition of ETL. OP is asking about data engineers in the context of ML.



