
This post really resonates with why we created Orchest [0].

From the article: "involve two full sets of tools: one for the dev environment, and another for the prod environment"

This is what we think should change. We intend to bring dev and prod into a single cohesive environment. Initially it will be difficult to cover all types of production workloads (as the post mentioned, production is a spectrum). But what we've observed is that through container encapsulation we can create well-defined production workloads that run on any container orchestrator, while shielding data scientists from that complexity during pipeline development _and_ deployment.

With a container-first approach to DAGs it becomes trivial to mix not just library versions but even languages (e.g. feature extraction in Scala and model fitting in Python). In practice, this flexibility has resulted in a significant productivity increase because existing code "just works". No "one virtual environment to rule them all" necessary.
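
To make that concrete, here is a rough sketch of the idea in Python. It is not Orchest's actual API: the `Step` class, the `plan` helper, the step names, the image names, and the commands are all made up for illustration. Each step in the DAG names its own container image, so the Scala step and the Python step never have to share an environment. The sketch only builds the `docker run` commands in dependency order; it does not execute them.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Step:
    """One node in the pipeline DAG (hypothetical structure, for illustration)."""
    name: str
    image: str               # container image holding this step's language and libraries
    command: tuple           # command to run inside that container
    depends_on: tuple = ()   # names of steps that must finish first


def plan(steps):
    """Return one `docker run` argument list per step, in dependency order."""
    by_name = {step.name: step for step in steps}
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {step.name: set(step.depends_on) for step in steps}
    order = TopologicalSorter(graph).static_order()
    return [
        ["docker", "run", "--rm", by_name[name].image, *by_name[name].command]
        for name in order
    ]


# Hypothetical two-step pipeline: Scala feature extraction, then Python model fitting.
PIPELINE = [
    Step("fit_model", "python:3.11", ("python", "fit.py"),
         depends_on=("extract_features",)),
    Step("extract_features", "example/scala-features:latest",
         ("scala", "Extract.scala")),
]

if __name__ == "__main__":
    for argv in plan(PIPELINE):
        print(" ".join(argv))
```

Because the image is a property of the step rather than of the pipeline, swapping a library version or a language for one step is a one-line change that leaves the other steps untouched.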

I like how the article does justice to the fact that there's a subtle yet important difference between mere workflow orchestrators and workflow orchestrators that take on meaningful responsibility when it comes to infrastructure. To really unburden the data scientist from having to be a full-stack unicorn you need to hide the underlying stack to the point where it's invisible. In that sense, the OS kernel analogy really works. Similarly, how many data analysts writing SQL have ever worried about database node sharding?

A big problem we see in the space is that there are still far too many leaky abstractions, so data scientists end up dealing with architecture & config yet again, often for tasks that are out of their depth. We hope to contribute to a better ecosystem, one where data scientists spend their time looking at the data, relating it to the domain, shipping value-generating data pipelines/models, and communicating results to their stakeholders. Not fighting config & infra.

[0] https://github.com/orchest/orchest


