As noted in an earlier comment, I think there is a false equivalence between end-to-end MLOps platforms like MLflow and tools for experiment tracking. The project looks like a solid tracking solution for individual data scientists, but it is not designed for collaboration among teams or organizations.
> There were a few things I didn’t like: it seemed too much to have to start a web server to look at my experiments, and I found the query feature extremely limiting (if my experiments are stored in a SQL table, why not allow me to query them with SQL).
While a relational database (like SQLite) can store hyperparameters and metrics, it does not scale to the many aspects of experiment tracking a team or organization needs, from visual inspection of model performance to sharing models to lineage tracking from experimentation to production. As the article notes, you need a GUI on top of the SQL database for model experimentation to be meaningful. The MLflow web service lets you scale across teams and organizations with interactive visualizations, built-in search and ranking, shareable snapshots, and more. It can run against a variety of production-grade relational databases, so users can query the data directly with SQL or use a UI that makes searching easier for those who would rather not write SQL.
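For example, something like this (a rough sketch; the experiment name and metric are hypothetical, and `experiment_names` needs a recent MLflow release) points MLflow at a SQLite backend and pulls runs into a pandas DataFrame you can slice however you like:

```python
import mlflow

# Use a SQL-backed tracking store (SQLite here; Postgres/MySQL work the same
# way with the appropriate URI, or point at a running `mlflow server`).
mlflow.set_tracking_uri("sqlite:///mlflow.db")

# Pull runs into a pandas DataFrame; filter/sort here, in pandas, or with SQL
# directly against the backing database.
runs = mlflow.search_runs(
    experiment_names=["my-experiment"],   # hypothetical experiment name
    filter_string="metrics.rmse < 0.8",   # assumes an 'rmse' metric was logged
    order_by=["metrics.rmse ASC"],
)
print(runs.head())
```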
> I also found comparing the experiments limited. I rarely have a project where a single (or a couple of) metric(s) is enough to evaluate a model. It’s mostly a combination of metrics and evaluation plots that I need to look at to assess a model. Furthermore, the numbers/plots themselves have no value in isolation; I need to benchmark them against a base model, and doing model comparisons at this level was pretty slow from the GUI.
The MLflow UI allows you to compare thousands of models from the same page in tabular or graphical format. It renders the performance-related artifacts associated with a model, including feature importance graphs, ROC & precision-recall curves, and any additional information that can be expressed in image, CSV, HTML, or PDF format.
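For instance, attaching an evaluation plot to a run so it renders in the UI is a single call (a minimal sketch with placeholder values and stand-in ROC points):

```python
import matplotlib.pyplot as plt
import mlflow

with mlflow.start_run():
    mlflow.log_metric("auc", 0.91)  # placeholder value

    # Any matplotlib figure can be attached to the run and viewed in the UI.
    fig, ax = plt.subplots()
    ax.plot([0.0, 0.2, 1.0], [0.0, 0.8, 1.0], label="model")  # stand-in ROC points
    ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend()
    mlflow.log_figure(fig, "roc_curve.png")

    # CSV/HTML/PDF reports can be attached with mlflow.log_artifact("report.html").
```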
> If you look at the script’s source code, you’ll see that there are no extra imports or calls to log the experiments, it’s a vanilla Python script.
MLflow already provides low-code solutions for MLOps, including autologging. After a single line of code, mlflow.autolog(), every model you train with the most prominent ML frameworks (including but not limited to scikit-learn, XGBoost, TensorFlow & Keras, PySpark, LightGBM, and statsmodels) is automatically tracked with MLflow, including the relevant hyperparameters, performance metrics, model files, software dependencies, etc. All of this information is immediately available in the MLflow UI.
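A rough sketch of what that looks like with scikit-learn (synthetic data, arbitrary hyperparameters):

```python
import mlflow
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

mlflow.autolog()  # the single extra line

X, y = make_regression(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=100, max_depth=5)
    model.fit(X_train, y_train)   # params, training metrics, and the model are logged
    model.score(X_test, y_test)   # recent versions capture held-out metrics too
```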
Addendum:
As noted, there is a false equivalence between an end-to-end MLOps lifecycle platform like MLflow and tools for experiment tracking. To succeed with end-to-end MLOps, teams/organizations also need a way to package code for reproducible runs on any platform across many different package versions, to deploy models in multiple environments, and a registry to store and manage those models - all of which MLflow provides.
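As a sketch of the registry piece (the model name here is hypothetical, and registration requires a database-backed tracking server):

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)

    # Log the trained model and register it in the Model Registry in one step.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="iris-classifier",  # hypothetical registry name
    )

# Later, pull a specific registered version back out for serving or batch scoring.
loaded = mlflow.pyfunc.load_model("models:/iris-classifier/1")
```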
It is battle-tested, with hundreds of developers and thousands of organizations relying on it through widely adopted open source standards. I encourage you to chime in on the MLflow GitHub with any issues and PRs, too!
+1. I'd also like to note that it's very easy to get started with MLflow; our quickstart walks you through the process of installing the library, logging runs, and viewing the UI: https://mlflow.org/docs/latest/quickstart.html.
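The whole first loop fits in a few lines (placeholder values):

```python
# pip install mlflow
import mlflow

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)  # placeholder values
    mlflow.log_metric("accuracy", 0.93)

# Then run `mlflow ui` in the same directory and browse to http://localhost:5000.
```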
We'd love to work with the author to make MLflow Tracking an even better experiment tracking tool and immediately benefit thousands of organizations and users on the platform. MLflow is the largest open source MLOps platform with over 500 external contributors actively developing the project and a maintainer group dedicated to making sure your contributions & improvements are merged quickly.