Who needs MLflow when you have SQLite? (ploomber.io)
252 points by edublancas on Nov 16, 2022 | 110 comments


I think MLflow is a good idea (very) badly executed. I would like to have a library that combines:

- simple logging of (simple) metrics during and after training

- simple logging of all arguments the model was created with

- simple logging of a textual representation of the model

- simple logging of general architecture details (number of parameters, regularisation hyperparameters, learning rate, number of epochs etc.)

- and of course checkpoints

- simple archiving of the model (and relevant data)

and all that without much (coding) overhead and only using a shared filesystem (!), and with easy notebook integration (something like the sketch below). MLflow just has way too many unnecessary features and is unreliable and complicated. When it doesn't work it's so frustrating, and it's also quite often super slow. But I always end up creating something like MLflow when working on an architecture for a long time.
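
Something like this rough sketch is all I really mean (the names and paths here are made up, of course):

    # rough sketch of the kind of minimal tracker I mean; everything here is hypothetical
    import json, pathlib, time

    LOG_DIR = pathlib.Path("/shared/experiments")  # plain shared filesystem, no server

    def log_run(name, params, metrics):
        """Append one JSON line per run; trivially greppable and easy to load in a notebook."""
        LOG_DIR.mkdir(parents=True, exist_ok=True)
        record = {"name": name, "time": time.time(), "params": params, "metrics": metrics}
        with open(LOG_DIR / "runs.jsonl", "a") as f:
            f.write(json.dumps(record) + "\n")

    log_run("resnet-baseline", {"lr": 1e-3, "epochs": 10}, {"val_acc": 0.91})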

EDIT: having written this... I feel like trying to write my own simple library after finishing the paper. A few ideas that would make my life easier have already accumulated in my notes.

EDIT2: I actually remember trying to use SQLite to manage my models! But the server I worked on was locked down, and going through the process of getting somebody to install SQLite for me was just not worth it. It also wasn't available on the cluster for big experiments, where it would have been even more work to get, so I gave up on the idea of trying SQLite.


Yep - totally agree. I respect the attempt to introduce something that is basically an opinionated CRUD app as a central place to put your model metadata. But it's not really ready for large-scale production, or for use by teams bigger than about 5.

It's kind of flaky and slow. It doesn't have namespacing. It's overly opinionated on the workflow (the way that states work, with a model version being in exactly one of dev, staging, prod is super hard to work with).

But beyond that, the biggest problem I have with MLFlow is what I call the "part of this complete breakfast" problem, which the ML/data-science arena is particularly susceptible to these days: the marketing talks a lot about what problems can be solved using the product, but not a lot about what parts of the problem the product actually solves. This is often because an honest answer to the latter question would be "not much". In the case of MLFlow, that would be totally fine, because honestly an opinionated CRUD app is a very useful thing. But it should be a lot more honest about what it does. It's not a system for automatically tracking model metrics, it's a database into which you can write model metrics with a known key structure.


Autologging literally does log your model metrics automatically in most cases.


SQLite is in Python's stdlib, so how can this be an issue? Was there no local filesystem whatsoever?


sqlite bindings are in the stdlib but not the library itself.


I'm asking out of ignorance: what's the practical difference, in this context, of not having the library itself?


Using the bindings is only possible if the library itself is already installed (since the bindings directly make use of the library, under the hood).


I've never encountered a Python installation on any operating system where `import sqlite3` worked but the underlying libraries were not available.

I imagine this is because SQLite is VERY easy to bundle with Python itself. So on some platforms the OS SQLite is used, but on others it gets shipped as part of the Python installation itself.

It even works in WebAssembly via Pyodide!
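
For what it's worth, you can check which SQLite library your Python is linked against without installing anything extra:

    import sqlite3

    print(sqlite3.sqlite_version)  # version of the bundled/linked SQLite C library
    print(sqlite3.version)         # version of the sqlite3 Python module itself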


+1. I also think it's faster that way for both environment setup and ad hoc rapid experiments. In my experience, using the library in a team doesn't scale well; it becomes pretty slow.


I'm happy to collaborate with you, let's build the best experiment tracker out there! Feel free to ping me at eduardo@ploomber.io



Have you used Comet? It basically does everything you are asking for and is a lot more user-friendly than MLFlow.


Isn't Comet a proprietary SaaS? I like MLFlow because I can run it on my own computer if I want to.


Check out flyte and union.ml. No personal affiliation, just good projects in the vein of airflow/prefect/mlflow/kubeflow


I really like guild.ai. The best thing is that its developers assume people are lazy: it automatically creates flags for global variables and tracks them.


@tomrod, thank you for the callout. By the way, we are integrating MLflow into Flyte in a way that you do not need to start the web server to view the logs. They are available locally and statically in the Flyte UI. Of course, you can also use the MLflow server.


I'm having a fanboy moment. This is like tweeting about Mark Hamill or Ryan Reynolds and they tweet back.

Thanks for the great open source libraries, y'all!


> I think MLFlow is a good idea (very) badly executed.

Oh yes, I'm glad to see others with a similar opinion.


The elephant in the room with data is that we don’t need a lot of the fancy and powerful technology. SQL against a relational database gets us extraordinarily far. Add some Python scripts where we need some imperative logic and glue code, and a sprinkle of CI/CD if we really want to professionalise the work of data scientists. I think this covers the vast majority of situations.

Despite being around it for some time, I’m not sure big data or machine learning needed to be a thing for the vast majority of businesses.


The article mentions this workflow:

"Let’s now execute the script multiple times, one per set of parameters, and store the results in the experiments.db SQLite database... After finishing executing the experiments, we can initialize our database (experiments.db) and explore the results."

Be warned that issuing queries while DML is in process can result in SQLITE_BUSY, and the default behavior is to abort the transaction, resulting in lost data.

Setting WAL mode for greater concurrency between a writer and reader(s) can lead to corruption if the IPC structures are not visible:

"To accelerate searching the WAL, SQLite creates a WAL index in shared memory. This improves the performance of read transactions, but the use of shared memory requires that all readers must be on the same machine [and OS instance]."

If the database will not be entirely left alone during DML, then the busy handler must be addressed.
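
As a sketch of what addressing the busy handler looks like from Python (the database name follows the article's example):

    import sqlite3

    # wait up to 30 seconds for a lock instead of failing immediately with SQLITE_BUSY
    con = sqlite3.connect("experiments.db", timeout=30)

    # WAL lets readers and a single writer coexist, but only on one machine/OS instance
    con.execute("PRAGMA journal_mode=WAL;")

    # the same busy handling expressed at the SQLite level (milliseconds)
    con.execute("PRAGMA busy_timeout=30000;")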


None of these are a problem for the workload discussed.

When I am working with SQLite, I am most likely accessing it from a single machine.

And in this ML case, most likely from one process, running multiple times in serial.


Unless your income depends on carrying out the exact demands of some money guy whose most common phrase while using a computer is "it won't let me", and they want "big data".

Then you just suck it up and build one of the totally unnecessary big data systems that have been excreted all over the business world these days. I don't think the problem is that devs are over-engineering.

I wonder what it's called; it makes me think of the tragedy of the commons, but that's probably not quite right.


"Hierarchies and Bureaucracies", by Jean Tirole. I know because this was the phenomenon I wanted to study in grad school, only to find he scooped me (on this and several other items) by several decades.

Edit: Tirole, Jean. "Hierarchies and bureaucracies: On the role of collusion in organizations." JL Econ. & Org. 2 (1986): 181.


If this research is so old, has the world tried anything to ameliorate this problem? I guess it hasn't happened yet...


36 years is not old in terms of research. 2,223 cites on Google Scholar and many in the past year. Seminal research often identifies the problem but not all solutions.


What gets me is how many companies paid through the nose to push their data into things like Hive and slowed down 99% of their queries to make one "run once a quarter" report run about 25% faster.

At least that was my experience a number of years back.


Maybe like 20 years ago you were right but today there's a generation that's been working for 10 years on systems built like that. They don't know any better, and in most cases nobody is around to teach them otherwise.


> SQL against a relational database gets us extraordinarily far.

I think it gets us all the way once you consider the ability to expose domain-specific functions to SQL that are serviced by your application code.

I've always been of the mindset that you can do anything with SQL if you are clever enough.
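
For example, SQLite (through Python's sqlite3 module) lets you register an ordinary Python function and call it from SQL; a small sketch with made-up names:

    import sqlite3

    def normalize_email(s):
        return s.strip().lower()

    con = sqlite3.connect(":memory:")
    con.create_function("normalize_email", 1, normalize_email)  # (name, nargs, callable)
    con.execute("CREATE TABLE users (email TEXT)")
    con.execute("INSERT INTO users VALUES (' Alice@Example.COM ')")
    print(con.execute("SELECT normalize_email(email) FROM users").fetchone())
    # ('alice@example.com',)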


Yeah and even if you do need to do proper big-dataset-ML... a SQL box and maybe something like a blob storage for large artifacts (S3, Azure storage account, whatever) is all you need as well. But if your boss bought The MLOps Experience, you gotta do what the cool kids are doing!


I work in an environment where there are multiple tech teams developing models for multiple use cases on VMs and GPU clusters spread across our corporate intranet. Once you move beyond a single dev working on a model on their laptop, you absolutely need something that can handle not just metrics tracking, but making the model binaries available and providing a means to ensure reproducibility by the rest of the team. That's what MLFlow is providing for us. The API is a mess, but at least we didn't have to code up some bespoke in-house framework, we just put some engineers on task to play around with it for a few hours and figure out the nuances of basic interactions and deployed it.


Agree. Once you have a team, you need to have a service they can all interact with. This release is a first step, we want to get the user experience right for an individual and then think of how to expand that to teams. Ultimately, the two things we're the most excited about are 1) you don't need to add any extra code (and it works with all libraries, not a pre-defined set) 2) SQL as the query language


I don't get why a lot of people are calling MLflow a shitshow when it has done so much to get data scientists out of recording experiments via CSV. I can log models and parameters and use the UI to track different runs. After comparisons, I can use the registry to register models to different stages. If you have other model diagnostic charts, you can log them as artifacts as well. I think MLflow v2 has autologging included, so why all the fuss?


People tend to forget that first movers rarely tend to also have the best design. MLFlow (and DVC) brought us out of the dark ages. Now we can build better tools, with the benefit of hindsight.

Claiming that something is "broken" or "trash" when you mean "I don't like it" is a good way to make yourself feel big and smart, but it's not actually constructive.


There are those who create and those who complain on the internet about tools they've used one time


Okay that's coming across as a pretty snide remark aimed at me, I'll bite.

Yes, I can understand why you comment that. I don't like blind slagging of free software either.

But there are ALSO those whose day job it is, and has been for the last 2 years, to use a badly designed overcomplex horrorshow of a tool that could be replaced easily by something better ... if it wasn't for the lock-in effects and strong marketing.

So I'm ventilating my frustration and at the same time expressing my gratitude to the person who made something fresh, that shows us things can be better.

I can't build the replacement to MLFlow myself, but I can cheer people on who do, and let them know their efforts are sorely needed.


You can also use mlflow locally with SQLite (https://www.mlflow.org/docs/latest/tracking.html#scenario-2-...). Even though I haven't tried querying the db directly ...
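
For reference, it's roughly this (based on the linked docs; not a complete example):

    import mlflow

    mlflow.set_tracking_uri("sqlite:///mlflow.db")  # runs get stored in a local SQLite file
    with mlflow.start_run():
        mlflow.log_param("lr", 1e-3)
        mlflow.log_metric("val_acc", 0.91)

    # and to browse it:
    #   mlflow ui --backend-store-uri sqlite:///mlflow.db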


Could you provide context on why SQLite would replace MLflow? From the standpoint of model tracking (record and query experiments), projects (package code for reproducibility on any platform), deploy models in multiple environments, registry for storing and managing models, and now recipes (to simplify model creation and deployment), MLflow helps with the MLOps life cycle.


Fair point. MLflow has a lot of features to cover the end-to-end dev cycle. This SQLite tracker only covers the experiment tracking part.

We have another project to cover the orchestration/pipelines aspect: https://github.com/ploomber/ploomber and we have plans to work on the rest of the features. For now, we're focusing on those two.


Have you looked into DuckDB for the database? I'm hearing that for some tasks it's faster than SQLite.


Yeah, we're also looking into it and we'll probably add it as a backend in the future!


I recently did the following:

- had a giant pcap

- wrote a Perl script to output some of the key values from the dump (e.g. IP and UDP packet lengths) into CSV

- loaded the csv into sqlite3 database

- ran several queries to identify microbursts of bandwidth etc

The younger/more junior folks were blown away that you could do this with <100 lines of code and it was pretty fast.

Btw, above was inspired by this: https://adamdrake.com/command-line-tools-can-be-235x-faster-...


If I were looking for bursts, SQL is not the first thing that comes to mind! Could you elaborate on this or sketch out the query?


Basically, doing a group by at millisecond resolution with a sum on the IP packet length to get a rough metric for bandwidth.

Once you have that, you can see the milliseconds with the highest bandwidth. Some extra math can also get you to Gigabits/second in a more network engineer friendly format.
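
Roughly like this (table and column names are just illustrative; ts is a Unix timestamp in seconds, ip_len is bytes):

    import sqlite3

    con = sqlite3.connect("capture.db")
    query = """
        SELECT CAST(ts * 1000 AS INTEGER) AS ms,         -- bucket packets by millisecond
               SUM(ip_len) * 8 / 1e6      AS approx_gbps -- bytes in the bucket -> Gbit/s if sustained
        FROM packets
        GROUP BY ms
        ORDER BY approx_gbps DESC
        LIMIT 10
    """
    for ms, gbps in con.execute(query):
        print(ms, gbps)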


I did a histogram-type thing in the same way by using a window function (similarly, a SQLite table scraped off pcap recordings). I can't remember if it was a fixed-width window (number of samples) or within some time window.

Dropped it into Datasette with datasette-vega and got a nice little plot.


Aha, group by millisecond! Thanks.


Yeah, MLFlow is a shitshow. The docs seem designed to confuse, the API makes Pandas look good and the internal data model is badly designed and exposed, as the article says.

But hordes of architects and managers who almost have a clue have been conditioned to want and expect MLflow. And it's baked into Databricks too, so for most purposes you'll be stuck with it.

Props to the author for daring to challenge the status quo.


"the API makes Pandas look good"

It sparks joy in my heart whenever I see shade cast against pandas.


I have never seen a worse documented library. Initially I thought that they were lazy, now I realize that it cannot be documented because it is a total mess of a library held together with tape.

Close second is the plotly library.


Genuinely curious what you have against the Pandas documentation. It has some of the best docstrings I've seen.

(I also wrote a Pandas book or two... So there's that)


Docstrings are one thing, but functionality discovery, picking up from scratch, troubleshooting, etc are... not fun, nor easy with the documentation. If you know it well already and use it a lot it's easier to forgive its documentation faults since you can waive off the problems as "that's just learning something new".

But for a lot of people who use it infrequently its documentation is a frustrating mess. Simple problems turn into significant time sinks of trying to find which page of the documentation to look at.

A lot of issues are made worse by shit-awful interop between libraries that claim to fully support dataframes but often fail in non-obvious ways... meaning back to the documentation mines.

I'd argue that the fact that there's a market for a single author to write two books about it is itself indicative of documentation problems.


Fair enough. I'm highly biased and my recent book is the most popular Pandas book currently, so it is evidence that folks prefer opinionated documentation.

However, I always thought the "10 minutes to pandas" page was decent for getting started. I picked up Polars recently and thought it was more difficult than Pandas because there weren't any quick intro docs. What projects have great introductory docs for you?

Also, I am curious to learn more about the specifics of interop libraries you are referring to.

Learning a new tool is generally a challenge. I think another challenge with a lot of data tools is that non-programmers tend to be the major audience. I make my living teaching "non-programmers" how to use these tools.

That said, I always teach "go to the docstrings and stay in your environment (to not break flow) if you can." The pydata docstrings are better than most, including Python (the language).


Yeah, I think for your audience, pandas makes total sense! When I first started using it, it was through an ambitiously large project with tons of gaps in data, untype-able text for 1% of rows, data that didn't fit in memory, etc. So my personal experience is a bit tainted by putting myself through a hell that could have been solved sooner by spending more time learning instead of bashing my keyboard with a hammer.


I've long suspected that Pandas has taken a similar stance to e-mail scammers. Where e-mail scammers inject all kinds of broken English and bad punctuation to ensure they get their targets of choice, Pandas has broken and often inaccurate documentation in order to get only the chosen ones to work with their software.

However, maybe it makes more sense that it's just a mess that's hard to document.


Do you have a specific example of this broken documentation?


The Pandas documentation has improved quite a bit. Last I checked, the only part of the reference docs with a big gap was the description of "extension arrays" and accessors.

The user guide material absolutely needs work, and the examples in the reference docs tend to be a little contrived. But I absolutely have seen worse-documented libraries, such as Gunicorn and Pydantic.


I'm surprised to see Pydantic in here; I've used Pandas and Pydantic both quite a lot, and have found the Pydantic docs to be quite good! Also a much smaller library with a saner API, and thus easier to document well.


What makes the documentation so bad in your opinion? I’m not arguing but curious since I use pandas all day at my job and can’t think of any times the docs weren’t clear to me. (Plotly I have had some annoying times with!)


I think the R docs are the intended reference material for pandas ;)


What bothers me the most is the egregious data types for any argument. If it's a string, do this. If it's a list, do that. If it's a dictionary of lists, do this other thing.

No, I want you to force me to provide my data in the right way and raise a noisy exception if I don't.


Series and DataFrame have "alternate constructors" for this purpose, and the loc/iloc accessors give you a bit more control.
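
For instance (a small, contrived sketch):

    import pandas as pd

    # from_records / from_dict are the alternate constructors; astype makes the types explicit
    df = pd.DataFrame.from_records(
        [("a", 1), ("b", 2)], columns=["key", "value"]
    ).astype({"value": "int64"})

    subset = df.loc[df["value"] > 1, "key"]  # label-based selection, no positional guessing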

I agree that the magic type auto-detection is a bit too magical and sloppy, but you have to realize that data analysts and scientists have historically been incredibly sloppy programmers who wanted as much magic as possible. It's only in recent years that researchers have begun to value some amount of discipline in their research code.


Every time I open up pandas I jealously remember the expressive beauty of R for these tasks. But because we're all "serious" of course we must use Python for production lest we not be serious.


To be fair, taking R to production is a goddamn nightmare.


R is a trash of a language. It doesn't have any sense of coherency to it at all. They keep trying to fix the underlying problems by duct-taping paradigms onto it over and over (S3, S4, R6, etc.). There's never a clear sense of the best way to do anything, but plenty of options to do a thing in a very hacky 'script-kiddy' way. Looking out at the community of different projects, it becomes clear that everyone is pretty lost as to what design principles should be used for certain tasks, so every repo has its own way of doing things (I know personal style occurs in other languages, but commonalities are much less recognizable in R projects). It's tragic that such a large community uses it.


Trash language is a bit harsh. I'm not sure I would try to put an R project into production or build a huge project with it but, at the very least, R/R Studio was the best scientific calculator I've ever used. Was particularly great during college


It's not trash, it's functional, look:

    x <- 3


Yep, this is a mark of someone that's never used R but has heard a lot of incredibly ill-informed criticism around it.

One look at dplyr code next to pandas would of course disabuse anyone of the notion that R is trash, and the tragedy is that Python, in its current state, will never have anything like it. That's the advantage of a language being influenced by Lisp vs. not.


I've heavily used R several times.

I agree that it is a trash language and that, aside from the fact that many frontier academic ideas are available in it and some plotting preferences are solidly prescriptive, it should be thrown into the trash bin.

Python, Julia when it gets its druthers for TTFP, Octave, Fortran, C, and eventually Rust. These are the tools I've found in use over and over and over again across business, government, and non-profits.

Everywhere R is used by an org, I have seen major gaps in the capacity to deliver, specifically because R doesn't scale well.


Try to separate the language from its standard library. Neither one is "trash".

I agree that the standard library is what you might call "a chaotic disorganized mess".


I'm not emotionally invested in tools so am happy to identify the user experience and operational experience as "trash."

"Trash", despite its connotations of lacking value, is really just a chaotic disorganized mess of something made by artifice with dubious reclaim/reuse/recycle value. Being a subjective assessment, it is natural that one person's trash is a treasure to another.


I take issue with your implication that I'm emotionally invested in something when I shouldn't be. You are free to dislike R and not use it, but to claim that it's "trash" is to wrongly disavow its usefulness for the many people that do find it useful, and to cast aspersions on the judgement of all those people.


Hey, I apologize here; my point on emotional investment was that I, personally, am not emotionally invested in it and did not mean to cast aspersions at you for your defense of the language, nor at people who have preferences for it. Specifically, I meant that I'm comfortable enough in my understanding of the language to classify it and its standard library as better off in the garbage bin relative to the alternatives available.

It's fine that people like it. What's good about it isn't unique, and what's unique about it isn't that great. And there are certainly switching costs for some orgs to consider.


What's wrong with pandas? Honest question. I'm a bit new to ML. Also, what's the alternative?


How many data scientists that use Databricks for modeling do you know?


It's forced upon many of them that are in finance, banking, insurance, ...

Mainly because those tend to run on Microsoft Azure, which has no decent analytics offering and pushes Databricks extremely hard. The CTO or whatever just pushes Databricks. On paper it checks all the boxes: MLOps, notebooks, experiment management. It just does all of those things very badly, but the exec doesn't care. They only care about the Microsoft credits. It's also just to avoid using Jupyter so the compliance teams stay happy, because Microsoft salespeople scared them away from open source.


My team very nearly had this happen to us.

We pushed back on it very, very, very hard, and finally convinced "IT" to not turn off our big Linux server running JupyterHub. We actually ended up using Databricks (PySpark, Delta Lake, hosted MLFlow) quite a bit for various purposes, and were happy to have it available.

But the thought of forcing us into it as our only computing platform was a spine-chilling nightmare. Something that only a person who has no idea what data analysts and data scientists actually do all day would decide to do.


What would you go with instead for collaborative notebooks?

I ask because normally I tend pretty strongly towards the "NO just let the DSes/analysts work how they want to", which in this case would be running Jupyter locally. However DBr's notebooks seem genuinely useful.

Is your issue "but I don't need Spark" or "i wanna code in a python project, not a notebook?", or something else?

Imo if DBr cut their wedding to Spark and provided a Python-only nb environment they'd have a killer offering on their hands.


> What would you go with instead for collaborative notebooks?

Production workloads should be code. In source control. Like everybody else.

Notebooks inevitably degrade into confusing, messy blocks of "maybe applicable, maybe not" text, old results and plots embedded in the file because nobody stripped them before committing, and comments like "don't run cells below here".

They're acceptable only as a prototyping and exploration tool. Unfortunately, a whole "generation" of data scientists and engineers has been trained to basically only use notebooks.


It's ubiquitous. I've consulted for a 100 person company that built a data product on top of some IoT data. Everything was in databricks, literally everything. (Not endorsing that, just an observation)

Talking to a 2000+ person org now that is standardizing data science across the org using... you guessed it


Pretty interesting. I think this is part of this trend of releasing half-baked products: some of the stuff in there is really cool, just enough to get you in, but it doesn't scale and is usually complex to deploy/use.


Where does the article say that?


About exposing the data inside MLFlow

> I found the query feature extremely limiting (if my experiments are stored in a SQL table, why not allow me to query them with SQL).


How about a side-by-side comparison?

Far too often, these "X is bad, use my homebrew Y instead" articles don't show a comparison against X, which does nothing to illustrate 'why Y instead'.

You know... <cheeky>For science.</cheeky>


I think this is a neat solution for an engineer who is working on their own and wants to go back and look at the data from various experiments.

I don't see this scaling to many engineers working in a team, who would want to see each others experiment data, or even store artifacts like checkpoints and such. And lastly, in many cases ACLs are required as well when certain models trained with sensitive data shouldn't be shared with engineers outside of a team/group.


SQLite is literally a backend for MLflow, so the argument being made really is that you should just use SQL when you can, which is kind of adjacent to any criticisms of MLflow


Is querying the underlying SQL database officially supported in MLflow? Last time I used it, it wasn't documented. I took a look at the database and it wasn't end-user friendly.


As someone replied above, it's because SQL is just 1 backend and it's weird to expose an API that only works on 1 backend. Once you have many devs working together, you need a remote server. If you have a remote abstracted backend, it needs to have a unified API surface so the same client can talk to any backend. You might argue "This interface should be SQL", and to that I would say there are many file stores (like your local file system) that are not easy to control with SQL.


Not convinced by the example. I don't see why you couldn't use standard scikit-learn for it.

First, the example doesn't take advantage of sklearn's built-in, super simple parallelization via n_jobs.

Then, the entire example could be better wrapped with sklearn’s own cross_validate() which gives you the same functionality: a table of results across experiments.

If you use a different estimator, you can easily concatenate the results into a single df

The rest is the same.
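
A minimal sketch of what I mean (iris and a random forest stand in for whatever the article trains):

    import pandas as pd
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_validate

    X, y = load_iris(return_X_y=True)
    rows = []
    for n_estimators in (10, 50, 100):
        cv = cross_validate(
            RandomForestClassifier(n_estimators=n_estimators),
            X, y, cv=5, n_jobs=-1, scoring="accuracy",  # built-in parallelization
        )
        rows.append({"n_estimators": n_estimators,
                     "mean_accuracy": cv["test_score"].mean()})

    results = pd.DataFrame(rows)  # one table of results across experiments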

Why do you need SQLite for this? (SQLite is great, of course, for the right use cases.)

And if you're doing many orders more experiments (1000s instead of 10s) then that’s probably where MLflow is good (haven’t actually used MLflow)


Wow this looks perfect for what I need right now - just a bit of lightweight tracking.


DVC also fills the "lightweight tracking" niche, although it relies on automatically creating Git branches as its technique for tracking experiments. I personally find that distasteful, so I don't use it specifically for experiment tracking, but the feature is there.

The company behind DVC is also building a handful of other related tools, e.g. https://iterative.ai/blog/iterative-studio-model-registry


It doesn't require creating a branch when you iterate; it requires creating a branch or commit if you want to share it with the team - see it on GitHub or in Studio. But even those lightweight iterations (https://dvc.org/doc/command-reference/exp/run) could be shared as well via a Git server - they just won't be visible via the UI in GH/Studio at the moment.

Happy to provide more details on how it's done. It's actually quite an interesting technical thing - a custom Git namespace: https://iterative.ai/blog/experiment-refs


Hm, in what way do you find that DVC requires creating new branches for experiment tracking?

I find the following workflow works well, for example:

1. Define steps depending on a `config.yml`.

2. Run an initial experiment (with an initial config) and commit the results.

3. Update config (preserving the alternate config and using symlinks from `config.yml` to various new configs if necessary), re-run, and commit.

4. Results are then all preserved in your git history.


Right, but experiments aren't always linear. Do you really want to make a new commit for every iteration of a hyperparameter search? What if you are using a black-box optimizer that supports parallel/concurrent updates?

I don't want to use Git to track all that. I want to use Git to store the final results of running such an experiment in the same commit as the code that implemented it. I just don't like the DVC experiment workflow, but I am more than happy to use DVC for storing the fitted model(s) at the end of the run.


Yeah, that's a fair point, and I agree. I don't think it's ideal.


If you use `dvc exp run` you don't need to commit anything, as I mentioned above. You can run multiple experiments in parallel, etc. A commit happens only if/when you want to select the best result and share it with the team. But even that is optional.


If you need help, you can open an issue on GitHub (https://github.com/ploomber/ploomber-engine) or join our Slack! (https://ploomber.io/community/)


What are the alternatives to MLflow other than SQLite - something like Kubeflow or Metaflow?


I highly recommend ClearML for effortless experiment tracking that just works. It does a lot more of MLOps besides experiment tracking, but I haven't used those functionalities.

https://clear.ml/

I had researched and spent time with several other tools including DVC, GuildAI and MLFlow but finally settled on ClearML. WandB pricing is too aggressive for my liking (they force an annual subscription of $600 last I checked)


There are a lot of tools in this space. Shameless plug to follow.

I helped build and use Disdat, which is a simple data versioning tool. It notably doesn't have the metadata capture libraries MLFlow has for different model libs, but it's meant to be a lower layer on which that can be built. Thus you won't see particulars about tracking "models" or "experiments", because models/experiments/features/intermediates are all just data thingies (or bundles in Disdat parlance). For the last 2+ years we've used Disdat to track runs and outputs of a custom distributed planning tool, and used Disdat-Luigi (an integration of Disdat with Luigi to automatically consume/produce versioned data) to manage model training and prediction pipelines (some with 10ks of artifacts). https://disdat.gitbook.io/disdat-documentation


Weights and Balances https://wandb.ai/site


Weights and *Biases :)


Check out Flyte.org and its sibling project https://www.union.ai/unionml





I mean, come on, SQLite doesn't even support concurrent writes. Are people seriously considering using it in a production scenario?

If you work in a DS team where you're the only DS, then it probably suits your needs. Otherwise I can't imagine how you could achieve anything production-grade.


Being able to use SQL for later analysis is definitely a good idea. For smaller models SQLite for sure is enough but as soon as you want to scale your HPO across multiple servers or even just processes, you will need something that supports a multi-user database. E.g. Optuna supports PostgreSQL and also defaults to SQLite as far as I know.
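
For example, Optuna takes the storage as a database URL, so moving from SQLite to PostgreSQL is (roughly) just a connection-string change:

    import optuna

    def objective(trial):
        x = trial.suggest_float("x", -10, 10)
        return (x - 2) ** 2

    study = optuna.create_study(
        study_name="demo",
        storage="sqlite:///optuna.db",  # swap for a postgresql:// URL to share across machines
        load_if_exists=True,
    )
    study.optimize(objective, n_trials=20)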


I've found weights and biases extremely easy to use with minimal integration effort. Not sure if this provides any functionality that's better.


As noted in an earlier comment, I think there is a false equivalence between end-to-end MLOps platforms like MLflow and tools for experiment tracking. The project looks like a solid tracking solution for individual data scientists, but it is not designed for collaboration among teams or organizations.

> There were a few things I didn’t like: it seemed too much to have to start a web server to look at my experiments, and I found the query feature extremely limiting (if my experiments are stored in a SQL table, why not allow me to query them with SQL).

While a relational database (like SQLite) can store hyperparameters and metrics, it cannot scale to the many aspects of experiment tracking for a team/organization, from visual inspection of model performance results to sharing models to lineage tracking from experimentation to production. As noted in the article, you need a GUI on top of a SQL database to make model experimentation meaningful. The MLflow web service allows you to scale across your teams/organizations with interactive visualizations, built-in search & ranking, shareable snapshots, etc. You can run it across a variety of production-grade relational DBs, so users can query the data directly through the SQL database or through a UI that makes searching easier for those not interested in using SQL.

> I also found comparing the experiments limited. I rarely have a project where a single (or a couple of) metric(s) is enough to evaluate a model. It’s mostly a combination of metrics and evaluation plots that I need to look at to assess a model. Furthermore, the numbers/plots themselves have no value in isolation; I need to benchmark them against a base model, and doing model comparisons at this level was pretty slow from the GUI.

The MLflow UI allows you to compare thousands of models from the same page in tabular or graphical format. It renders the performance-related artifacts associated with a model, including feature importance graphs, ROC & precision-recall curves, and any additional information that can be expressed in image, CSV, HTML, or PDF format.

> If you look at the script’s source code, you’ll see that there are no extra imports or calls to log the experiments, it’s a vanilla Python script.

MLflow already provides low-code solutions for MLOps, including autologging. After running a single line of code - mlflow.autolog() - every model you train across the most prominent ML frameworks, including but not limited to scikit-learn, XGBoost, TensorFlow & Keras, PySpark, LightGBM, and statsmodels is automatically tracked with MLflow, including all relevant hyperparameters, performance metrics, model files, software dependencies, etc. All of this information is made immediately available in the MLflow UI.
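
Roughly, that looks like this (scikit-learn here just as an illustration):

    import mlflow
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    mlflow.autolog()  # one line: params, metrics, and the model get logged automatically

    X, y = load_iris(return_X_y=True)
    with mlflow.start_run():
        LogisticRegression(max_iter=200).fit(X, y)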

Addendum: As noted, there is a false equivalence between an end-to-end MLOps lifecycle platform like MLflow and tools for experiment tracking. To succeed with end-to-end MLOps, teams/organizations also need projects to package code for reproducibility on any platform across many different package versions, deploy models in multiple environments, and a registry to store and manage these models - all of which is provided by MLflow.

It is battle-tested with hundreds of developers and thousands of organizations using widely-adopted open source standards. I encourage you to chime in on the MLflow GitHub on any issues and PRs, too!


+1. I'd also like to note that it's very easy to get started with MLflow; our quickstart walks you through the process of installing the library, logging runs, and viewing the UI: https://mlflow.org/docs/latest/quickstart.html.

We'd love to work with the author to make MLflow Tracking an even better experiment tracking tool and immediately benefit thousands of organizations and users on the platform. MLflow is the largest open source MLOps platform with over 500 external contributors actively developing the project and a maintainer group dedicated to making sure your contributions & improvements are merged quickly.


What about BentoML?



