Transform ML models into native code with zero dependencies (github.com/bayeswitnesses)
184 points by ghosthamlet on March 5, 2019 | 26 comments


Other similar projects:

https://github.com/nok/sklearn-porter Transpiles many scikit-learn models to Java/C/JavaScript/Go/Ruby; around since 2016.

https://github.com/konstantint/SKompiler Transpiles scikit-learn models to Excel formulas/SQL.

https://github.com/jonnor/emlearn Transpiles to C only, with a focus on microcontrollers/embedded devices. Also includes feature-extraction tools. Disclaimer: I wrote it.


Thanks for sharing this! Looking at sklearn-porter now, hope I can contribute to either the Ruby, Golang, or PHP library.


Similar: tfcompile AOT compiles TensorFlow models into native code using XLA (https://www.tensorflow.org/xla/tfcompile).


Does anyone know of a similar idea for neural networks? As far as I can tell, you need the entire framework (which requires a heavy, 1.5 GB docker image) to apply a trained model, even though in theory you only need matrix multiplication and a few activation functions.
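
In principle the forward pass really is just a few lines of numpy once the weights are exported; a minimal sketch (file names and shapes hypothetical):

    import numpy as np

    # Hypothetical files, dumped once from the training framework.
    W1, b1 = np.load("w1.npy"), np.load("b1.npy")
    W2, b2 = np.load("w2.npy"), np.load("b2.npy")

    def relu(x):
        return np.maximum(x, 0.0)

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def predict(x):
        # Two-layer MLP: nothing but matrix products and activations,
        # no framework needed at inference time.
        h = relu(W1 @ x + b1)
        return softmax(W2 @ h + b2)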

Related: https://onnx.ai/


I wrote keras2cpp https://github.com/pplonski/keras2cpp

It transforms Keras + Theano models into pure C++ with no additional packages. It does not use the GPU.


A similar idea, but for neural networks, is TVM [1]. It's built on top of Halide and LLVM. I haven't tried it myself, but it seems to support deployment to many backends.

[1] https://docs.tvm.ai/deploy/index.html


Try YOLO: https://pjreddie.com/darknet/yolo/ . It's written in C.


There’s a project that compiles ONNX models: https://onnc.ai


There are similar libraries for converting ML models into SQL queries.
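
The core of that trick is small; a toy sketch of walking a fitted scikit-learn decision tree and emitting a SQL CASE expression (illustrative only, not any particular library's output):

    def tree_to_sql(clf, feature_names):
        t = clf.tree_  # the fitted sklearn tree structure

        def walk(node):
            if t.children_left[node] == -1:          # leaf node
                return str(t.value[node].argmax())   # majority class
            feat = feature_names[t.feature[node]]
            thr = float(t.threshold[node])
            left = walk(t.children_left[node])
            right = walk(t.children_right[node])
            return "CASE WHEN %s <= %s THEN %s ELSE %s END" % (feat, thr, left, right)

        return walk(0)

    # clf = DecisionTreeClassifier().fit(X, y)
    # print("SELECT %s AS prediction FROM customers" % tree_to_sql(clf, ["age", "income"]))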

However, the important part of most models is not the `estimator.fit(X, y)` line, but all the things that are done to X before fitting or estimating.


Hard to say one part is more important than the other. A working model that does its job in code is incredibly useful.


You might consider supporting sklearn.preprocessing objects as well to replicate any transformations applied to training data.
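
E.g. a fitted StandardScaler boils down to two vectors, so it's easy to mirror by hand in the meantime (sketch; X_train and the transpiled score() are assumed to exist):

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler().fit(X_train)

    # A fitted StandardScaler is just two vectors (mean_ and scale_),
    # so replicating it in front of the generated model is one line:
    def preprocess(x):
        return (x - scaler.mean_) / scaler.scale_

    # score() being the transpiled model:
    # prediction = score(preprocess(x_new))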


Hey folks! One of the authors here. Somehow this post flew under our radar. Thank you all for your comments and feedback! We're super excited about the amount of attention this project has gotten; it motivates us to work even harder and deliver even more cool stuff.

Go and JS support are definitely on our agenda. They shouldn't be too hard to add, since we first transform models into an AST and only then render that AST into a specific language. I might be wrong, but this is a crucial detail that distinguishes m2cgen from similar projects like sklearn-porter: models are completely decoupled from languages and can be worked on independently. Once you implement a particular model, all languages get support for it automatically, and vice versa (toy sketch below). Plus we support XGBoost and LightGBM :)

SVM support is what we'll be focusing on next from the modeling side of things. Any contributions are much appreciated!
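
To illustrate the AST point, a toy sketch (heavily simplified, hypothetical node classes, nothing like the real code):

    # Toy IR: every model is lowered to expression nodes like these.
    class Num:
        def __init__(self, v): self.v = v

    class Feature:
        def __init__(self, i): self.i = i

    class Add:
        def __init__(self, l, r): self.l, self.r = l, r

    class Mul:
        def __init__(self, l, r): self.l, self.r = l, r

    # Each target language is just a separate tree walker.
    def to_c(n):
        if isinstance(n, Num):     return repr(n.v)
        if isinstance(n, Feature): return "input[%d]" % n.i
        if isinstance(n, Add):     return "(%s + %s)" % (to_c(n.l), to_c(n.r))
        if isinstance(n, Mul):     return "(%s * %s)" % (to_c(n.l), to_c(n.r))

    # intercept + coef * x0
    print(to_c(Add(Num(1.5), Mul(Num(0.3), Feature(0)))))
    # -> (1.5 + (0.3 * input[0]))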


One big improvement for this project would be to somehow break the model weights out into a separate file or dependency, to allow for very large models and a separation between "code" and "data". Overall, pretty good; the alternative approach is either using the underlying C++ lib or just doing some matrix math!


The package sklearn-porter supports the separation between inference (code) and model data (parameters) by passing `export_data=True` while transpiling the trained estimator.
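
From memory, roughly (clf being any fitted scikit-learn estimator):

    from sklearn_porter import Porter

    porter = Porter(clf, language='java')

    # Emits the inference code and writes the learned parameters to a
    # separate JSON file instead of hardcoding them:
    output = porter.export(export_data=True)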


Thanks, I'm coming over from R and will certainly check that out!


Now do it for TensorFlow. Let's say I want to generate AI faces on a small ARM processor.


TensorFlow is working on TensorFlow Lite, which runs on small ARM processors (with a runtime). And tfcompile is their compile-to-native approach (no runtime).


This is a pretty cool idea! Would there be any possibility of converting a model to JavaScript/Node?

Haven't looked through the source, but it looks like the generated code is essentially the weights from a trained model transformed into a function for the target language.
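
Something in this spirit, presumably (made-up coefficients, just guessing at the shape of the output):

    def score(input):
        return 24.8 + input[0] * -0.11 + input[1] * 0.046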


If your model framework supports it, you can export it to ONNX, and there are JS frameworks that support serving ONNX models.


Thanks for sharing. This is quite interesting as we’ve been mulling over library support for our platform.


What are the advantages?


Trained ML models that run as native code with zero dependencies.


Note though that e.g. liblinear models are trivial to load and apply yourself - most classifiers just compute the matrix-vector product between the weight matrix and an instance vector and take the class with the highest activation. That route has the benefit that you do not hardcode a model, but can easily load new models.
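
A sketch of that route (file names and shapes hypothetical):

    import numpy as np

    # Hypothetical files holding the learned parameters.
    W = np.load("weights.npy")  # shape (n_classes, n_features)
    b = np.load("bias.npy")     # shape (n_classes,)

    def classify(x):
        # One matrix-vector product; pick the highest activation.
        return int(np.argmax(W @ x + b))

Retraining then just means replacing the two files, with no recompilation.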

Not to criticize this project. This looks nice and has many useful applications.


Heck, you could even put it into a lambda function on a lambda service and deploy in like 5 min! That is a lot of value.


Yes, you got to the heart of it. *

* owner and main contributor of sklearn-porter


How does the runtime performance compare? (mem/cpu)



