Experienced practitioner here: the second half of the post describes doing everything exactly the way I have done it. The only differences are that I picked C++ and Eigen instead of Rust and nalgebra for inference, and I used torch's ndarray and backprop tools instead of jax's, with the analogous "just print out C++ code from Python" approach to weight serialization. You picked up on the key insight, which is that the code needed to just directly implement the inference equations is much smaller than the configuration file of any possible framework flexible enough to meet your requirements (Rust, no inference-time allocation, no inference-time floating point, trained from scratch, ultra-small parameter count, …).
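To make the "just print out C++ code from Python" part concrete, here is a minimal sketch of the idea (the names TinyMlp and emit_cpp, and the 4→8→2 shape, are made up for illustration, not my actual code): train in torch, then have Python emit a self-contained C++ file with the weights baked in as static arrays and the forward pass written out directly against Eigen.

```python
# Sketch: serialize a trained torch model by printing C++ source.
# Hypothetical example; names and shapes are illustrative only.
import torch
import torch.nn as nn

class TinyMlp(nn.Module):
    def __init__(self, d_in=4, d_hidden=8, d_out=2):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def fmt(t: torch.Tensor) -> str:
    # Flatten a tensor into a C++ brace initializer (row-major,
    # which matches torch's contiguous (out, in) weight layout).
    return "{" + ", ".join(f"{v:.8f}f" for v in t.flatten().tolist()) + "}"

def emit_cpp(model: TinyMlp) -> str:
    w1, b1 = model.fc1.weight.detach(), model.fc1.bias.detach()
    w2, b2 = model.fc2.weight.detach(), model.fc2.bias.detach()
    return f"""// Generated by export script; do not edit by hand.
#include <Eigen/Dense>

static const float W1[{w1.numel()}] = {fmt(w1)};
static const float B1[{b1.numel()}] = {fmt(b1)};
static const float W2[{w2.numel()}] = {fmt(w2)};
static const float B2[{b2.numel()}] = {fmt(b2)};

// Direct implementation of the inference equations:
// y = W2 * relu(W1 * x + b1) + b2.
Eigen::Matrix<float, {w2.shape[0]}, 1> infer(const Eigen::Matrix<float, {w1.shape[1]}, 1>& x) {{
    Eigen::Map<const Eigen::Matrix<float, {w1.shape[0]}, {w1.shape[1]}, Eigen::RowMajor>> m1(W1);
    Eigen::Map<const Eigen::Matrix<float, {w2.shape[0]}, {w2.shape[1]}, Eigen::RowMajor>> m2(W2);
    Eigen::Map<const Eigen::Matrix<float, {b1.numel()}, 1>> v1(B1);
    Eigen::Map<const Eigen::Matrix<float, {b2.numel()}, 1>> v2(B2);
    return m2 * (m1 * x + v1).cwiseMax(0.0f) + v2;
}}
"""

if __name__ == "__main__":
    # Redirect to weights.cpp and compile it into the inference binary.
    print(emit_cpp(TinyMlp()))
```

Because the emitted matrices are Eigen::Map views over static arrays with compile-time sizes, the generated infer() does no heap allocation at inference time, which is exactly the sort of requirement no off-the-shelf framework configuration would have let you express this cheaply.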