Pharmacometrics is built around small stiff ODE models. You might think that makes it computationally "simple", but what it actually means is that every little detail matters a lot for performance. The way you do LU factorization for 4 ODEs vs 6 ODEs needs to change in an architecture-dependent fashion, and BLAS libraries are not efficient in this size range. Standard floating point exponentiation can be too expensive in some spots, while at the same time the standard stiff ODE solvers make regularity assumptions that pharmacometric models regularly violate (because of how dosing works). There is more than a little bit of "by-hand SIMD" in this software stack.
Specializing on all of these features is an interplay between algorithms and JIT compilation, and at this size, not using statically compiled optimizations will hurt you. We see this in benchmarks like the SciML HIRES benchmark [1], where newly developed Rodas methods outperform classic Fortran libraries like LSODA. As a less direct but more illustrative example, these benchmarks vs PyTorch [2] show what happens when you compare against an optimizing JIT compiler that doesn't specialize on small-sized interactions (>30x performance difference!). This kind of application is very much not in the regime of large kernels that a lot of recent compiler work tends to optimize for. Instead, it relies heavily on being able to cross-compile and inline computations to remove every little overhead, all while improving the ODE algorithms themselves. We will have a paper that goes into more detail on this fairly soon.
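To make the dosing point concrete, here is a minimal sketch of why doses break the smoothness a solver expects: each dose is an instantaneous jump in the state, so the integration has to stop exactly at the dose time, apply the jump, and restart. This is not how the software stack discussed here does it. The two-compartment oral absorption model, the parameter values, the function names, and the fixed-step explicit RK4 integrator are all assumptions chosen for illustration (a real pharmacometric model would typically need a stiff solver). Because this toy model is linear, the result can be checked against the closed-form superposition solution.

```python
import math


def rhs(state, ka, ke):
    """Depot -> central absorption (rate ka), elimination from central (rate ke)."""
    depot, central = state
    return (-ka * depot, ka * depot - ke * central)


def rk4_step(state, h, ka, ke):
    """One classical explicit RK4 step (illustrative only, not a stiff method)."""
    k1 = rhs(state, ka, ke)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)), ka, ke)
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)), ka, ke)
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)), ka, ke)
    return tuple(
        s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def advance(state, t0, t1, ka, ke, max_step):
    """Integrate the smooth dynamics from t0 to t1, landing exactly on t1."""
    span = t1 - t0
    if span <= 0.0:
        return state
    n = max(1, math.ceil(span / max_step))
    h = span / n
    for _ in range(n):
        state = rk4_step(state, h, ka, ke)
    return state


def simulate(dose_times, dose_amount, t_end, ka, ke, max_step=0.01):
    """Integrate segment by segment so that no step ever straddles a dose."""
    state = (0.0, 0.0)
    t = 0.0
    for td in sorted(dose_times):
        if td >= t_end:
            break
        state = advance(state, t, td, ka, ke, max_step)
        # The dose is a discontinuity: the depot amount jumps instantly.
        state = (state[0] + dose_amount, state[1])
        t = td
    return advance(state, t, t_end, ka, ke, max_step)


def central_exact(dose_times, dose_amount, t, ka, ke):
    """Closed-form central amount by superposition (valid because the model is linear)."""
    total = 0.0
    for td in dose_times:
        if td < t:
            dt = t - td
            total += dose_amount * ka / (ka - ke) * (math.exp(-ke * dt) - math.exp(-ka * dt))
    return total


# Hypothetical regimen: 100 units every 12 hours, observed at 48 hours.
doses = [0.0, 12.0, 24.0, 36.0]
numeric = simulate(doses, 100.0, 48.0, ka=1.5, ke=0.2)
exact = central_exact(doses, 100.0, 48.0, ka=1.5, ke=0.2)
print("central (numeric):", numeric[1])
print("central (exact):  ", exact)
```

An adaptive solver that is not told about the dose times would instead try to step across the jump, and its error estimator, which assumes a smooth solution, would either reject many steps or silently smear the discontinuity.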
Like any big computation, it's lots of simple stuff smashed together, so if the simple stuff is even slightly slow, the cost adds up. Fitting a nonlinear mixed effects model with a few thousand patients (as in a later-stage clinical trial) can take two weeks to a month. Being slow at this stage can delay clinical trials, and because those trials are extremely expensive, people very much care when the computation is too slow.