Train a set of sklearn models, one per random partition of the data (with the partitioning computed in a distributed fashion), then combine those models by averaging and evaluate the result against an even larger dataset. How do you do that in SQL?
Sharding the table can help scale the problem across many machines, and as I mentioned earlier, you can use the PL/R or PL/Python language extensions to lift all sorts of ML functions into SQL functions.
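A minimal sketch of the per-partition training step using PL/Python, assuming PostgreSQL with the `plpython3u` extension installed and scikit-learn available to the server's Python. The table and column names (`samples`, `model_store`, `partition_id`, `x1`, `x2`, `label`) are hypothetical:

```sql
-- One training function; it reads a single partition and returns a
-- pickled scikit-learn model as bytea.
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION train_partition(part_id integer)
RETURNS bytea AS $$
    import pickle
    from sklearn.linear_model import SGDClassifier

    # plpy.execute returns the rows as a list of dicts
    rows = plpy.execute(
        "SELECT x1, x2, label FROM samples WHERE partition_id = %d" % part_id)
    X = [[r["x1"], r["x2"]] for r in rows]
    y = [r["label"] for r in rows]

    model = SGDClassifier(loss="log_loss").fit(X, y)
    return pickle.dumps(model)
$$ LANGUAGE plpython3u;

-- Kick off one training job per partition. On a sharded setup the
-- planner can push each call down to the node that owns that shard.
INSERT INTO model_store (partition_id, model)
SELECT p.partition_id, train_partition(p.partition_id)
FROM (SELECT DISTINCT partition_id FROM samples) AS p;
```

Storing the pickled model as `bytea` keeps everything inside the database, so the later averaging and evaluation steps can also be plain SQL queries over `model_store`.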
ML models: I already mentioned how to lift R and Python functions into SQL functions. Even if you are not using PostgreSQL, many other databases support lifting and interfacing with existing ML libraries through an FFI.
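For the "combine by averaging" part of the question, a minimal Python sketch under the assumption that each partition produced a linear model, so a model is just a coefficient vector plus an intercept and averaging is element-wise. The function names are illustrative, not from any library:

```python
# Model averaging for linear models: each partition yields a
# (coefficients, intercept) pair; the combined model is their mean.
# This only makes sense for models whose parameters live in a vector
# space (linear/logistic regression, SGD-trained linear classifiers).

def average_models(models):
    """models: list of (coef_list, intercept) pairs, one per partition."""
    n = len(models)
    dim = len(models[0][0])
    avg_coef = [sum(m[0][i] for m in models) / n for i in range(dim)]
    avg_intercept = sum(m[1] for m in models) / n
    return avg_coef, avg_intercept

def predict(coef, intercept, x):
    """Threshold the averaged linear score at zero for a 0/1 label."""
    score = sum(c * xi for c, xi in zip(coef, x)) + intercept
    return 1 if score >= 0 else 0

# Two toy per-partition models averaged into one:
coef, b = average_models([([1.0, 2.0], 0.5), ([3.0, 0.0], -0.5)])
# coef == [2.0, 1.0], b == 0.0
```

The same logic could run inside the database as another PL/Python function that unpickles each stored model, averages the parameter vectors, and evaluates the combined model with a `SELECT` over the larger evaluation table.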