BPF was used to do the aggregation and calculation in-kernel. You still need ftrace to actually run the BPF program in that context. You can read the cover letter for the patch that added this in 2015[1].
Right; plus there are some capabilities where ftrace is (and maybe always will be) better. E.g., function counting: ftrace can count all kernel functions instantly (try my perf-tools funccount tool), whereas the BPF method involves setting a kprobe on every function, which takes much longer (both setup and teardown). And function graph tracing from ftrace will likely be better than anything we can do in BPF, since it also relies on tracing all functions.
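
For anyone curious what the ftrace side of function counting looks like, here is a rough Python sketch built on the kernel's function profiler files in tracefs. It is only an illustration, not the funccount tool itself: the `count_functions` name and its parameters are made up for this example, the `/sys/kernel/tracing` mount point is an assumption (older systems use `/sys/kernel/debug/tracing`), and it needs root plus a kernel built with `CONFIG_FUNCTION_PROFILER`.

```python
import glob
import os
import time
from collections import Counter


def count_functions(pattern, duration=1.0, tracefs="/sys/kernel/tracing"):
    """Count calls to kernel functions matching `pattern` for `duration` seconds.

    Hypothetical helper for illustration; `tracefs` is an assumed mount point.
    """
    def write(name, value):
        with open(os.path.join(tracefs, name), "w") as f:
            f.write(value)

    write("set_ftrace_filter", pattern)         # limit profiling to matching functions
    write("function_profile_enabled", "1")      # start the in-kernel counters
    try:
        time.sleep(duration)
    finally:
        write("function_profile_enabled", "0")  # stop counting

    counts = Counter()
    # The profiler exposes one stats file per CPU: trace_stat/function0, function1, ...
    for path in glob.glob(os.path.join(tracefs, "trace_stat", "function*")):
        with open(path) as f:
            for line in f:
                fields = line.split()
                # Data rows start with "<function> <hits>"; header rows do not.
                if len(fields) >= 2 and fields[1].isdigit():
                    counts[fields[0]] += int(fields[1])

    write("set_ftrace_filter", "\n")            # reset the filter afterwards
    return counts
```

Something like `count_functions("vfs_*", duration=5).most_common(10)` would then list the busiest matching functions. Nothing is attached per function here; the filter is a single write, which is why setup and teardown stay cheap compared with placing a kprobe on each function.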