It depends on where the timing code is. If the timer starts after all the data has already been loaded, the recorded time will be lower, even if the end-to-end time for the whole process is actually higher.
I’m not following how that would result in a 10x discrepancy. The amount of data we’re talking about here is laughably small (it’s like 32 bytes or something).
I’ll admit I haven’t looked at the details at all, but one possible explanation is that almost all of the time is spent on inter-process communication overhead. If that also happens before the timer starts (e.g., the data has already been transferred and is just sitting in a local buffer waiting to be read), then the measured time will be significantly lower.
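Roughly what I mean, as an illustrative sketch in Python (not the actual code under discussion): the transfer into a pipe happens before the timer starts, so the timed region only covers pulling ~32 bytes out of an already-filled local buffer.

```python
import os
import time

payload = os.urandom(32)  # the "laughably small" payload

# The IPC transfer happens here, before the timer ever starts.
read_fd, write_fd = os.pipe()
os.write(write_fd, payload)
os.close(write_fd)

# Timer starts only now; the data is already sitting in the pipe buffer,
# so the measured time excludes essentially all of the transfer overhead.
start = time.perf_counter()
data = os.read(read_fd, len(payload))
elapsed = time.perf_counter() - start

os.close(read_fd)
print(f"measured: {elapsed * 1e6:.1f} µs for {len(data)} bytes")
```

If the untimed transfer dominates the real cost, a gap like the one described wouldn’t be surprising, regardless of how small the payload is.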