I was using PyTorch 0.1.12 installed with conda (following their USAGE.md), and the transfer took ~30s in total.
For some reason the transfer takes me about 4-5 minutes, but the code now runs, and the rest of the runtime is only a few seconds.