Inference on GPU is already quite slow on the full-scale, non-distilled model (in the 1–2 second range, IIRC); on CPU it would be an order of magnitude slower.
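
For anyone wanting to check numbers like this themselves, here's a minimal PyTorch timing sketch. The model below is just a stand-in (the comment doesn't name the actual one), so swap in the real full-scale model; the key detail is the `torch.cuda.synchronize()` calls, since CUDA kernels launch asynchronously and naive wall-clock timing will under-report GPU latency without them.

    import time
    import torch
    import torch.nn as nn

    def time_inference(model: nn.Module, x: torch.Tensor, device: str, runs: int = 10) -> float:
        """Return mean per-call inference latency in seconds on `device`."""
        model = model.to(device).eval()
        x = x.to(device)
        with torch.no_grad():
            model(x)  # warm-up: triggers lazy CUDA init / kernel selection
            if device == "cuda":
                torch.cuda.synchronize()  # wait for async GPU work before timing
            start = time.perf_counter()
            for _ in range(runs):
                model(x)
            if device == "cuda":
                torch.cuda.synchronize()
            return (time.perf_counter() - start) / runs

    # Hypothetical stand-in workload; replace with the real model to reproduce.
    model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))
    x = torch.randn(1, 4096)

    cpu_t = time_inference(model, x, "cpu")
    print(f"CPU: {cpu_t:.3f} s/call")
    if torch.cuda.is_available():
        gpu_t = time_inference(model, x, "cuda")
        print(f"GPU: {gpu_t:.3f} s/call ({cpu_t / gpu_t:.1f}x speedup)")

The CPU-to-GPU ratio this prints is what the "order of magnitude" claim is about; the exact factor depends heavily on model size and batch size.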
