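Modern GPU buffering models generally don't let you read the previous frame back like that: the back buffer you get next typically holds stale contents from a frame or more earlier, or contents that are simply undefined. So you would need to keep the previous frame in a separate scratch surface. That adds memory and bandwidth, because every frame now involves copying the shifted image into place, so shifting ends up not much faster than redrawing, if it is faster at all.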
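To put rough numbers on the overhead side of that tradeoff, here is a back-of-the-envelope sketch. The model is an assumption for illustration, not how any particular API or driver works: the last frame is kept in scratch surface A, the surviving rows are copied shifted into a second scratch surface B (overlapping copies within one image are typically not allowed), the newly exposed strip is drawn into B, and B is then copied to the back buffer. The function and field names are hypothetical, and shading cost and texture reads (the work shifting is meant to save) are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class ScrollTraffic:
    frame_bytes: int         # size of one full frame
    shift_bytes: int         # bytes read + written per frame on the shift path
    extra_memory_bytes: int  # scratch memory needed beyond the swapchain


def estimate_shift_traffic(width: int, height: int, dy: int,
                           bytes_per_pixel: int = 4) -> ScrollTraffic:
    """Rough memory-traffic estimate for scrolling by shifting a kept frame.

    Hypothetical model:
      * previous frame lives in scratch surface A (back buffer can't be reused);
      * surviving (height - dy) rows are copied A -> B, shifted (read + write);
      * the newly exposed dy rows are drawn into B (write only);
      * B is copied into the back buffer (read + write of the full frame).
    Shading and texture reads are not counted.
    """
    if not 0 <= dy <= height:
        raise ValueError("dy must be between 0 and height")
    row = width * bytes_per_pixel
    frame = row * height
    shift_copy = 2 * (height - dy) * row  # A -> B, offset by dy rows
    exposed = dy * row                    # draw the new strip into B
    present_copy = 2 * frame              # B -> back buffer
    return ScrollTraffic(
        frame_bytes=frame,
        shift_bytes=shift_copy + exposed + present_copy,
        extra_memory_bytes=2 * frame,     # surfaces A and B (ping-ponged)
    )


if __name__ == "__main__":
    t = estimate_shift_traffic(1920, 1080, dy=10)
    print(t.frame_bytes, t.shift_bytes, t.extra_memory_bytes)
```

Under these assumptions, scrolling a 1920×1080 RGBA8 target by 10 rows moves about 33 MB per frame (roughly four frames' worth of writes) and needs about 16.6 MB of extra scratch memory. Whether that beats a redraw depends entirely on how expensive the skipped shading was.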