They could always release figures for larger networks; they don't have to target ResNet-50 (which is the MLPerf standard). I don't think anyone would hold it against them if they showed massive improvements in something like GPT-2 training time (a network 37000x the size of ResNet).


GPT-2 uses attention, which is very memory-hungry to train (the attention score matrices grow quadratically with sequence length), so it probably won't work well. But I agree with your overall point.
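
As a rough back-of-envelope sketch of why: all the concrete numbers below are illustrative assumptions (a GPT-2-1.5B-style layer with 25 heads, a 1024-token context, batch size 8, fp32 activations), but the quadratic term is the point.

    # Back-of-envelope: activation memory for the attention score
    # matrices of ONE transformer layer, kept around for backprop.
    # All config values are assumptions for illustration, roughly
    # matching GPT-2 1.5B (25 heads, 1024-token context).
    batch = 8
    heads = 25
    seq_len = 1024
    bytes_per_float = 4  # fp32

    # Each head materialises a seq_len x seq_len score matrix, so
    # memory grows with seq_len**2, not seq_len.
    scores_bytes = batch * heads * seq_len * seq_len * bytes_per_float
    print(f"{scores_bytes / 2**30:.2f} GiB per layer")  # ~0.78 GiB

Across GPT-2 1.5B's 48 layers that's tens of GiB for the score matrices alone, before counting weights, optimizer state, or any other activations.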



