The biggest model that they have used has only 760M parameters, and it outperfor...

		tigershark 7 months ago \| parent \| context \| favorite \| on: Titans: Learning to Memorize at Test Time The biggest model that they have used has only 760M parameters, and it outperforms models 1 order of magnitude larger.

Gah dmn