eurekin on Jan 12, 2024 | on: Edge Detection for Image Processing

I suspect it could be similar to learned vs. predefined positional embeddings in GPTs. That is, the learned version is a "warped and distorted" version of the exact predefined pattern, yet it somehow performs a bit better, and no one knows exactly why.
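
For anyone who hasn't seen the two side by side, here is a rough sketch of what I mean by "predefined" vs. "learned". The function names, table sizes, and the 0.02 init scale are my own illustrative choices, not taken from any particular model. The predefined table is the usual sinusoidal pattern; the learned one just starts as small random noise and would be updated by training, which is not shown here.

```python
import numpy as np

def sinusoidal_positions(num_positions, dim):
    """Predefined table: sin on even columns, cos on odd columns (dim must be even)."""
    positions = np.arange(num_positions)[:, None]           # shape (num_positions, 1)
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))   # shape (dim // 2,)
    table = np.zeros((num_positions, dim))
    table[:, 0::2] = np.sin(positions * freqs)
    table[:, 1::2] = np.cos(positions * freqs)
    return table

def learned_positions(num_positions, dim, seed=0):
    """Learned table at initialization: small random values (hypothetical init scale).
    In a real model, gradient descent would reshape these rows during training."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 0.02, size=(num_positions, dim))

fixed = sinusoidal_positions(8, 16)
learned = learned_positions(8, 16)
```

Both are just a `(num_positions, dim)` table added to the token embeddings, so the only difference is whether the rows are computed from a formula or fitted to the data.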